Signal Peptide Database

Signal Peptide Website: Hints

Search my Protein: Searches the signal sequence database and the reference database for a particular protein of interest.
Advanced Search: Searches the signal sequence database by protein name, species or lineage, signal sequence length and/or amino acid sequence.
Database search: Provides direct access to the signal sequence database, grouped into Mammalia, Drosophila, Viruses and Bacteria.
References: A database of literature references that can be searched by reference title (or single words appearing in the title), author and/or keywords.
Links: Provides links to websites with information on signal peptides or related topics.

The signal sequence database was compiled from the UniProt Knowledgebase Release 14.7, which consists of UniProtKB/Swiss-Prot Release 56.7 and UniProtKB/TrEMBL Release 39.7 (January 20, 2009). From both, sequence entries were extracted that contain SIGNAL PEPTIDE as the key name within the FT line (feature table data). This line is optional, so it does not occur in every UniProtKB/Swiss-Prot or UniProtKB/TrEMBL sequence entry; where present, it gives the length of the signal sequence. If the signal sequence is not annotated with any qualifier, experimental data (i.e. sequencing of the mature protein) were available. Three non-experimental qualifiers (potential, probable and by similarity) can be included in the FT line to indicate that the annotation is not based on experimental evidence. For further information on the annotation, visit the UniProt Knowledgebase (UniProtKB) web site (http://www.expasy.org/sprot/userman.html). Both databases are accessed via the ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB). Alternatively, UniProtKB and further information can be accessed via the UniProt homepage (http://www.uniprot.org). The extracted sequence entries are imported into a relational MySQL database. The database currently contains 218043 entries, and further updates will be provided on a regular basis.
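The extraction step described above can be sketched as follows. This is a minimal illustration, not the code used to build the database. It assumes the whitespace-separated flat-file layout of that era (line code FT, key name SIGNAL, start position, end position, optional description), and the function name parse_signal_feature is hypothetical.

```python
# Non-experimental qualifiers named in the text above.
QUALIFIERS = ("potential", "probable", "by similarity")


def parse_signal_feature(ft_line):
    """Return (start, end, qualifier) for an FT SIGNAL line, else None.

    qualifier is None when no non-experimental qualifier is present.
    Assumes fields are separated by whitespace (an assumption about the
    flat-file layout, not taken from the original text).
    """
    fields = ft_line.split(None, 4)
    if len(fields) < 4 or fields[0] != "FT" or fields[1] != "SIGNAL":
        return None
    try:
        start, end = int(fields[2]), int(fields[3])
    except ValueError:
        # Uncertain positions (e.g. "?") are skipped in this sketch.
        return None
    description = fields[4].lower() if len(fields) > 4 else ""
    qualifier = next((q for q in QUALIFIERS if q in description), None)
    return start, end, qualifier
```

With this sketch, the signal sequence length follows as end - start + 1, and a result whose qualifier is None corresponds to an annotation without a non-experimental qualifier.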

The References library was built by searching the PubMed database (http://www.ncbi.nlm.nih.gov/sites/entrez), part of the NCBI website (http://www.ncbi.nlm.nih.gov/), via an EndNote 9.0 interface. The following search terms were required to occur within the title or the abstract: signal_sequence/s, signal_peptide/s, leader_peptide/s, leader_sequence/s, signal_peptidase, leader_peptidase, signal_peptide_peptidase and intramembrane protease (hand-curated to contain references concerning SPP). The results were stored in a database named reference library. In addition, references are included that (i) have been suggested by website users and (ii) fulfil the criteria, i.e. the search terms detailed above occur in either the title or the abstract. The current version of the reference library is from January 20, 2009 and contains 20419 entries. Database updates will be provided on a regular basis.

References:
1.) Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I., Pilbout S. and Schneider M. (2003) The Swiss-Prot Protein Knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31, 365-70
2.) The UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35, D193-7
3.) Gasteiger E., Gattiker A., Hoogland C., Ivanyi I., Appel R.D. and Bairoch A. (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31, 3784-8
4.) PubMed Homepage

Search my Protein

This tool is designed for researchers who want to know more about one specific protein of interest, both about sequence data and about literature references concerning the protein's signal sequence. The protein name is the starting point for searching the signal sequence database and the reference library.

There are two output options for the signal sequence database: a table or FASTA format. The table gives an overview, and each sequence entry is linked to further information from the UniProt Knowledgebase homepage. The subsequent output interface includes, among other things: the source database (either UniProtKB/Swiss-Prot or UniProtKB/TrEMBL), the accession number and the entry name (both assigned by UniProtKB/Swiss-Prot or UniProtKB/TrEMBL), the protein name, the organism from which the protein derives, the lineage of the organism, the protein sequence and the signal sequence. Most importantly, the protein's entry name links directly to the UniProt Knowledgebase homepage (including UniProtKB/Swiss-Prot and UniProtKB/TrEMBL) located at http://au.expasy.org/sprot/.

Furthermore, the output interface gives a hydrophobicity plot of the signal peptide. According to the hydrophobicity scale by Kyte and Doolittle, each amino acid has a specific value, which is directly plotted. Amino acids with hydrophobic character have positive values.
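As an illustration of how such a plot is derived, the following minimal sketch maps each residue of a signal peptide to its value on the Kyte and Doolittle scale. No window averaging is applied, matching the "directly plotted" description above. The function name hydropathy_values is hypothetical and not part of the website.

```python
# Kyte and Doolittle (1982) hydropathy values, single-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_values(signal_peptide):
    """Return one hydropathy value per residue, in sequence order.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    return [KYTE_DOOLITTLE[aa] for aa in signal_peptide.upper()]
```

Plotting the returned list against residue position gives the per-residue profile; stretches of positive values mark the hydrophobic part of the signal peptide.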

The FASTA format gives the results in the format necessary to directly conduct an alignment. To this end, the output interface provides links to public domain alignment programs, e.g. ClustalW.
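For readers unfamiliar with FASTA, the sketch below shows how sequence entries could be written in that format: a header line starting with ">" followed by the sequence wrapped to a fixed width. The header content and the 60-character width are assumptions made for illustration; the exact header layout used by the website is not specified here.

```python
def to_fasta(entries, width=60):
    """Format (header, sequence) pairs as FASTA text.

    width is the number of residues per line (an assumed default).
    """
    records = []
    for header, sequence in entries:
        # Split the sequence into chunks of at most `width` residues.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([">" + header] + lines))
    return "\n".join(records) + "\n"
```

For example, to_fasta([("EXAMPLE_ENTRY", "MKLLV")]) yields a two-line record that alignment programs accepting FASTA input can read.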

The results from the signal sequence database search can be sorted by different parameters: the accession number and the entry name (both are UniProtKB/Swiss-Prot or UniProtKB/TrEMBL identifiers), the protein name, the organism, the length of the signal sequence and the signal sequence.

The reference library output is a table providing the first author and the title of each reference. Within the table, the references are sorted by date. Each entry is linked to further information on the reference, i.e. the complete author list, the title, the publication information (name of the journal, publication year, issue, pages), and the NLM/PubMed MeSH terms. There is also a direct link to the PubMed website via the PMID (PubMed Unique Identifier).

Reference:
1.) Kyte J. and Doolittle R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1), 105-32

Advanced Search

This tool is designed for researchers who are interested in the signal sequences of more than one protein. Here, the signal sequence database can be searched using a number of different input options:

Organism: The Latin name is required; the term is case-insensitive. For example: Homo or Homo sapiens
Lineage: Any parameter within the lineage string can be chosen, e.g. the species name, the genus name, the family, the class etc., for example: retroviridae
Protein: The protein name is required, for example: prolactin
Signal sequence length: Enter the length as a number; specify a minimum, a maximum or both.
Sequence: The amino acid sequence (single letter code only) can be typed or pasted. An underscore can be used as a wildcard.
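The underscore wildcard in the Sequence field can be pictured with the sketch below. It assumes that an underscore stands for exactly one arbitrary residue (as in SQL LIKE patterns) and that the query may match anywhere within the signal sequence; both points are assumptions, and the helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a query such as 'L_LL' into a compiled regular expression.

    Assumption: '_' matches exactly one residue; all other characters
    are matched literally.
    """
    parts = ["." if ch == "_" else re.escape(ch) for ch in pattern.upper()]
    return re.compile("".join(parts))


def matches(pattern, sequence):
    """Return True if the query occurs anywhere in the sequence."""
    return wildcard_to_regex(pattern).search(sequence.upper()) is not None
```

Under these assumptions, the query L_LL matches a sequence containing LALL or LVLL but not one containing only LLL.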

Depending on your interest, the output format can be chosen:

Table: The output format to show the complete result. Further information is provided for each sequence entry.
FASTA: This output format allows alignments to be conducted directly. To this end, links to several alignment programs are given within the output screen.
Size differentiation: This format displays a table and bar diagram showing the distribution of the number of proteins sorted by their signal sequence length.
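The size differentiation output can be thought of as a count of proteins per signal sequence length. The following minimal sketch computes such a distribution and renders it as a text bar chart; the function names and the chart layout are hypothetical and only illustrate the idea.

```python
from collections import Counter


def length_distribution(signal_sequences):
    """Return (length, count) pairs sorted by signal sequence length."""
    counts = Counter(len(seq) for seq in signal_sequences)
    return sorted(counts.items())


def text_bar_chart(distribution):
    """Render one line per length, with one '#' per protein."""
    return "\n".join(
        f"{length:>3} | {'#' * count} ({count})"
        for length, count in distribution
    )
```

Passing the signal sequences of a search result to length_distribution gives the table, and text_bar_chart turns that table into a simple bar diagram.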

The results from the search can be sorted by different parameters: the accession number and the entry name (both are UniProtKB/Swiss-Prot or UniProtKB/TrEMBL identifiers), the protein name, the organism, the length of the signal sequence and the signal sequence.

References

The references library contains a database of references/citations based on a PubMed search with the terms: signal_sequence/s, signal_peptide/s, leader_peptide/s, leader_sequence/s, signal_peptidase, leader_peptidase, signal_peptide_peptidase and intramembrane protease (hand-curated to contain references concerning SPP).

Four different input fields can be used for the search: author, title, year or NLM/PubMed MeSH terms.

The output is a table providing the first author and the title of a reference. Each dataset is connected with further information, i.e. the complete author list, the title, the publication information (name of the journal, publication year, issue, pages), and the NLM/PubMed MeSH terms. Furthermore, the PMID (PubMed Unique Identifier) is given, which is a direct link to PubMed.

If a reference is missing, please contact us by e-mail.