Overview
ProSeg is a database of local structures of protein segments[1]. It consists of two sub-databases, Segment DB and Cluster DB. Segment DB contains tens of thousands of segments prepared by dividing non-redundant representative proteins with a sliding L-residue window. These segments were classified into clusters according to their three-dimensional structural resemblance, using several different clustering methods. Cluster DB holds a summary of each cluster (pictures, rank, frequency, secondary structure assignment, sequence profile, etc.) and an index that links each cluster to the segments classified into it.
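
As an illustration of the sliding-window step, the following minimal sketch cuts one chain into overlapping L-residue segments. It is not ProSeg's own code: the function name `sliding_segments` and the example sequence are hypothetical, and real chains would also need handling of chain breaks and missing residues, which is omitted here.

```python
from typing import List, Tuple


def sliding_segments(residues: str, length: int) -> List[Tuple[int, str]]:
    """Return (start_index, segment) for every window of `length` residues.

    A chain of n residues yields n - length + 1 overlapping segments,
    or none at all if the chain is shorter than the window.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last_start = len(residues) - length
    return [(i, residues[i:i + length]) for i in range(last_start + 1)]


# Hypothetical 10-residue chain in one-letter code, used only as an example.
chain = "GYDPETGTWG"
segments = sliding_segments(chain, 5)
for start, seg in segments:
    print(start, seg)
```

With L = 5, the 10-residue example produces six segments, the first starting at index 0 and the last at index 5.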

Classification was carried out using a single-pass clustering method, which is one of the unsupervised non-hierarchical clustering algorithms. Structural dissimilarity (or distance) between two segments is defined on the basis of backbone dihedral angles. Please refer to the paper [2] for more technical details and scientific implications.
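
The idea of single-pass clustering can be sketched as follows. This is a simplified illustration, not the procedure from paper [2]: the dissimilarity used here (root-mean-square of periodic differences between corresponding dihedral angles, in degrees), the rule of joining the first cluster whose representative lies within the threshold, and the names `angle_diff`, `distance`, and `single_pass_cluster` are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(s: Sequence[float], t: Sequence[float]) -> float:
    """Assumed dissimilarity: RMS of the periodic per-angle differences."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(s, t)) / len(s))


def single_pass_cluster(segments: Sequence[Sequence[float]],
                        d_th: float) -> List[List[int]]:
    """Visit each segment once; join the first cluster whose representative
    is within d_th, otherwise start a new cluster with this segment as its
    representative. Returns the member indices of each cluster."""
    representatives: List[Sequence[float]] = []
    members: List[List[int]] = []
    for idx, seg in enumerate(segments):
        for c, rep in enumerate(representatives):
            if distance(seg, rep) <= d_th:
                members[c].append(idx)
                break
        else:  # no existing cluster was close enough
            representatives.append(seg)
            members.append([idx])
    return members


# Hypothetical (phi, psi, phi, psi) angle lists in degrees.
example = [
    [-60.0, -45.0, -60.0, -45.0],
    [-65.0, -40.0, -58.0, -47.0],
    [-120.0, 130.0, -115.0, 135.0],
    [175.0, -178.0, -60.0, -45.0],
]
clusters = single_pass_cluster(example, d_th=30.0)
print(clusters)
```

Because every segment is compared only with the representatives found so far, the result of a scheme like this depends on the order in which segments are visited and on the chosen threshold.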
To find the information you are interested in, search for a cluster (or clusters) by entering parameters that identify the backbone structure of a query segment. There are three ways to identify the backbone structure: PDB ID, dihedral angles, and DSSP symbols. After you enter one of these, ProSeg returns a list of clusters that you can browse.
Because the analyses are exhaustive, ProSeg can provide the essential physicochemical properties of almost all backbone structures that a short segment can form. It should therefore be useful in protein science, for example in protein structure prediction, protein folding, and protein design[3][4].
Tutorials
- for basic operation: proseg_tutorial_basic.pdf
- for advanced operation: proseg_tutorial_advanced.pdf (to be prepared)
FAQ
- What is ProSeg?
- What kind of data is archived in ProSeg?
- How many proteins and segments are analyzed in ProSeg?
- How many clusters does ProSeg contain?
- How are the segments classified?
- What can ProSeg be applied to?
- How do I use ProSeg?
- Can I input a PDB entry as a query that is not included in the representative protein datasets in ProSeg?
- I don't know the dihedral angles of the query segment that seem to be required for input.
- I want to search the segments having different chain-lengths.
- Do I need to cite any reference when I use ProSeg in my scientific work?
- Are there any restrictions for non-academic or commercial use?
- I want to send a request or comment to the developers of ProSeg.
- What is ProSeg?
ProSeg is a database of local structures of protein segments. See “Overview”.
- What kind of data is archived in ProSeg?
ProSeg archives the number of local structures of proteins (i.e. clusters), the distribution of these clusters, and a summary of each cluster, including its coordinates, secondary structure, amino acid preference, the number of segments classified into it, the list of those segments, and their sequences.
- How many proteins and segments are analyzed in ProSeg?
Currently, 370 protein chains are analyzed and stored (Jan. 24, 2007). These chains are divided into 78622 segments of 5 residues, 76694 segments of 9 residues, 75744 segments of 11 residues, and 73876 segments of 15 residues.
- How many clusters does ProSeg contain?
The number of clusters depends on the segment length and the clustering method. See “Statistics and release information”.
- How are the segments classified?
Segments were classified using a “one-pass clustering” method or a “3D mesh gridding” method. See “Overview” and “Glossary” for details.
- What can ProSeg be applied to?
ProSeg is applicable to studies of protein structure prediction, protein folding, and protein design.
- How do I use ProSeg?
There are three ways to identify the backbone structure of a query segment: PDB ID, dihedral angles, and DSSP symbols. See “Tutorials”.
- Can I input a PDB entry as a query that is not included in the representative protein datasets in ProSeg?
Yes. If the PDB entry is not included in ProSeg, your query is automatically forwarded to the original PDB site by an internal process of ProSeg. You can therefore use even the latest PDB entries as a query.
- I don't know the dihedral angles of the query segment that seem to be required for input.
You can specify the query segment by entering its PDB ID, chain ID, and the residue number of its central residue, whereupon the dihedral angles will be displayed. You can edit the displayed values afterwards. You can also search by entering DSSP symbols when the secondary structure of the query segment is known. See “Tutorials”.
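
If you prefer to calculate the angles yourself from atomic coordinates, the sketch below uses the standard formula for the dihedral angle defined by four points. It is a generic illustration and not ProSeg's internal routine; the function name `dihedral` and the test coordinates are hypothetical. For a protein backbone, phi of residue i is conventionally taken over C(i-1), N(i), CA(i), C(i), and psi over N(i), CA(i), C(i), N(i+1).

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond.

    Sign convention: 0 for cis, 180 for trans, positive when p3 is rotated
    clockwise from p0 as seen looking from p1 towards p2.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so that the expected angle is easy to see.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                 (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```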
- I want to search the segments having different chain-lengths.
Currently, only four chain lengths (L=5, 9, 11 and 15) can be used. Advanced functions for searching segments of other chain lengths will be implemented in the future.
- Do I need to cite any reference when I use ProSeg in my scientific work?
Yes. Please cite papers [1] and [2] and/or the URL of this site when you use ProSeg search results in a publication.
- Are there any restrictions for non-academic or commercial use?
Non-academic and/or commercial users may be asked to pay a charge to access ProSeg. For details please contact us (proseg@m.aist.go.jp) prior to using ProSeg.
- I want to send a request or comment to the developers of ProSeg.
Please send an e-mail to proseg@m.aist.go.jp.
Statistics and release information
| Version | Proteins | Representative protein filter | Segments | Length | Clusters | Clustering method |
|---|---|---|---|---|---|---|
| 0.2 - 0.4 | 370 | Culled_PDB_Dec_13_2001 | 76694 | 9 | 10494 | ONE_PASS (Dth=30) |
| 0.2 - 0.4 | 370 | Culled_PDB_Dec_13_2001 | 76694 | 9 | 4179 | ONE_PASS (Dth=40) |
| 0.2 - 0.4 | 370 | Culled_PDB_Dec_13_2001 | 76694 | 9 | 1449 | 3D_MESH |
| 0.6 | 370 | Culled_PDB_Dec_13_2001 | 73876 | 15 | 30187 | ONE_PASS (Dth=30) |
| 0.8 | 370 | Culled_PDB_Dec_13_2001 | 78622 | 5 | 2217 | ONE_PASS (Dth=30) |
| 0.8 | 370 | Culled_PDB_Dec_13_2001 | 75744 | 11 | 17096 | ONE_PASS (Dth=30) |

| Version | Release Date | Development Points |
|---|---|---|
| | spring 2005 | ProSeg project starts |
| 0.2 | December 20, 2005 | ALPHA2 form (core of search system) |
| 0.4 | April 13, 2006 | BETA1 form (interface, overview, faq, glossary, etc.) |
| 0.6 | August 30, 2006 | BETA2 form (data expansion, error messages) |
| | January 24, 2007 | released to public |
| 0.8 | November 1, 2007 | BETA3 form (data expansion, tutorials, hyperlinks to PDB, PDB file upload) |
| 1.0 | March 16, 2008 | Final form (new interface) |
General restrictions
Documents, data, images, etc., on this server (hereinafter referred to as "materials") are all protected by copyright, and are the exclusive property of the AIST. The materials recorded on this server may not be reproduced, all or in part, by any means or format, provided to a third party, or be used for purposes other than those originally intended, without the express permission of the owners. Please refrain from copying analytical values and other data obtained from the AIST experiments, even if the data themselves are not proprietary. The AIST cannot guarantee the integrity of the materials presented on this server. In other words, even if the materials on this server contain errors, the AIST will not be held responsible for the said errors. Non-academic and/or commercial users may be asked to pay a charge to access the materials in ProSeg.
References
- [1] Sawada, Y. and Honda, S. (2009) "ProSeg: a database of local structures of protein segments" J. Comput. Aided Mol. Des. 23(3), 163-169.
- [2] Sawada, Y. and Honda, S. (2006) "Structural Diversity of Protein Segments Follows a Power-law Distribution" Biophysical J. 91(4), 1213-1223.
- [3] Honda, S., Yamasaki, K., Sawada, Y. and Morii, H. (2004) "10-residue folded peptide designed by segment statistics" Structure 12(8), 1507-1518.
- [4] Honda, S., Akiba, T., Kato, Y.S., Sawada, Y., Sekijima, M., Ishimura, M., Ooishi, A., Watanabe, H., Odahara, T. and Harata, K. (2008) "Crystal Structure of a Ten-Amino Acid Protein" J. Am. Chem. Soc. 130(46), 15327-15331.
Contributors
Developers:
Shinya HONDA (AIST), Yoshito SAWADA (AIST), Keiichi TSUKAMOTO (MSS), Hiroaki ISHIKAWA (MSS), Ikkou HARITA (MSS), Hiroki MATSUMOTO (Nagaoka Univ.), Tsukuba Advanced Computing Center in AIST
Advisers:
Miyuki ISHIMURA (AIST), Hisayuki MORII (AIST), Kentaro TOMII (AIST), and more
Contact
For questions or comments about the use of this server, please contact: proseg@m.aist.go.jp
