
Genome-Wide Protein-Protein Interaction Network Prediction for Oryza sativa


Chun-Yu Chen 1, Chen-hsiung Chan 1, Sheng-An Lee 1, Cheng-Yan Kao 1,2

1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei 10617, Taiwan
2 Institute for Information in Industry, Taipei, Taiwan

Keywords: orthologous groups, protein-protein interaction networks

1 Introduction

Rice is an economically important crop in Asia. The genome sequencing project for Oryza sativa is underway, and more than 40,000 genes have been identified in the O. sativa genome so far. Protein-protein interaction networks are critical for understanding cellular pathways and mechanisms. However, no experimental protein-protein interaction data are available for O. sativa.

High-throughput protein-protein interaction data for several model organisms are available in online databases [1, 2]. Human protein interaction networks have been predicted from homology and orthology between human sequences and those of the model organisms [3, 4]. Likewise, the protein-protein interaction network of O. sativa can be inferred from orthology information.

2 Method and Results

We compiled a comprehensive set of protein-protein interaction data from several online protein interaction databases, collecting nearly 200,000 interactions. Unlike conventional approaches, we do not infer homology by sequence alignment. Instead, we use the HomoloGene database (Entrez query.fcgi?db=homologene) provided by the National Center for Biotechnology Information (NCBI). HomoloGene contains entry sets (orthologous groups) formed from the sequences of 19 organisms. Each entry set contains homologous sequences from several organisms, and two sequences are considered homologous if they belong to the same entry set. For O. sativa, 33,553 genes are grouped into 10,957 homologous sequences and assigned to 9,417 entry sets in the HomoloGene database.

The proteins that participate in known protein-protein interactions are mapped to HomoloGene entry sets. For example, suppose protein A interacts with protein B, where protein A belongs to entry set #1 and protein B belongs to entry set #2. If entry sets #1 and #2 both contain O. sativa proteins, those two O. sativa proteins are considered an interacting pair. With this approach, 14,290 O. sativa interactions are predicted, and the corresponding protein-protein interaction network is constructed. The network follows a power law, suggesting that it has the typical features of protein-protein interaction networks. The distribution of proteins by number of interactions is plotted in Fig. 1.
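The transfer rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the reported results: the names `transfer_interactions`, `protein_to_set`, and `set_to_rice` are hypothetical, the toy identifiers are invented, and pairing every O. sativa protein of one entry set with every O. sativa protein of the other is an assumption, since the text does not say how entry sets with several O. sativa members are handled.

```python
from itertools import product


def transfer_interactions(interactions, protein_to_set, set_to_rice):
    """Predict O. sativa interactions from known interactions in other organisms.

    interactions   : iterable of (protein_a, protein_b) pairs from source databases
    protein_to_set : dict mapping a source protein to its entry set identifier
    set_to_rice    : dict mapping an entry set identifier to its O. sativa proteins
    """
    predicted = set()
    for a, b in interactions:
        set_a = protein_to_set.get(a)
        set_b = protein_to_set.get(b)
        if set_a is None or set_b is None:
            continue  # one partner has no entry set, so nothing can be transferred
        # Assumption: all O. sativa members of the two entry sets are paired.
        for rice_a, rice_b in product(set_to_rice.get(set_a, ()),
                                      set_to_rice.get(set_b, ())):
            # Store each pair in sorted order so (x, y) and (y, x) are one edge.
            predicted.add(tuple(sorted((rice_a, rice_b))))
    return predicted


# Toy data (hypothetical identifiers, for illustration only).
interactions = [("ecoA", "ecoB"), ("yeastC", "yeastC"), ("ecoA", "flyD")]
protein_to_set = {"ecoA": 1, "ecoB": 2, "yeastC": 3}
set_to_rice = {1: ["osA"], 2: ["osB1", "osB2"], 3: ["osC"]}

predicted = transfer_interactions(interactions, protein_to_set, set_to_rice)
print(sorted(predicted))
```

In this toy case the self-interaction of `yeastC` is transferred as a self-association of `osC`, and the pair involving `flyD` is skipped because `flyD` has no entry set.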

We picked out the ‘hubs’ of the predicted protein interaction network. These proteins interact with a large number of partners and are assumed to have critical functions. Some of them are listed in Table 1, with the accession number (GI) and description of each protein. The descriptions indicate that most of these proteins do have critical functions.
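Hub selection and the degree distribution can be illustrated as follows. This is a sketch under assumptions: the function names and toy edge list are hypothetical, a self-association is counted once toward a protein's degree, and the text does not state the cutoff used to call a protein a hub, so the example simply takes the top-ranked proteins.

```python
from collections import Counter


def degrees(edges):
    """Count interactions per protein; a self-association counts once (assumption)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        if b != a:
            deg[b] += 1
    return deg


def top_hubs(deg, n):
    """Return the n highest-degree proteins, ties broken by name for determinism."""
    return sorted(deg.items(), key=lambda item: (-item[1], item[0]))[:n]


def degree_distribution(deg):
    """Map each degree k to the number of proteins that have exactly k interactions."""
    return Counter(deg.values())


# Toy network (hypothetical identifiers, for illustration only).
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p5", "p5")]

deg = degrees(edges)
hubs = top_hubs(deg, 2)
distribution = degree_distribution(deg)
print(hubs)
print(sorted(distribution.items()))
```

Plotting the values of `distribution` against degree on log-log axes is the usual way to inspect whether a network of realistic size is consistent with a power law; the toy network here is far too small for that.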

Among all predicted protein interactions, a relatively large fraction are self-associations, in which a protein forms a homo-multimer. For most model organisms, self-associations account for 2 to 5% of protein interactions, whereas nearly 7% of the predicted O. sativa interactions are self-associations. Most of the O. sativa protein-protein interactions are ‘transferred’ from high-throughput E. coli data, and 40% of all available E. coli protein-protein interactions are self-associations, which may explain the elevated fraction.
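The self-association percentage is the share of interactions whose two partners are the same protein. A minimal sketch, assuming interactions are stored as pairs of identifiers (the function name and toy data are hypothetical):

```python
def self_association_fraction(edges):
    """Return the fraction of interactions in which a protein pairs with itself."""
    edges = list(edges)
    if not edges:
        return 0.0  # avoid division by zero on an empty network
    self_pairs = sum(1 for a, b in edges if a == b)
    return self_pairs / len(edges)


# Toy network (hypothetical identifiers): one self-association out of five edges.
toy_edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p5", "p5")]
fraction = self_association_fraction(toy_edges)
print(fraction)
```

For scale, a fraction near 0.07 applied to the 14,290 predicted interactions would correspond to roughly 1,000 self-associations.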

3 Discussions

We have predicted the protein-protein interaction network of O. sativa and identified its important nodes, that is, those with a large number of associated interactions. The descriptions of these proteins indicate critical functions; one example is the heat shock protein HSP70. Further analysis of the predicted network revealed that many of its characteristics follow those of typical protein interaction networks. Our predictions should provide hints for experimentalists on the pathways and mechanisms of O. sativa.

References

[1] Bader, G.D., Betel, D., and Hogue, C.W.V., BIND: the biomolecular interaction network database, Nucleic Acids Res., 31:248-250, 2003.

[2] Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.-M., and Eisenberg, D., DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Res., 30:303-305, 2002.

[3] Brown, K.R. and Jurisica, I., Online predicted human interaction database, Bioinformatics, 21:2076-2082, 2005.

[4] Huang, T.-W., Tien, A.-C., Huang, W.-S., Lee, Y.-C.G., Peng, C.-L., Tseng, H.-H., Kao, C.-Y., and Huang, C.-Y.F., POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome, Bioinformatics, 20:3273-3276, 2004.

Figure 1: The predicted protein-protein interaction network of O. sativa follows a power law.

Table 1: ‘Hubs’ of the predicted interaction network.

GI          Description
34908140    putative HSP70
50904851    peptidylprolyl isomerase Cyp2
50912149    eukaryotic translation initiation factor 3 subunit (eIF-3)-like
50898812    putative protein kinase
50915968    putative fibrillarin
34893994    putative 40S ribosomal protein S5
