
ESyPred3D: Protein Tertiary Structure Prediction

BIOINFORMATICS, Vol. 18 no. 9 2002, Pages 1250–1256

ESyPred3D: Prediction of proteins 3D structures

Christophe Lambert*, Nadia Léonard, Xavier De Bolle and Eric Depiereux

Facultés Universitaires Notre-Dame de la Paix, Unité de Recherche en Biologie Moléculaire, Rue de Bruxelles, 61, B-5000 Namur, Belgium

Received on February 1, 2002; revised on March 7, 2002; accepted on March 18, 2002

ABSTRACT

Motivation: Homology or comparative modeling is currently the most accurate method to predict the three-dimensional structure of proteins. It generally consists of four steps: (1) databank searching to identify the structural homolog, (2) target–template alignment, (3) model building and optimization, and (4) model evaluation. The target–template alignment step is generally accepted as the most critical step in homology modeling.

Results: We present here ESyPred3D, a new automated homology modeling program. The method benefits from the improved alignment performance of a new alignment strategy. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three-dimensional structure is built using the modeling package MODELLER. ESyPred3D was tested on 13 targets in the CASP4 experiment (Critical Assessment of Techniques for Protein Structure Prediction). Our alignment strategy obtains better results than PSI-BLAST alignments, and ESyPred3D alignments are among the most accurate compared to those of participants having used the same template.

Availability: ESyPred3D is available through its web site at http://www.fundp.ac.be/urbm/bioinfo/esypred/.

Contact:https://www.sodocs.net/doc/2b862219.html,mbert@fundp.ac.be;http://www.fundp.ac.be/~lambertc

INTRODUCTION

Three-dimensional (3D) protein structure is an important source of information to better understand the function of a protein, its interactions with other compounds (ligands, proteins, DNA, ...) and the phenotypical effects of mutations (Tramontano, 1998). The 3D protein structure can be predicted by three main categories of methods (Rost and O'Donoghue, 1997): (1) homology or comparative modeling (described below); (2) fold recognition (predicting the global fold of a protein); (3) ab initio techniques (trying to model the 3D structure of proteins using only the sequence and a force field).

*To whom correspondence should be addressed.

Homology modeling is historically the first (Browne et al., 1969) and the most accurate method (Sanchez and Sali, 1997). It was shown during the last CASP experiment (Critical Assessment of Techniques for Protein Structure Prediction) (Venclovas et al., 1999) that the critical steps are: (1) template selection, (2) target–template alignment, (3) modeling of regions absent from the template or significantly different from it and (4) modeling of side chains. Among these, it is commonly accepted that the target–template alignment step is the most critical (Mosimann et al., 1995; Martin et al., 1997).

It is known that above 50% identity between target and template, pairwise alignments provide accurate models. Between 30% and 50% identity, multiple alignments between target, template and similar proteins must be used, and the pairwise alignment between target and template must be extracted from this multiple alignment. Below 30% identity, only heuristic combinations of multiple alignments, experimental data and expert know-how are able to generate an accurate model.
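The identity thresholds quoted above can be written as a small decision rule. The following is only an illustrative sketch: the function name and the returned labels are hypothetical, and the treatment of the exact boundary values (50% and 30%) is an assumption, since the text does not say on which side they fall.

```python
def alignment_strategy(percent_identity: float) -> str:
    """Map target-template identity (in %) to the alignment approach named in the text.

    Boundary handling is an assumption: exactly 50% and exactly 30% are
    both treated as belonging to the 30-50% band.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 50.0:
        return "pairwise alignment"
    if percent_identity >= 30.0:
        return "pairwise alignment extracted from a multiple alignment"
    return "heuristic combination with experimental data and expert knowledge"
```

For example, `alignment_strategy(42.0)` falls in the middle band, which is the regime ESyPred3D is mostly concerned with.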

A large number of techniques have been developed to predict 3D structures of proteins by homology modeling. For the target–template alignment step, most of them use PSI-BLAST (Altschul et al., 1997), PileUp (Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, Wisc.), ClustalW (Thompson et al., 1994), 3D-PSSM (Fischer et al., 1999), SAM-T99 (Karplus et al., 1997), or the alignment producing the best model out of a collection computed from various alignment programs (Yang and Honig, 1999).

Our laboratory developed the MATCHBOX multiple sequence alignment software in the early 1990s (Depiereux and Feytmans, 1992) and it has proved to be one of the most accurate in terms of specificity (Depiereux et al., 1997). Much effort has been devoted to improving alignment accuracy by adding information such as secondary structure predictions, solvent accessibility predictions, specific scoring matrices and combination with ClustalW. In all cases, it was only possible to slightly improve multiple alignment accuracy (unpublished results). Meanwhile, no significant improvements in alignment performance have been published by other groups. Furthermore, no alignment method can be qualified as the absolute most reliable one. Indeed, benchmarks (Briffeuil et al., 1998; Thompson et al., 1999) have shown that the comparative performance of alignment programs depends strongly on the set of aligned sequences.

© Oxford University Press 2002

In this work, we tackle the target–template alignment problem by developing a specific program to align target and template sequences in homology modeling. Matching of homologous segments is improved by incorporating the results of several multiple alignment programs. Results are scored to optimize performance and screened to remove incompatible matches. Several algorithmic problems required specific developments in order to generate and efficiently screen the database of the various and often incompatible alignments proposed by the different algorithms. This new alignment strategy is included in our ESyPred3D program (http://www.fundp.ac.be/urbm/bioinfo/esypred/), which predicts the 3D structure of proteins using the homology modeling approach.

SYSTEM AND METHODS

Our automatic program (ESyPred3D) implements the four steps of the homology modeling approach (Eisenhaber et al., 1995): (1) databank searching to identify the structural homolog, (2) target–template alignment, (3) model building and optimization and (4) model evaluation (not implemented at the time of the CASP4 experiment). ESyPred3D was run on an SGI Octane dual-processor 225 MHz workstation under IRIX 6.5.

Identifying the structural homolog

To find homologs of the target sequence, PSI-BLAST 2.0.14 (downloaded from NCBI and run locally) is run against the latest available version of the NR databank (NCBI). The chosen template is the sequence from the latest version of the PDB databank with the lowest expected value after four iterations. The cutoff for the expected value is 0.0001 (-h flag). If no template is found with these criteria, the program stops.
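The selection rule just described (keep PDB hits under the expected-value cutoff, take the one with the lowest expected value, stop if none qualifies) can be sketched as follows. This is a minimal illustration, not the ESyPred3D code: the `Hit` record and the `select_template` function are hypothetical names, the hits are assumed to have been parsed already from the final PSI-BLAST iteration, and treating the cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

E_VALUE_CUTOFF = 1e-4  # expected-value cutoff quoted in the text


@dataclass(frozen=True)
class Hit:
    """One parsed search hit (hypothetical record, not a PSI-BLAST data type)."""
    identifier: str
    e_value: float
    in_pdb: bool  # True when the hit is a sequence of known structure


def select_template(hits: Sequence[Hit],
                    cutoff: float = E_VALUE_CUTOFF) -> Optional[Hit]:
    """Return the PDB hit with the lowest expected value, or None if none qualifies."""
    candidates = [h for h in hits if h.in_pdb and h.e_value <= cutoff]
    if not candidates:
        return None  # mirrors "the program stops" when no template is found
    return min(candidates, key=lambda h: h.e_value)
```

Returning `None` rather than raising keeps the "no template" case explicit for the caller, which would then abort the modeling run.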

Aligning sequences and constructing the 3D model

According to Thompson et al. (1999), the quality of a sequence alignment depends strongly on the context of the alignment. The results obtained for a given pair of sequences may differ depending on the set of sequences submitted to multiple alignment programs. So, after fetching all sequences retrieved by PSI-BLAST, two sets of sequences are generated in order to create two different computational conditions for running multiple alignment programs. Set A contains the 50 best hits, including the target and the template (the number of sequences is limited to 50 to reduce computing time). Set B is a subset of at least seven sequences, including the target and the template, produced by dropping overly redundant sequences with the PURGE program (provided with the Gibbs package (Neuwald et al., 1995)). The BLAST score (set with the -b flag) used to select or eliminate sequences during the PURGE operation is 250.

The target–template alignment is built in three steps (see Figure 1):

(a) Matching. Both sets of sequences (A and B) are aligned by five alignment programs emerging from two benchmarks (Briffeuil et al., 1998; Thompson et al., 1999). These programs are: ClustalW, Dialign2 (Morgenstern, 1999), Match-Box (Depiereux et al., 1997), Multalin (Corpet, 1988) and PRRP (Gotoh, 1996). Ten multiple alignments are generated, each one including the target and template sequences. Then, the pairwise alignments between the target and template sequences are extracted, leading to ten different pairwise alignments between the target and the template.

(b) Database building. Each position of the alignments is stored in a database; all redundant results, i.e. the same amino acid placed at the same position by different programs, are scored in a frequency table.

(c) Screening. The position with the highest score is taken as the first anchor point to build the final target–template alignment. Incompatible results (see Figure 2), aligning regions located up- and downstream of anchor points, are removed from the database. The process continues, with new anchor positions being determined and incompatible regions being eliminated, until all results are selected or removed. The final target–template alignment is thus composed of the most frequently aligned positions, under the condition of compatibility.
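Steps (b) and (c) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ESyPred3D implementation: each pairwise alignment is assumed to be given as a list of (target index, template index) matches, every program carries equal weight, ties between equally frequent matches are broken by position (the text does not specify tie-breaking), and the names `compatible` and `consensus_alignment` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (target residue index, template residue index)


def compatible(p: Pair, q: Pair) -> bool:
    """True when two matches keep the same order in both sequences.

    Rejects a residue aligned to two partners and crossing matches,
    the two incompatible cases illustrated in Figure 2.
    """
    return (p[0] < q[0] and p[1] < q[1]) or (p[0] > q[0] and p[1] > q[1])


def consensus_alignment(alignments: Iterable[Iterable[Pair]]) -> List[Pair]:
    """Greedy anchor selection over the frequency table of aligned positions."""
    # (b) Database building: count how many alignments propose each match.
    freq: Counter = Counter()
    for alignment in alignments:
        freq.update(set(alignment))

    # (c) Screening: visit matches from most to least frequent and keep a
    # match only if it is compatible with every anchor selected so far.
    ranked = sorted(freq, key=lambda pair: (-freq[pair], pair))
    anchors: List[Pair] = []
    for pair in ranked:
        if all(compatible(pair, anchor) for anchor in anchors):
            anchors.append(pair)
    return sorted(anchors)
```

Skipping a match that conflicts with an existing anchor has the same effect as removing it from the database once that anchor is chosen, so the greedy loop reproduces the select-then-eliminate procedure of step (c).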

This final pairwise alignment is used by the MODEL homology modeling routine of MODELLER release 4 (Sali and Blundell, 1993; Sali et al., 1997) to build a 3D model of the target protein. This routine includes the satisfaction of spatial and geometric restraints and a very fast molecular dynamics annealing; no other refinements were applied.

Participation in the CASP4 experiment

The ESyPred3D server participated in the CASP4 experiment (see complete results at https://www.sodocs.net/doc/2b862219.html,/ casp4/; group 218: LAMBERT-CHRISTOPHE). All the models generated by MODELLER were submitted to the CASP4 contest without any geometric or energetic evaluation. The number of homology modeling targets used during the CASP4 competition was too small to draw a very robust statistical conclusion about the performance of our method. However, the results obtained provide a first estimate of performance. For more statistical results, see the continuous evaluation of servers performed by EVA (Eyrich et al., 2001) (https://www.sodocs.net/doc/2b862219.html,/eva/).

Fig. 1. Flowchart of the ESyPred3D target–template alignment method. See the text for details.

Fig. 2. Example of compatible and incompatible results on two hypothetical sequences. Three cases are reported: (a) Alignments I–I and I–L are not compatible because the same amino acid in sequence 1 is aligned to two amino acids in sequence 2. (b) Alignments P–P and A–A are not compatible: P in sequence 1 is to the right of A, but P in sequence 2 is to the left of A. (c) Alignments I–I and P–P are compatible: the prolines are both to the right of the isoleucines.

For the purpose of the CASP4 experiment, two models were built for each of the 13 comparative modeling targets (Table 1) for which ESyPred3D was able to predict a 3D structure:

(1) The first model was built using the complete strategy described above (ESyPred3D) (models T0xxxTS2181).

(2) The second model was built using the same strategy as ESyPred3D but with the rough sequence–structure alignment provided by PSI-BLAST (models T0xxxTS2182).

Scoring schemes used to compare target structures to models

To compare ESyPred3D models to PSI-BLAST models, and ESyPred3D models to models of other CASP4 participants, the AL0 and GDT_TS scores were chosen. Both scores were calculated using the LGA (Local–Global Alignment) program (Zemla, 2000).

AL0

AL0 is the number of correctly aligned residues in the target–template alignment. This score is particularly relevant here because our method was designed to optimize alignment performance. It is evaluated by first making a structural alignment of the prediction and the target structure with the DALI server, and then counting the number of residues in the model for which the closest residue in the target is the correct one (the distance between their α-carbons being less than 3.8 Å).
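The counting step of AL0 can be illustrated as below. This sketch covers only the counting, under strong assumptions: the model and target α-carbon coordinates are taken to be already superimposed by an external structural alignment, residue i of the model is assumed to correspond to residue i of the target, and the function name `al0` is hypothetical. It is not the LGA or DALI procedure itself.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
AL0_CUTOFF = 3.8  # α-carbon distance cutoff in Å, as quoted in the text


def al0(model_ca: Sequence[Point], target_ca: Sequence[Point],
        cutoff: float = AL0_CUTOFF) -> int:
    """Count model residues whose nearest target residue is the correct one.

    Assumes both coordinate lists are in the same frame (already
    superimposed) and share the same residue indexing.
    """
    if len(model_ca) != len(target_ca):
        raise ValueError("model and target must list the same residues")
    correct = 0
    for i, model_atom in enumerate(model_ca):
        distances = [math.dist(model_atom, target_atom) for target_atom in target_ca]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        if nearest == i and distances[i] < cutoff:
            correct += 1
    return correct
```

Dividing the result by the target length gives the percentage form used for AL0 in Figure 3.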

GDT_TS

The Global Distance Test, GDT_{d_i}, is the number of α-carbons of a prediction deviating by no more than d_i Å from the α-carbons of the target, after optimal superimposition. This optimal superimposition is computed in such a way that the number of residues (α-carbons) that fit under the distance cutoff d_i is maximal.

If N_T is the total number of residues of the target, GDT_TS (GDT Total Score), computed according to the formula given below, is the mean percentage of residues of the target not deviating from the prediction after four optimal superimpositions with 1.0, 2.0, 4.0 and 8.0 Å α-carbon distance cutoffs. The GDT_TS score represents the overall quality of the model. This score was used to evaluate the complete procedure of ESyPred3D: identifying the structural homolog, aligning target to template and building the 3D model.

$$
\mathrm{GDT\_TS} = 100 \cdot \frac{\displaystyle\sum_{d_i} \frac{\mathrm{GDT}_{d_i}}{N_T}}{4}, \qquad d_i \in \{1.0,\ 2.0,\ 4.0,\ 8.0\}
$$
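As a worked instance of the GDT_TS formula, the sketch below averages the four per-cutoff fractions. It assumes the per-cutoff counts (the number of α-carbons within each cutoff after the corresponding optimal superimposition) have already been produced by an external program such as LGA; the function name `gdt_ts` and the dictionary layout are hypothetical.

```python
from typing import Mapping

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # α-carbon distance cutoffs in Å


def gdt_ts(counts: Mapping[float, int], n_target: int) -> float:
    """Mean percentage of target residues within the four distance cutoffs.

    counts[d] is the number of α-carbons within d Å after the optimal
    superimposition for that cutoff; n_target is the target length.
    """
    if n_target <= 0:
        raise ValueError("target length must be positive")
    total = sum(counts[d] for d in CUTOFFS)
    return 100.0 * total / (len(CUTOFFS) * n_target)
```

For a 100-residue target with 50, 70, 90 and 100 residues under the 1.0, 2.0, 4.0 and 8.0 Å cutoffs, the score is 100 × (0.50 + 0.70 + 0.90 + 1.00) / 4 = 77.5.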

Table 1. Homology modeling targets for the CASP4 experiment

Target  Description                                                 PDB code
T0090   ADP-ribose pyrophosphatase, E. coli                         1g0s, 1g9q, 1ga7
T0092   Hypothetical protein HI0319, H. influenzae
T0099   No description
T0103   Pepstatin insensitive carboxyl proteinase, Pseudomonas sp.  1ga6
T0111   Enolase, E. coli
T0112   Ketose Reductase/Sorbitol Dehydrogenase, B. argentifolii    1e3j
T0113   Short chain 3-hydroxyacyl-CoA dehydrogenase, rat            1e3w, 1e3s, 1e6w
T0117   Deoxyribonucleoside kinase, D. melanogaster
T0121   MalK, T. litoralis                                          1g29
T0122   Tryptophan Synthase alpha subunit, P. furiosus              1geq
T0123   Beta-lactoglobulin, pig                                     1exs
T0125   Sp18 protein, H. fulgens                                    1gak
T0128   Manganese superoxide dismutase homolog, P. aerophilum

RESULTS AND DISCUSSION

The performance of our homology modeling server is analyzed in three steps. In the first section, ESyPred3D alignments are compared to PSI-BLAST alignments. In the second section, ESyPred3D alignments are compared to those of other participants having used the same template. Since our alignment method is specifically designed for homology modeling, in the third section ESyPred3D models are compared to those of other CASP4 competitors in order to evaluate the global performance of our homology modeling strategy.

Alignment performances of ESyPred3D models compared to those of PSI-BLAST models

Table 2 contains AL0 scores for all models. Out of the 13 models, ESyPred3D obtains nine AL0 scores greater than PSI-BLAST and only one AL0 score significantly lower (more than two amino acids incorrectly aligned) than PSI-BLAST, for T0112. Two reasons explain the poor alignment for T0112:

(1) The number of homologs found by PSI-BLAST was so large that the non-redundant set could not be computed with PURGE.

(2) Four regions of T0112 shared only very low similarity with homologs, so the different alignment programs produced contradictory results in these regions, and only a poor alignment could be established by our method.
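The counts quoted above can be checked directly against the AL0 column of Table 2. The short script below is only a verification aid: the values are transcribed from Table 2 as (ESyPred3D, PSI-BLAST) pairs, and the variable names are illustrative.

```python
# AL0 scores from Table 2: target -> (ESyPred3D model, PSI-BLAST model)
AL0_SCORES = {
    "T0090": (41, 30),   "T0092": (73, 66),   "T0099": (26, 21),
    "T0103": (128, 113), "T0111": (383, 381), "T0112": (174, 197),
    "T0113": (214, 207), "T0117": (114, 109), "T0121": (143, 141),
    "T0122": (203, 190), "T0123": (102, 102), "T0125": (74, 75),
    "T0128": (185, 187),
}

# Targets where the ESyPred3D alignment has more correctly aligned residues.
better = [t for t, (esy, psi) in AL0_SCORES.items() if esy > psi]

# Targets where ESyPred3D is lower by more than two residues
# (the "significantly lower" criterion used in the text).
significantly_lower = [t for t, (esy, psi) in AL0_SCORES.items() if psi - esy > 2]

print(len(better), significantly_lower)
```

The three remaining targets (T0123, T0125 and T0128) differ from PSI-BLAST by at most two residues, which is why they count neither as better nor as significantly lower.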

From this first evaluation, we can conclude that the quality of the target–template alignment is generally better with the ESyPred3D alignment methodology than with the target–template alignment provided by PSI-BLAST.

Comparison of ESyPred3D alignments with those of the participants having used the same template

The number of groups that used the same templates as ESyPred3D is highly variable (from two to 65 groups, Figure 3). In Figure 3, using AL0 scores, ESyPred3D models obtained first place once, second place five times, third place three times and fourth place once. ESyPred3D models are thus in the top four places for ten of the 13 targets. Taking into account that a group that performs better than the ESyPred3D model for one target is rarely the same one that performs better for another target, one can conclude that our methodology is among the most efficient.

Comparison of ESyPred3D models with those of all CASP4participants

In this section, the complete strategy of ESyPred3D is evaluated and its performance is compared to that of other CASP4 participants using the GDT_TS score. The number of models submitted for each target was always above 200. So, to enable a rapid interpretation of the distributions of scores, we computed the third quartile of these distributions. The third quartile (Q3) of a distribution is the value such that 75% of the values in the list are less than or equal to it. All information provided in Figure 4 has been normalized by the Q3 value, for each target. For each target, Figure 4 contains: (1) the GDT_TS score of the ESyPred3D model; (2) the GDT_TS score of the best model received by the CASP4 organizers and (3) the third quartile, which is equal to 1.0 because of the normalization. Figure 4 shows that ESyPred3D built three models with scores close to the best model; indeed, second place was obtained for targets T0103, T0121 and T0122 (see full tables at https://www.sodocs.net/doc/2b862219.html,/casp4/). ESyPred3D predicted eight models above the Q3 values, i.e. in the top 25% of participants. Further analysis of the data at the CASP4 web site shows that few groups reached such a number of scores above Q3. It is also important to note that the group that obtained the best model for one target is rarely the same one that submitted the best model for another target.

Table 2. Scores for ESyPred3D and PSI-BLAST models. RMSD is computed over all α carbons. The last column shows templates that lead to the best models presented at CASP4

Model        RMSD   GDT_TS  AL0  Template  Templates leading to the best models
T0090TS2181  6.52   30.15   41   1tum      1mut
T0090TS2182  6.44   23.37   30   1tum      1mut
T0092TS2181  14.70  35.69   73   1d2g      1xva, 1d2h
T0092TS2182  5.22   34.69   66   1d2g      1xva, 1d2h
T0099TS2181  5.54   52.23   26   1qly      1a0n, 1ad5, 2hck, 1qcf, 2src
T0099TS2182  5.56   50.00   21   1qly      1a0n, 1ad5, 2hck, 1qcf, 2src
T0103TS2181  11.95  38.59   128  1sbh      1mee, 1sup
T0103TS2182  12.41  33.76   113  1sbh      1mee, 1sup
T0111TS2181  2.29   83.55   383  1one      1pdz, 1pdy, 1ykf, 4-7enl
T0111TS2182  2.26   82.39   381  1one      1pdz, 1pdy, 1ykf, 4-7enl
T0112TS2181  5.35   54.31   174  1hdy      1teh, 1ykf
T0112TS2182  4.13   59.19   197  1hdy      1teh, 1ykf
T0113TS2181  3.32   81.86   214  1hdc      1hdc, 2hsd
T0113TS2182  3.68   80.49   207  1hdc      1hdc, 2hsd
T0117TS2181  8.24   56.85   114  1qhi      1e2k, 1kim, 1ki2-7
T0117TS2182  3.87   55.71   109  1qhi      1e2k, 1kim, 1ki2-7
T0121TS2181  3.35   41.94   143  1b0u      1b0u
T0121TS2182  3.38   40.93   141  1b0u      1b0u
T0122TS2181  2.43   79.15   203  1cw2      1a5a, 1a5b, 1beu, 1cw2
T0122TS2182  2.41   74.58   190  1cw2      1a5a, 1a5b, 1beu, 1cw2
T0123TS2181  4.15   63.91   102  2a2u      2a2g, 1beb
T0123TS2182  3.75   65.47   102  2a2u      2a2g, 1beb
T0125TS2181  4.15   61.13   74   3lyn      2lis, 3lyn
T0125TS2182  4.07   60.40   75   3lyn      2lis, 3lyn
T0128TS2181  1.74   86.73   185  1abm      1b06, 1sss
T0128TS2182  1.65   87.32   187  1abm      1b06, 1sss
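The Q3 normalization used for Figure 4 can be sketched as follows. The quartile is computed literally from the definition given in the text (the smallest observed value with at least 75% of the values less than or equal to it); statistical packages often interpolate instead, so this choice is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def third_quartile(values: Sequence[float]) -> float:
    """Smallest observed value v such that at least 75% of values are <= v."""
    if not values:
        raise ValueError("at least one score is required")
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank of the quartile
    return ordered[rank - 1]


def normalize_by_q3(score: float, all_scores: Sequence[float]) -> float:
    """Express a score as a fraction of the third quartile of all scores."""
    return score / third_quartile(all_scores)
```

With this normalization the third quartile itself maps to 1.0, so a model scoring above 1.0 lies above Q3 for that target.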

The analysis of GDT_TS scores (Figure 4) showed that seven targets (T0090, T0092, T0099, T0112, T0117, T0123 and T0128) obtained values significantly lower than those of the best models. For targets T0090, T0092, T0099, T0117, T0123 and T0128, the low values of GDT_TS are due to the selection of a template that was not fully adequate. Indeed, for these targets, the alignment performance remains good when compared only to groups that used the same template, as shown by the AL0 scores in Figure 3. Although the template selection process in our methodology has to be improved, it is important to note that a completely inadequate template was never chosen. The result for T0112 is due to the quality of the alignment, as can be seen in Figure 3.

The fact that eight models out of 13 are above Q3 shows that our alignment method, combined with PSI-BLAST template selection and the use of MODELLER to obtain the 3D model, is a good strategy. Even if the template selection or the alignment quality is not optimal, the global quality of the ESyPred3D modeling strategy remains good.

Fig. 3. AL0 scores for targets studied in this work (x-axis: targets T0090 to T0128; y-axis: AL0 in % of the length). Two series are reported for each target: the score of ESyPred3D models (black bullets) and the scores of models of other CASP4 participants having used the same template (blank bullets). AL0 scores are expressed as a fraction of the length of the target.

Fig. 4. GDT_TS scores for targets studied in this work (x-axis: targets T0090 to T0128; y-axis: GDT_TS in % of Q3). Three points are reported for each target: the score of the model that obtained the best score (bold line), the ESyPred3D model (box) and the third quartile value (dotted line). All values are expressed as a fraction of the third quartile value. The group IDs of best predictors with their selected templates are also reported.

CONCLUSION

A new alignment methodology for homology modeling of proteins has been developed. The program has been tested on 13 targets of CASP4 for its alignment performance and for the general quality of the models provided.

Our alignment strategy produced better results than PSI-BLAST alignments, and ESyPred3D alignments are among the most accurate compared to those of participants having used the same template. Furthermore, our ESyPred3D program provides models that are among the best of the CASP4 experiment.

Nevertheless, our alignment methodology could be improved. The benchmarks of Thompson et al. (1999) and Briffeuil et al. (1998) showed that alignment programs have different levels of performance. We plan to use this information to improve the computation of the alignment, by weighting each multiple alignment method with a numeric representation of its mean performance. Additional information such as secondary structure predictions can also be used in the box selection in order to improve the alignment quality.

The template problem remains troublesome in homology modeling, especially when the target and template sequences share a low identity rate. To improve template selection, the use of better parameters or better scoring matrices for PSI-BLAST (like the one described in Kann et al. (2000)) needs to be investigated. In the same way, PSI-BLAST can also be replaced by SAM-T99 (Karplus et al., 1998) or other programs. The intrinsic quality of the possible template structures (NMR, resolution, ...) and the selection of multiple templates will also be taken into account to improve our modeling strategy.

The model evaluation step of our homology modeling methodology has not yet been developed. Geometric and energetic evaluation of the model can be done using ANOLEA (Melo and Feytmans, 1997), PROCHECK (Laskowski et al., 1993) or Verify3D (Luthy et al., 1992). The results of these evaluations will be used to change our target–template alignment or to select a more appropriate template. The process will be iterated in order to find the template that provides the best evaluated model. A similar iteration procedure was used by the Blundell group in CASP4.

ACKNOWLEDGMENTS

We thank the organizers and assessors of the CASP4 experiment for their valuable contributions to the structure prediction field. Christophe Lambert holds a specialized grant from the 'Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture' (F.R.I.A.). We particularly want to thank Guy Baudoux, Katalin de Fays and Johan Wouters for helpful and fruitful discussions.

REFERENCES

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Briffeuil, P., Baudoux, G., Reginster, I., De Bolle, X., Vinals, C., Feytmans, E. and Depiereux, E. (1998) Comparative analysis of seven multiple protein sequence alignment servers: clues to enhance predictions reliability. Bioinformatics, 14, 357–366.

Browne, W.J., North, A.C. and Phillips, D.C. (1969) A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen's egg-white lysozyme. J. Mol. Biol., 42, 65–86.

Corpet, F. (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res., 16, 10881–10890.

Depiereux, E., Baudoux, G., Briffeuil, P., Reginster, I., De Bolle, X., Vinals, C. and Feytmans, E. (1997) Match-Box server: a multiple sequence alignment tool placing emphasis on https://www.sodocs.net/doc/2b862219.html,put. Appl. Biosci., 13, 249–256.

Depiereux, E. and Feytmans, E. (1992) Match-Box: a fundamentally new algorithm for simultaneous alignment of several protein https://www.sodocs.net/doc/2b862219.html,put. Appl. Biosci., 8, 501–509.

Eisenhaber, F., Persson, B. and Argos, P. (1995) Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit. Rev. Biochem. Mol. Biol., 30, 1–94.

Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Fiser, A., Pazos, F., Valencia, A., Sali, A. and Rost, B. (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics, 17, 1242–1243.

Fischer, D., Barret, C., Bryson, K., Elofsson, A., Godzik, A., Jones, D., Karplus, K.J., Kelley, L.A., MacCallum, R.M., Pawlowski, K., Rost, B., Rychlewski, L. and Sternberg, M. (1999) CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins, (Suppl 3), 209–217.

Gotoh, O. (1996) Significant Improvement in Accuracy of Multiple Protein Sequence Alignments by Iterative Refinement as Assessed by Reference to Structural Alignments. J. Mol. Biol., 264, 823–838.

Kann, M., Qian, B. and Goldstein, R.A. (2000) Optimization of a new score function for the detection of remote homologs. Proteins, 41, 498–503.

Karplus, K., Barrett, C. and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14, 846–856.

Karplus, K., Sjolander, K., Barrett, C., Cline, M., Haussler, D., Hughey, R., Holm, L. and Sander, C. (1997) Predicting protein structure using hidden Markov models. Proteins, (Suppl 1), 134–139.

Laskowski, R.A., Moss, D.S. and Thornton, J.M. (1993) Main-chain bond lengths and bond angles in protein structures. J. Mol. Biol., 231, 1049–1067.

Luthy, R., Bowie, J.U. and Eisenberg, D. (1992) Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.

Martin, A.C., MacArthur, M.W. and Thornton, J.M. (1997) Assessment of comparative modeling in CASP2. Proteins, (Suppl 1), 14–28.

Melo, F. and Feytmans, E. (1997) Novel Knowledge-based Mean Force Potential at Atomic Level. J. Mol. Biol., 267, 207–222.

Morgenstern, B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics, 15, 211–218.

Mosimann, S., Meleshko, R. and James, M.N. (1995) A critical assessment of comparative molecular modeling of tertiary structures of proteins. Proteins, 23, 301–317.

Neuwald, A.F., Liu, J.S. and Lawrence, C.E. (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci., 4, 1618–1632.

Rost, B. and O'Donoghue, S. (1997) Sisyphus and prediction of protein https://www.sodocs.net/doc/2b862219.html,put. Appl. Biosci., 13, 345–356.

Sali, A. and Blundell, T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.

Sali, A., Sanchez, R. and Badretdinov, A. (1997) MODELLER: A Program for Protein Structure Modeling Release 4.

Sanchez, R. and Sali, A. (1997) Advances in comparative protein-structure modeling. Curr. Opin. Struct. Biol., 7, 206–214.

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

Thompson, J.D., Plewniak, F. and Poch, O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27, 2682–2690.

Tramontano, A. (1998) Homology modeling with low sequence identity. Methods, 14, 293–300.

Venclovas, C., Zemla, A., Fidelis, K. and Moult, J. (1999) Some measures of comparative performance in the three CASPs. Proteins, (Suppl 3), 231–237.

Yang, A.S. and Honig, B. (1999) Sequence to structure alignment in comparative modeling using PrISM. Proteins, (Suppl 3), 66–72.

Zemla, A. (2000) LGA program: A Method for Finding 3D Similarities in Protein Structures. Accessed at http://PredictionCenter. https://www.sodocs.net/doc/2b862219.html,/local/lga.

用DiscoverStuido所提供的蛋白质三级结构预测之同源模拟

(一)摘要: 利用Discover Stuido所提供的蛋白質三級結構預測之同源模擬(Homology Modeling)的方法,可以很快速及有效率的獲得離子通道蛋白之三維結構,再深入探討分析其結構所提供的資料,進一步了解離子通道蛋白的功能,可用來指引生物實驗,例如site - directed mutagenesis、rational drug design 和protein - protein interaction 等等。模擬之後的結構再透過Discover Stuido所提供之LigandFit、LibDock 以及CDOCKER等計算方法,進行藥物與通道蛋白的對接,再經過Discover Stuido生物軟體計算預測藥物可能在通道上可能的結合位置以及可能相互作用的關鍵胺基酸。 (二)研究動機: 細胞膜上的離子通道是所有生命體的基本要件,很多疾病,例如一些神經系統疾病和心血管疾病就是由於細胞上的離子通道功能發生紊亂或蛋白結構的缺陷所造成。例如本實驗室研究項目之ㄧ的離子通道「NMDA受器」,在哺乳類動物中樞神經系統中扮演重要角色,從興奮性突觸的訊息傳遞、突觸的可塑性、到學習與記憶的整合此通道受器都參予其中,許多神經性疾病如急性腦中風或慢性的阿茲海默症、帕金森氏症、精神分裂症、癲癇等等都認為和NMDA受器有關,因此增加對它們的認識是幫助了解許多疾病狀態的重要基礎,目前許

多離子通道已成為製藥界開發新藥的目標,離子通道的研究還有非常大的潛力和應用空間。 由於生物資訊近幾年來發展迅速,尤其基因體計劃的進行更增加了資料庫中的數量,包括核酸、蛋白質及結構等資料庫。一般說來,蛋白質三維結構主要以實驗方法來決定,例如X-射線繞射法或NMR光譜法。事實上,技術上的困難使得蛋白質三維結構決定的速度相當地緩慢,因此發展出利用電腦並依據蛋白質的序列來預測其三級結構的方法。這些方法包含以物理化學原理的ab-initio methods 及以資料庫提供的序列和結構知識衍生的蛋白質摺疊認識fold- recognition methods(亦稱threading 穿針引線法),和同源模擬法(homology modeling)。 我們想透過Discover Stuido中的homology modeling方法來預測離子通道蛋白經胺基酸點突變之後可能的構型變化,以及預測相關藥物與通道蛋白之間可能的作用位置,並且可以有效率的篩選出可能作用的關鍵胺基酸,再透過電生理的方式來探測藥物對離子通道的行為模式,有助於進一步且有效率的了解離子通道可能的分子機制。(三)研究方法與步驟: 同源模擬法五個主要的步驟: (1)資料庫搜尋及選擇模版(templates)

蛋白质结构与功能的关系

蛋白质结构与功能的关系 蛋白质的结构包括一级结构、二级结构、三级结构、四级结构。 一级结构是蛋白质的一级结构指在蛋白质分子从N-端至C-端的氨基酸排列顺序。一级结构是蛋白质空间构象和特异生物学功能的基础,但不是决定蛋白质空间构象的唯一因素。 蛋白质的二级结构是指多肽链的主链骨架本身在空间上有规律的折叠和盘绕,它是由氨基酸残基非侧链基团之间的氢键决定的。常见的二级结构有α螺旋、三股螺旋、β折叠、β转角、β凸起和无规卷曲。α螺旋中肽链骨架围绕一个轴以螺旋的方式伸展,它可能是极性的、疏水的或两亲的。β折叠是肽链的一种相当伸展的结构,有平行和反平行两种。如果β股交替出现极性残基和非极性残基,那么就可以形成两亲的β折叠。β转角指伸展的肽链形成180°的U形回折结构而改变了肽链的方向。β凸起是由于β折叠股中额外插入一个氨基酸残基而形成的,它也能改变多肽链的走向。无规卷曲是在蛋白质分子中的一些极不规则的二级结构的总称。无规卷曲无固定走向,有时以环的形式存在,但不是任意变动的。从结构的稳定性上看,右手α螺旋>β折叠> U型回折>无规卷曲,但在功能上,酶与蛋白质的活性中心通常由无规卷曲充当,α右手螺旋和β折叠一般只起支持作用。 蛋白质的三级结构是指多肽链在二级结构的基础上,进一步盘绕、卷曲和折叠,形成主要通过氨基酸侧链以次级键以及二硫键维系的完整的三维结构。三级结构通常由模体和结构域组成。稳定三级结构的化学键包括氢键、疏水键、离子键、范德华力、金属配位键和二硫键。模体可用在一级结构上,特指具有特殊生化功能的序列模体,也可被用于功能模体或结构模体,相当于超二级结构。结构模体是结构域的组分,基本形式有αα、βαβ和βββ等。常见的模体包括:左手超螺旋、右手超螺旋、卷曲螺旋、螺旋束、α螺旋-环-α螺旋、Rossmann卷曲和希腊钥匙模体。结构域是在一个蛋白质分子内的相对独立的球状结构和/或功能模块,由若干个结构模体组成的相对独立的球形结构单位,它们通常是独自折叠形成的,与蛋白质的功能直接相关。一个结构域通常由一段连续的氨基酸序列组成。根据其占优势的二级结构元件的类型,结构域可分为五大类:α结构域、β结构域、α/β结构域、α+β 结构域、交联结构域。以上每一类结构域的二级结构元件可能有不同的组织方式,每一种组织就是一种结构模体。这些结构域都有疏水的核心,疏水核心是结构域稳定所必需的。 具有两条和两条以上多肽链的寡聚蛋白质或多聚蛋白质才会有四级结构。组成寡聚蛋白质或多聚蛋白质的每一个亚基都有自己的三级结构。蛋白质的四级结构内容包括亚基的种类、数目、空间排布以及亚基之间的相互作用。驱动四级结构形成或稳定四级结构的作用力包括

ESyPred3D蛋白质三级结构预测

BIOINFORMATICS Vol.18no.92002Pages 1250–1256 ESyPred3D:Prediction of proteins 3D structures Christophe Lambert ?,Nadia L′eonard,Xavier De Bolle and Eric Depiereux Facult′es Universitaires Notre-Dame de la Paix,Unit′e de Recherche en Biologie Mol′eculaire,Rue de Bruxelles,61,B-5000Namur,Belgium Received on February 1,2002;revised on March 7,2002;accepted on March 18,2002 ABSTRACT Motivation:Homology or comparative modeling is cur-rently the most accurate method to predict the three-dimensional structure of proteins.It generally consists in four steps:(1)databanks searching to identify the struc-tural homolog,(2)target–template alignment,(3)model building and optimization,and (4)model evaluation.The target–template alignment step is generally accepted as the most critical step in homology modeling. Results:We present here ESyPred3D,a new automated homology modeling program.The method gets bene?t of the increased alignment performances of a new alignment strategy.Alignments are obtained by combining,weighting and screening the results of several multiple alignment programs.The ?nal three-dimensional structure is build using the modeling package MODELLER.ESyPred3D was tested on 13targets in the CASP4experiment (C ritical A ssessment of Techniques for Proteins S tructural P rediction).Our alignment strategy obtains better results compared to PSI-BLAST alignments and ESyPred3D alignments are among the most accurate compared to those of participants having used the same template. Availability:ESyPred3D is available through its web site at http://www.fundp.ac.be/urbm/bioinfo/esypred/. 
Contact:https://www.sodocs.net/doc/2b862219.html,mbert@fundp.ac.be;http://www.fundp.ac.be/~lambertc INTRODUCTION Three-dimensional (3D)protein structure is an important source of information to better understand the function of a protein,its interactions with other compounds (ligands,proteins,DNA,...)and to understand phenotypical effects of mutations (Tramontano,1998).The 3D protein struc-ture can be predicted according to three main categories of methods (Rost and O’Donoghue,1997):(1)homology or comparative modeling (described below);(2)fold recogni-tion (predicting the global fold of a protein);(3)ab initio techniques (trying to model the 3D structure of proteins using only the sequence and a force ?eld). ?To whom correspondence should be addressed. Homology modeling is historically the ?rst (Browne et al.,1969)and the most accurate method (Sanchez and Sali,1997).It was shown during the last CASP experiment (Venclovas et al.,1999)(C ritical A ssessment of Techniques for Proteins S tructural P rediction)that the critical steps are:(1)template selection,(2)target–template alignment step,(3)modeling of regions not present or signi?cantly different from those in template and (4)modeling of side chains.Among these critical steps,it is commonly accepted that the target–template alignment step is the most critical (Mosimann et al.,1995;Martin et al.,1997). It is known that above 50%of identity rate between target and template,pairwise alignments provide accurate models.Between 30%and 50%of identity,multiple align-ments between target,template and similar proteins must be used and the pairwise alignments between target and template must be extracted from this multiple alignment.Below 30%of identity rate,only heuristic combinations of multiple alignments,experimental data and know-how of an expert are able to generate an accurate model. 
A large number of techniques have been developed to predict 3D structures of proteins by homology modeling. For the target–template alignment step, most of them use PSI-BLAST (Altschul et al., 1997), PileUp (Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, Wisc.), ClustalW (Thompson et al., 1994), 3D-PSSM (Fischer et al., 1999), SAMT99 (Karplus et al., 1997), or also the alignment producing the best model out of a collection computed from various alignment programs (Yang and Honig, 1999).

Our laboratory developed the MATCHBOX multiple sequence alignment software in the early 1990s (Depiereux and Feytmans, 1992) and it has proved to be one of the most accurate in terms of specificity (Depiereux et al., 1997). Much effort has been consented into improving alignment accuracy by adding information such as secondary structure predictions, solvent accessibility predictions, specific scoring matrices and combination with ClustalW. In all cases, it was only possible to slightly improve multiple alignment accuracy (unpublished re-

1250 © Oxford University Press 2002

Primary structure of proteins (covalent structure)

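Primary structure of proteins (covalent structure)

The primary structure of a protein is also called its covalent structure or backbone structure.

Levels of protein structure

Primary structure (amino acid sequence, covalent structure, backbone structure): the order of the amino acid residues in the protein molecule
↓
Secondary structure
↓
Supersecondary structure
↓
Structural domain
↓
Tertiary structure (globular structure)
↓
Quaternary structure (multi-subunit assembly)

The levels from secondary structure onward together make up the conformation (higher-order structure).

Key points of primary structure

General steps of protein sequencing (see p. 116 for details)

(1) Determine the number of polypeptide chains in the protein molecule.
(2) Separate the polypeptide chains of the protein molecule.
(3) Determine the amino acid composition of each polypeptide chain.
(4) Cleave the intrachain disulfide bonds.
(5) Analyze the N-terminus and C-terminus of each polypeptide chain.
(6) Partially cleave the polypeptide chain into peptide fragments.
(7) Determine the amino acid sequence of each fragment.
(8) Determine the order of the fragments in the polypeptide chain.
(9) Determine the positions of the disulfide bonds in the polypeptide chain.

Basic strategy of protein sequencing

For a pure protein, the ideal approach would be to sequence directly from the N-terminus to the C-terminus, but at present only about 60 residues from the N-terminus can be determined this way.

Direct method (sequencing the protein itself)

Two or more specific cleavage methods are used:

N ... C
Cleavage by method A: A1 A2 A3 A4
Cleavage by method B: B1 B2 B3 B4

Two different cleavage methods produce two sets of fragments with different cut sites. Each fragment is separated and purified, the amino acid sequences of both sets of fragments are determined, and the fragments are then assembled through their overlaps into one complete polypeptide chain.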
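The direct method above amounts to an overlap-assembly problem: fragments from method B span the cut sites of method A and so fix the order of the A fragments. The following is a minimal greedy sketch of that idea. The peptide sequences and the function names are invented for illustration, and real data with repeated or very short overlaps would need a more careful algorithm.

```python
def overlap(a: str, b: str, min_len: int = 2) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len: int = 2) -> str:
    """Greedily merge the fragment pair with the largest overlap until one chain remains."""
    unique = set(fragments)
    # Fragments fully contained in another fragment add no ordering information.
    frags = sorted(f for f in unique
                   if not any(f != g and f in g for g in unique))
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            raise ValueError("fragments cannot be joined by overlaps")
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags[0]


if __name__ == "__main__":
    # Hypothetical fragments of one chain cut by two different methods.
    method_a = ["MKTAY", "IAKQRQ", "ISFVK", "SHFSRQ"]
    method_b = ["MKT", "AYIAK", "QRQISF", "VKSHFSRQ"]
    print(assemble(method_a + method_b))
```

Neither fragment set alone determines the order; only the combination does, which is why two cleavage methods with different cut sites are required.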

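Indirect method (inferring the amino acid sequence from the nucleic acid sequence)

Nucleic acid sequencing can read 600-800 bp in a single run.

Preparations before sequencing

Checking protein purity

The sample must be at least 97% pure and homogeneous. Purity should be checked by at least two methods to be reliable:
(1) Polyacrylamide gel electrophoresis (PAGE): a single band is required.
(2) The DNS-Cl (dansyl chloride) method for determining the N-terminal amino acid.

Determining the molecular weight

The molecular weight is used to estimate the number of amino acid residues: n = Mw/110.
Methods: gel filtration, sedimentation coefficient.

Determining the types and number of subunits

The subunits of a multi-subunit protein are held together in one of two ways:
(1) Non-covalent association: dissociate with 8 mol/L urea or SDS, then measure the molecular weights by SDS-PAGE.
(2) Disulfide bonds:
Performic acid oxidation: -S-S- + HCOOOH → -SO3H
Reduction with β-mercaptoethanol:

Example: hemoglobin (α2β2). (Note that the α and β chains of human hemoglobin have the same N-terminal residue.)
Molecular weight: M
After dissociating the subunits: two bands, M1 and M2
After cleaving disulfide bonds: two bands, M1 and M2
Relation between the molecular weights: M = 2M1 + 2M2

Determining the amino acid composition

Acid hydrolysis is the main method, supplemented by alkaline hydrolysis; the analysis is carried out automatically on an amino acid analyzer. Knowing how often each amino acid occurs in the chain helps in choosing cleavage methods and reagents.
① Trp: p-dimethylaminobenzaldehyde, 590 nm.
② Cys: 5,5'-dithiobis(2-nitrobenzoic acid) (DTNB), 412 nm.

End-group analysis

① N-terminal analysis
DNS-Cl method: the most commonly used; yellow fluorescence, extremely sensitive; the DNS-amino acid released by hydrolysis of the DNS-peptide does not need to be extracted.
DNFB method: Sanger's reagent; DNP-peptide, acid hydrolysis, yellow DNP-amino acid, extraction with an organic solvent (ethyl acetate), then paper chromatography, thin-layer chromatography, liquid chromatography, etc.
PITC method: the Edman method, which removes residues one at a time; colorless PTH-amino acid, extraction with organic solvent, chromatography.
② C-terminal analysis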
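The two mass relations in this outline are easy to check numerically. The sketch below assumes that the "110" in the text is the approximate average mass of one amino acid residue in daltons; the subunit masses are rounded, illustrative values rather than measurements, and the function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS = 110.0  # Da; assumed average mass per residue


def estimate_residue_count(molecular_weight: float) -> int:
    """Rough residue count from the molecular weight: n = Mw / 110."""
    return round(molecular_weight / AVERAGE_RESIDUE_MASS)


def alpha2_beta2_mass(m1: float, m2: float) -> float:
    """Mass of an alpha2-beta2 tetramer: M = 2*M1 + 2*M2."""
    return 2 * m1 + 2 * m2


if __name__ == "__main__":
    # Illustrative subunit masses in Da, rounded for the example.
    m_alpha, m_beta = 15100, 15900
    total = alpha2_beta2_mass(m_alpha, m_beta)
    print("tetramer mass:", total)
    print("estimated residues:", estimate_residue_count(total))
```

The estimate is coarse because the true average residue mass depends on the amino acid composition.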

Online software for protein structure prediction

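Recommended online tools for protein prediction and analysis

A collection of URLs for protein prediction and analysis

Physical property prediction:
Compute PI/MW http://expaxy.hcuge.ch/ch2d/pi-tool.html
PeptideMass http://expaxy.hcuge.ch/sprot/peptide-mass.html
TGREASE ftp://https://www.sodocs.net/doc/2b862219.html,/pub/fasta/
SAPS http://ulrec3.unil.ch/software/SAPS_form.html

Composition-based protein identification:
AACompIdent http://expaxy.hcuge.ch ... html
AACompSim http://expaxy.hcuge.ch/ch2d/aacsim.html
PROPSEARCH http://www.embl-heidelberg.de/prs.html

Secondary structure and fold class prediction:
nnpredict https://www.sodocs.net/doc/2b862219.html,/~nomi/nnpredict
PredictProtein http://www.embl-heidel ... protein/
SOPMA http://www.ibcp.fr/predict.html
SSPRED http://www.embl-heidel ... prd_info.html

Prediction of special structures or motifs:
COILS http://ulrec3.unil.ch/ ... ILS_form.html
MacStripe https://www.sodocs.net/doc/2b862219.html,/ ... acstripe.html

As with nucleic acid sequences, retrieving protein sequences is usually the first step of an analysis. Thanks to advances in databases and network technology, retrieval is very convenient: one can either download a protein sequence database and search it locally or search over the Internet.

Retrieving protein sequences from NCBI
Connect to "http://www.ncbi.nlm.ni ... gi?db=protein" to search.

Retrieving protein sequences from EMBL with the SRS system
Connect to "https://www.sodocs.net/doc/2b862219.html,/" to search protein sequences with EMBL's SRS system.

Retrieving sequences by e-mail
When the network is slow, or when a larger number of protein sequences is not needed urgently, sequences can be retrieved by e-mail.

Analysis of basic protein properties
Analysis of basic properties is a fundamental part of protein sequence analysis. It generally covers amino acid composition, molecular mass, isoelectric point, hydrophilicity and hydrophobicity, signal peptides, transmembrane regions, and structural and functional domains. Many functional features of a protein can be obtained directly from its sequence. For example, a hydropathy profile can be used to predict transmembrane helices. There are also many short segments that the cell uses as targeting signals to direct a protein to a particular organelle (the most typical of which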

Protein structure prediction and sequence analysis software

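Protein structure prediction and sequence analysis software

2010-05-08 20:40. Reposted from 布丁布果; last edited by 布丁布果. April 18.

Protein databases and protein sequence analysis

Section 1. Introduction to protein databases

I. Primary protein databases

1. SWISS-PROT
SWISS-PROT and PIR are the two main international protein sequence databases, and both now have mirror sites at EMBL and GenBank. SWISS-PROT contains protein sequences translated from EMBL that have been checked and annotated. It is maintained jointly by the Department of Medical Biochemistry of the University of Geneva and the European Bioinformatics Institute (EBI). The number of sequences in SWISS-PROT is growing linearly.

2. TrEMBL
SWISS-PROT data lag behind: accurately translating EMBL DNA sequences into protein sequences and annotating them takes time, so a large number of DNA sequences containing open reading frames (ORFs) have not yet entered SWISS-PROT. TrEMBL (Translated EMBL) was created to solve this problem. TrEMBL is also a protein database; it contains the sequences of all protein-coding regions in EMBL and is therefore a very comprehensive source of protein sequences, at the unavoidable cost of lower annotation quality.

3. PIR
The PIR data were originally protein sequences collected by the National Biomedical Research Foundation (NBRF) in the United States, mainly translated from GenBank DNA sequences. Since 1988, the NBRF, JIPID (the Japanese International Protein Sequence Database) and MIPS (Munich Information Centre for Protein Sequences, Germany) have cooperated to collect and maintain PIR. PIR entries are divided into four levels according to the degree (quality) of annotation.

4. ExPASy
The Swiss Institute of Bioinformatics (SIB) has created the Expert Protein Analysis System (ExPASy), which covers all of the databases above. URL: https://www.sodocs.net/doc/2b862219.html, The Center for Bioinformatics at Peking University (https://www.sodocs.net/doc/2b862219.html,) hosts an ExPASy mirror.

URLs of the main protein sequence databases
SWISS-PROT https://www.sodocs.net/doc/2b862219.html,/sprot or https://www.sodocs.net/doc/2b862219.html,/expasy_urls.html
TrEMBL https://www.sodocs.net/doc/2b862219.html,/sprot
PIR https://www.sodocs.net/doc/2b862219.html,/pirwww
MIPS (Munich Information Centre for Protein Sequences) http://mips.gsf.de/
JIPID (the Japanese International Protein Sequence Database): now merged with PIR
ExPASy https://www.sodocs.net/doc/2b862219.html,

II. Protein structure databases

1. PDB: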

Bioinformatics, Experiment 6: Prediction of higher-order protein structure

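Experiment 6: Summary of the bioinformatics lab

1 Predicting protein 3D structure with SWISS-MODEL and viewing molecular structures with PyMOL

1.1 Protein 3D structure prediction: SWISS-MODEL

1.1.1 Fully automated modeling
With MAP kinase Pmk1 [Schizosaccharomyces pombe] (NP_595289.1) as the target sequence, first search for template sequences, select four templates for homology modeling, and evaluate the resulting models.

1.1.2 Alignment mode modeling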

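Model evaluation:

GMQE (Global Model Quality Estimation) is a quality estimate that combines properties of the target–template alignment and of the template search method. The resulting GMQE score is a number between 0 and 1 that reflects the expected accuracy of a model built with that alignment and template; higher numbers indicate higher reliability. Once a model has been built, the GMQE (1) is updated for that specific case by also taking the QMEAN score of the model into account, which makes the quality estimate more reliable.

QMEAN

QMEAN (Benkert et al.) is a composite scoring function based on different geometrical properties. It provides both global (for the entire structure) and local (per residue) absolute quality estimates on the basis of a single model.

The Z-score (2) estimates the "degree of nativeness" of the structural features observed in the model and indicates whether the model is of a quality comparable to experimental structures. Higher QMEAN Z-scores indicate better agreement between the model and experimental structures of similar size. A score of -4.0 or below indicates a low-quality model, which is also highlighted by a "thumbs down" symbol next to the score.

QMEAN consists of four individual terms, which are listed above together with the global QMEAN quality score. The white area of the bar plot (values close to zero) indicates that a property is similar to what is observed in experimental structures. Positive values indicate that the model scores higher than experimental structures on average, and negative values that it scores lower. The QMEAN Z-score itself is shown at the top. The individual Z-scores compare the interaction potential between Cβ atoms, the interaction potential between all atoms, the solvation potential and the torsion angle potential.

The "Local Quality" plot shows, for each residue of the model (x-axis), the expected similarity to the native structure (y-axis). Residues scoring below 0.6 are usually considered to be of low quality. Different model chains are drawn in different colors. If the model is downloaded, the local scores are stored in the B-factor column of the PDB file. Choosing the "QMEAN" color scheme displays local quality directly on the structure.

In the comparison plot, the quality score of the model is expressed as a Z-score relative to the scores obtained for high-resolution crystal structures. The x-axis shows protein length (number of residues) and the y-axis shows the normalized QMEAN score. Each dot represents one protein structure. The darkest dots are structures whose global QMEAN Z-score (2 and 3 in the figure above) lies between -1 and 1, structures with a Z-score between 1 and 2 in absolute value are grey, and those beyond 2 are light grey. The red star represents the model.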
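Because the local scores are written to the B-factor column of the downloaded PDB file, low-quality residues can be listed with a few lines of code. The sketch below relies on the fixed column layout of PDB ATOM records (B-factor in columns 61-66) and assumes the stored score is on the same 0 to 1 scale as the 0.6 threshold mentioned above. The helper names and the three-residue sample are hypothetical.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z, score):
    """Build one fixed-width PDB ATOM record (used only to make sample data)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{score:6.2f}")


def low_quality_residues(pdb_text, threshold=0.6):
    """Return (chain, residue number, residue name, score) for residues below threshold."""
    flagged, seen = [], set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                # column 22: chain identifier
        resseq = int(line[22:26])       # columns 23-26: residue number
        score = float(line[60:66])      # columns 61-66: B-factor field
        if (chain, resseq) in seen:     # report each residue once
            continue
        seen.add((chain, resseq))
        if score < threshold:
            flagged.append((chain, resseq, line[17:20].strip(), score))
    return flagged


if __name__ == "__main__":
    # Hypothetical three-residue model with per-residue scores in the B-factor field.
    sample = "\n".join([
        atom_line(1, "CA", "MET", "A", 1, 11.0, 12.0, 13.0, 0.85),
        atom_line(2, "CA", "LYS", "A", 2, 14.0, 12.5, 13.5, 0.42),
        atom_line(3, "CA", "THR", "A", 3, 17.0, 13.0, 14.0, 0.60),
    ])
    print(low_quality_residues(sample))
```

For a real model the file contents would be read from disk and passed to `low_quality_residues` unchanged.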


Relationship between the primary structure and higher-order structure of proteins

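Relationship between the primary structure and higher-order structure of proteins

A protein molecule is a covalent polypeptide chain formed by amino acids joined head to tail. Each native protein has its own characteristic spatial structure, called its conformation.

Protein structure is organized at several levels. Primary structure is the amino acid sequence of the polypeptide chain. Secondary structure refers to the characteristic α-helix and β-sheet segments that the chain forms through hydrogen bonds. Tertiary structure refers to the compact globular conformation, with a specific chain path, into which the polypeptide bends and folds through various non-covalent bonds. A globular conformation gives the lowest ratio of surface area to volume and so minimizes the interaction between the protein and its surroundings. Quaternary structure refers to the spatial relationships and modes of association among the subunits of an oligomeric protein. Secondary, tertiary and quaternary structure together are the higher-order structure of a protein. The native fold of a protein is determined by three factors: (1) interactions with solvent molecules (usually water); (2) the pH and ionic composition of the solvent; (3) the amino acid sequence of the protein. The last is the most important.

(I) The thermodynamic hypothesis of protein folding

The theory that the higher-order structure of a protein is determined by its primary structure was first proposed by Christian B. Anfinsen in 1954. Anfinsen had worked on protein structure before 1950 and continued this work after joining the National Institutes of Health (NIH). Anfinsen and two postdoctoral fellows, Michael Sela and Fred White, found that a high concentration of the thiol reagent β-mercaptoethanol reduces disulfide bonds to free thiol groups. If urea is then added to disrupt the non-covalent bonds inside the reduced ribonuclease as well, the enzyme unfolds into a random coil with no activity. Analysis of the physical properties of reduced ribonuclease showed clearly that it does adopt a random-coil form.

Having obtained an unfolded ribonuclease, Anfinsen set out to study how it refolds. The 8 reduced Cys residues of ribonuclease can re-form 4 disulfide bonds in 105 different combinations, only one of which is correct. If the information that determines the conformation resides in the amino acid sequence, refolding should always end in the correct form; otherwise refolding would be random and only a small amount of the correct form would be obtained. The refolding experiments went fairly smoothly. Anfinsen removed the urea and mercaptoethanol that caused unfolding by dialysis, transferred the inactive enzyme into its physiological buffer, and left it at room temperature in the presence of oxygen so that the thiol groups could be reoxidized to disulfide bonds. After some time the ribonuclease activity was recovered, which means that the original conformation had been restored. Because no other cellular component took part and the process was entirely spontaneous, there is good reason to believe that all the information needed for this protein to fold correctly is contained in its primary structure. On this basis Anfinsen proposed the thermodynamic hypothesis of protein folding. According to this hypothesis, the native three-dimensional conformation of a protein corresponds to its thermodynamically most stable state under physiological conditions. Thermodynamic stability is determined by the interactions among the constituent amino acid residues, so the three-dimensional conformation of a protein is determined directly by its primary structure.

(II) Influence of higher-order structure on the formation of higher-order structure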
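The figure of 105 disulfide combinations quoted in part (I) can be checked directly: the first of 8 cysteines can pair with any of 7 others, the next unpaired one with any of 5, and so on, giving 7 × 5 × 3 × 1 = 105. The sketch below computes this product and also enumerates the pairings explicitly as a cross-check. The cysteines are labeled generically and the function names are hypothetical.

```python
def count_pairings(n_cys: int) -> int:
    """Number of ways to pair n_cys cysteines into disulfide bonds: (n-1)(n-3)...1."""
    if n_cys % 2:
        raise ValueError("an even number of cysteines is required")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k
    return count


def enumerate_pairings(items):
    """Yield every way of splitting items into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    cysteines = [f"C{i}" for i in range(1, 9)]  # generic labels for 8 Cys residues
    print(count_pairings(len(cysteines)))
    print(sum(1 for _ in enumerate_pairings(cysteines)))
```

If refolding were random, about 1 in 105 molecules would carry the native set of bonds, which is the "small amount of the correct form" mentioned in the text.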

Protein 3D structure prediction and analysis of results

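Protein 3D structure prediction and analysis of results

Name ________ Student ID ______________ Group _____ Date ________ (year) ___ (month) ___ (day)

1. Basic concepts of structure prediction
1) With reference to the article by Wang Jilong (2007), briefly describe the basic principles and steps of homology modeling of protein three-dimensional structure.
2) With reference to the literature (Kelley et al., 2015), explain the principles, methods and result analysis of protein structure prediction with Phyre2.

2. Worked example: structure prediction and analysis of the carcinoembryonic antigen CEAM5_HUMAN
1) Search online resources and briefly describe the research and application background of carcinoembryonic antigen (CEA). Examine the annotation and related links for the human carcinoembryonic antigen CEAM5_HUMAN in UniProt/Swiss-Prot, and describe the sequence features, function and tissue-specific expression of this protein.
2) Download from PDB the crystal structure 2QSQ of the N-terminal domain dimer of CEAM5_HUMAN, select chain A and save it as 2QSQ_A. Download from PDB the immunoglobulin variable domain structure 1REI, select chain A and save it as 1REI_A. Compare the folds and secondary structures of 1REI_A and 2QSQ_A, and the similarities and differences of their hydrophobic cores.
3) Using the sequence alignment in Figure 1 of (Bates et al., 1992), find the 3 residues in 2QSQ that correspond to the 2 cysteines and 1 tryptophan in 1REI_A, superimpose the structures on that basis and calculate the root-mean-square deviation.
4) Examine the three loops on the surface of 2QSQ_A in the antibody-binding region, and explain how information from structural modeling can guide the next experiments.

3. Worked example: structure prediction and analysis of the carcinoembryonic antigen CEA21_HUMAN
1) Search UniProt/Swiss-Prot for the entries of the different CEA subtypes. Using the annotations and the dedicated CEA website (http://www.carcinoembryonic-antigen.de/), compare how domains are distributed in the different subtypes.
2) Browse the annotation of CEA21_HUMAN in UniProt and identify its constant domain.
3) Use the Phyre2 protein structure prediction server to predict the three-dimensional structure of the constant domain of CEA21_HUMAN.
4) Use the Investigator tool to analyze the prediction in depth, compare the conservation and mutation sensitivity of different sites, and compare the results with those from the PredictProtein server.

4. Structure prediction for a protein related to your own project
1) Take a protein related to your own research project, or its homolog in another species, and analyze it with Swiss-PdbViewer.
2) Select a protein related to your own research project, or one of its domains, predict its structure with Phyre2, and use the Investigator tool to analyze the prediction in depth, comparing the conservation and mutation sensitivity of different sites.

References
1. Wang Jilong, "Homology modeling of the three-dimensional structure of carcinoembryonic antigen CEA", 2007, (hyttp://https://www.sodocs.net/doc/2b862219.html,/reference/wang-jilong-cea.pdf).
2. Bates PA, Luo J, Sternberg MJ. A predicted three-dimensional structure for the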

A survey of protein structure prediction methods

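A survey of protein structure prediction methods

Bu Dongbo, Chen Xiang, Wang Zhiyong

"What Computers Can't Do" is a fine book, and the preface to its Chinese edition is an excellent piece in its own right. In that essay of a dozen or so pages, Professor Ma Xiwen sums up three steps for solving a practical problem with a computer: first formalize, abstracting the domain problem into a mathematical one; then analyze whether the problem is computable; and finally design an algorithm, analyze its time and space complexity, and look for the best algorithm.

Predicting the spatial structure of proteins is a problem of real biological significance, and a great deal of work has been done on it. Interestingly, some of the representative work illustrates these three steps very well. This article follows that line in summarizing the field.

1 Background

Cells contain many proteins (long chains built from some 20 kinds of amino acids), and these macromolecules are essential for carrying out biological functions. The spatial structure of a protein often determines its function, so revealing protein structure is very important work.

Biologists usually divide protein structure into 4 levels: primary structure, the amino acid sequence of the protein; secondary structure, the local structures formed by interactions between backbone atoms, such as alpha helices, beta sheets and loop regions; tertiary structure, the spatial structure formed by the packing of secondary structure elements over a larger range; and quaternary structure, which mainly describes the interactions between different subunits.

After many years of effort, experimental methods of structure determination are well developed; the two most commonly used are nuclear magnetic resonance and X-ray crystallography. However, experimental determination is time-consuming and expensive, and it is not applicable to some proteins that are hard to crystallize. Determining the amino acid sequence of a protein is comparatively easy. Being able to infer the spatial structure from the primary sequence would therefore be very valuable. This is the protein folding problem:

Protein Folding Problem
Input: the amino acid sequence of the protein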

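Output: the spatial structure of the protein

The feasibility of protein structure prediction rests on solid evidence, because in general the spatial structure of a protein is determined by its primary structure. Biochemical experiments show that if a protein is unfolded in vitro with no other substance present and then allowed to renature, it immediately refolds into its original spatial structure, and the whole process takes less than 1 second. There is therefore reason to believe that, for most proteins, the information about spatial structure is fully contained in the amino acid sequence. From the point of view of physics, the stable state of a system is usually the state of minimum energy, and this too is a theoretical foundation of protein structure prediction.

2 Methods of protein structure prediction

Methods of protein structure prediction fall into three classes:

Homology methods: these rest on the observation that if two proteins have similar sequences, their structures are very likely to be similar as well. Some work indicates that this approach can give a rough prediction when the sequence similarity is above 75%. The advantage of these methods is high accuracy; the drawback is that they only work when the sequence is highly similar to a protein in the template library.

Ab initio methods: these rest on thermodynamics, that is, on finding the state of minimum energy of the protein. Biologists and physicists regard this as, in principle, the essential factor governing protein structure. Because of the enormous amount of computation, however, the approach is not practical, and at present it can only compute structures formed by a few amino acids. The Blue Gene supercomputer developed by IBM is intended to address this problem.

Threading methods: since ab initio methods are currently of theoretical interest only, and homology methods require the target protein to be highly similar in sequence to some protein in the known template library, a new approach is needed for most other proteins. Threading arose to meet this need.

Of these three classes, ab initio methods do not depend on known structures, while the other two need known structures to help. Protein sequences and their true tertiary structures are usually organized into a template library, and the protein sequence whose tertiary structure is to be predicted is called the query sequence.

3 Threading methods for protein structure prediction

Three pieces of work are representative of threading: Eisenberg's work based on environment strings, Xu Ying's Prospetor, and RAPTOR by Xu Jinbo and Li Ming.

Threading works as follows. First take one template and align it with the query sequence, and assign the spatial coordinates of each template residue matched to the query to the corresponding query residue. The alignment is guided by an energy function that we design. From the alignment and the resulting coordinates of the query sequence, the same energy function gives an energy value. This operation is applied to every template, and the query coordinates produced by the template with the lowest energy are taken as the prediction.

Note that this energy function is no longer an energy function in the thermodynamic sense. It is essentially the negative logarithm of a probability, E = -log p: a statistical energy is used in place of the true molecular energy, and the two have roughly the same form.

Looking at this work from Professor Ma Xiwen's point of view makes it more interesting still. Eisenberg pointed out that if one only formalizes protein structure through the spatial coordinates (x, y, z) of each atom, it is hard to take the study any further. He creatively represented the structure as an environment string, which turns structure prediction into an alignment between a sequence string and an environment string. Xu Ying then took this further and represented a protein as a sequence of cores, with interactions between cores, so that a structure is described by the spatial coordinates of its cores together with the interactions between them. On top of this representation Xu Ying developed a dynamic programming algorithm for finding the optimal match and obtained very good results, although the high complexity forced some simplifications in Prospetor2. Xu Jinbo and Li Ming solved this problem elegantly by formulating the search for the optimal match as an integer programming problem, and proved that some commonly used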

Relationship between the primary structure and higher-order structure of proteins

(II) Influence of higher-order structure on the formation of higher-order structure

1. Secondary structure
The secondary structure of a protein is maintained by hydrogen bonds and includes α-helices, β-sheets, β-turns and random coils. The α-helix is a repetitive structure in which the Φ and Ψ angles of each α-carbon are close to -57° and -47°, respectively.
