

Help

The ArchPRED server predicts the structure of loop regions in proteins using a novel fragment-search method. Given a query loop of unknown structure, ArchPRED identifies which loops of known structure are likely to share conformational similarity with the query loop.

The prediction algorithm includes several steps, namely SELECTION, FILTERING and RANKING.



STRUCTURE UPLOAD
  • Protein Structure file

    You need to upload a protein structure file that lacks the loop to be predicted. If the structure contains more than one chain, only the chain that includes the missing loop will be considered by the prediction algorithm.

    The structure file MUST be in Protein Data Bank (PDB) format (http://www.rcsb.org).

PREDICTION PARAMETERS
 
Loops are selected from the Search Space using:
  • End Points distance

    Template loops whose end-point distance D is within 1 Å of that of the query loop are selected.
  • Geometry

    The geometry defines the structural arrangement of the secondary structures that span a loop.

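    Candidate loops in the same geometrical bin as the query loop are selected.

    Two loops belong to the same geometrical bin if the differences between their geometry values fall within the interval given under Selection in the ALGORITHM section below.

    If both the N-terminal and C-terminal secondary structures are beta strands, users will be prompted to choose between beta hairpins and beta links.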
     
  • Loop start position

    Number of the first residue of the missing loop, according to the PDB file numbering.

    IMPORTANT! Although the loop is missing from the PDB file, users must ensure that the residue numbering leaves a gap matching the missing region. For instance, if the missing loop is 6 residues long and the last N-terminal stem residue is number 100 in the PDB file, the first C-terminal stem residue must be number 107. For example:


  • The loop to predict starts at residue 17 (the last known residue is 16) and the sequence of the missing loop is GHLEDDVVVVVSSD (length 14); chain A.

    The PDB file you upload should look like:
    				...
    				ATOM    826  N   PHE A  14      49.206  19.282  50.320  1.00 54.69           N  
    				ATOM    827  CA  PHE A  14      48.606  19.107  51.603  1.00 55.79           C  
    				ATOM    828  C   PHE A  14      47.836  17.809  51.723  1.00 56.22           C  
    				ATOM    829  O   PHE A  14      48.078  17.051  52.638  1.00 55.30           O  
    				ATOM    830  CB  PHE A  14      47.690  20.257  51.882  1.00 56.82           C 
    				ATOM    837  N   LEU A  15      46.893  17.551  50.827  1.00 57.29           N  
    				ATOM    838  CA  LEU A  15      46.182  16.280  50.875  1.00 59.26           C  
    				ATOM    839  C   LEU A  15      47.047  15.013  50.740  1.00 59.98           C  
    				ATOM    840  O   LEU A  15      46.727  13.971  51.280  1.00 60.70           O  
    				ATOM    845  N   GLU A  16      48.121  15.088  49.997  1.00 60.70           N 
    				ATOM    846  CA  GLU A  16      48.980  13.935  49.837  1.00 61.99           C  
    				ATOM    847  C   GLU A  16      49.482  13.456  51.186  1.00 62.42           C  
    				ATOM    848  O   GLU A  16      49.617  12.256  51.424  1.00 62.45           O  
    				ATOM    946  N   VAL A  31      25.144  19.607  51.305  1.00 51.18           N  
    				ATOM    947  CA  VAL A  31      25.010  20.708  52.246  1.00 52.08           C  
    				ATOM    948  C   VAL A  31      23.603  20.835  52.761  1.00 52.94           C  
    				ATOM    949  O   VAL A  31      22.705  21.156  52.037  1.00 54.54           O  
    				ATOM    953  N   THR A  32      23.403  20.559  54.023  1.00 54.13           N  
    				ATOM    954  CA  THR A  32      22.075  20.604  54.610  1.00 55.49           C  
    				ATOM    955  C   THR A  32      21.767  21.988  55.149  1.00 56.13           C  
    				ATOM    956  O   THR A  32      20.606  22.329  55.356  1.00 56.55           O  
    				...
    				

    If the numbering is not consistent (i.e. residue 16 is followed by residues numbered 30 or lower), coordinates will be deleted when the PDB file is parsed.
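The numbering requirement above can be checked before uploading. The following is a minimal sketch, not part of ArchPRED: the function names `residue_numbers` and `check_loop_gap` are hypothetical, and the parser only relies on the fixed-column layout of PDB ATOM records (chain ID in column 22, residue number in columns 23-26).

```python
def residue_numbers(pdb_lines, chain_id):
    """Return the set of residue numbers found in ATOM records of one chain."""
    numbers = set()
    for line in pdb_lines:
        # PDB fixed columns: chain ID is column 22, residue number is columns 23-26.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain_id:
            numbers.add(int(line[22:26]))
    return numbers


def check_loop_gap(pdb_lines, chain_id, loop_start, loop_length):
    """Return a list of problems; an empty list means the numbering is consistent."""
    present = residue_numbers(pdb_lines, chain_id)
    last_nt_stem = loop_start - 1             # last residue before the missing loop
    first_ct_stem = loop_start + loop_length  # first residue after the missing loop
    problems = []
    if last_nt_stem not in present:
        problems.append(f"N-terminal stem residue {last_nt_stem} is missing")
    if first_ct_stem not in present:
        problems.append(f"C-terminal stem residue {first_ct_stem} is missing")
    inside = sorted(n for n in present if loop_start <= n < first_ct_stem)
    if inside:
        problems.append(f"residues numbered inside the missing loop: {inside}")
    return problems


# CA atoms taken from the example listing above (coordinates truncated to x, y, z).
example = [
    "ATOM    827  CA  PHE A  14      48.606  19.107  51.603",
    "ATOM    838  CA  LEU A  15      46.182  16.280  50.875",
    "ATOM    846  CA  GLU A  16      48.980  13.935  49.837",
    "ATOM    947  CA  VAL A  31      25.010  20.708  52.246",
    "ATOM    954  CA  THR A  32      22.075  20.604  54.610",
]
print(check_loop_gap(example, "A", loop_start=17, loop_length=len("GHLEDDVVVVVSSD")))
```

For the example file, residues 16 and 31 are present and nothing is numbered 17 to 30, so the returned list of problems is empty.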


  • Chain ID

    If the PDB file uses chain identifiers, users must specify the ID of the chain where the missing loop is located.

  • Loop sequence

    Users must provide the sequence of the query loop in one-letter code.

  • Do you want to activate Clashes Filter?

    Users can choose whether or not to apply the clashes filter during loop selection.

  • # of predictions to include in output

    Number of predictions that the user wants to keep.

  • Zscore cut-off

    Users must select the Zscore threshold that a prediction must reach to be kept. See the Ranking step in the ALGORITHM section below.


POST-PREDICTION PARAMETERS
  • Side chain reconstruction

    The prediction method only provides the coordinates of the main-chain atoms of the missing loop. However, side chains can be rebuilt using the SCWRL3 program[1].

  • Fragment minimization in new protein environment

    A limited energy minimization of the predicted loop can be performed. Minimization uses the conjugate gradients method, without molecular dynamics, as implemented in the MODELLER package[2]. Its main purpose is to close the gaps at the stem residues while preserving the structure of the loop.
     

UNDERSTANDING YOUR RESULTS
 
Users will receive by e-mail a link to a results web page that includes:

Prediction parameters:

A brief description of the parameters used for the prediction, including: the submitted PDB file; the chain, sequence and starting residue of the missing loop; the number of predictions to keep; whether or not to rebuild side chains; and whether or not to minimize the predicted loops.

Prediction process:

A log of the prediction process:
-Number of selected loops: Number of template loops selected from the Search Space, either by geometry or by end-point distance.
-Remaining loops after ranking: Number of template loops after ranking. Template loops are ranked based on sequence and dihedral angle propensity, and a Zscore is calculated using both measures. Template loops with a Zscore smaller than the selected Zscore cut-off are removed.
-Remaining loops after first filter (RMSD stems): Number of template loops after the first filter. Template loops whose stem residues, compared with those of the query, give an RMSD larger than the RMSD cut-off are removed.
-Remaining loops after second filter (clashes): Number of template loops after the second filter. Template loops are inserted into the new protein environment; loops whose fitting produces clashes are removed.


At this point the prediction process itself is finished; the two remaining steps are post-prediction alterations.

-Remaining loops after side-chains building: Number of predictions after side-chain building. The SCWRL3 program might crash during side-chain building; if this happens, the affected predictions are removed.

-Remaining loops after energy minimization: Number of predictions that completed the minimization step successfully.

Results: Links are provided to download the pdb files generated by the prediction.

NO RESULTS???

A number of warning messages are shown if no predictions are obtained. Possible errors are:

1. Unable to connect: For some reason (temporary network failure, machine shutdown, etc.) the server cannot connect to the database. Please try again later.

2. Something wrong with stem residues: The user defined stem residues that don't exist in the PDB file. Example: missing loop with start 123 and length 6. If this error is shown, one of these residues is missing: Nt stems, from 118 to 122; or Ct stems, from 129 to 133.

3. No selected loops that fulfill your query, (geometry): There is not a single loop in the Search Space that shares the geometry definition of your query loop. Try selecting loops by end-point distance.

4. No selected loops that fulfill your query, (ending points): There is not a single loop in the Search Space with an end-point distance within +/- 1 Å of that of your query loop.

5. No suitable loops after Zscore ranking: All template loops have a Zscore smaller than the selected Zscore cut-off.

6. No suitable loops after RMSD stem filter: All template loops have a stem RMSD larger than the stem RMSD cut-off.

7. No suitable loops after clashes filter: After inserting the template loops into the protein, all of them produce steric clashes.



METHOD
  • Search Space

    The Search Space is a multidimensional library of loops of known structure, organized into a three-level hierarchy:

    (i) at the top, loops are identified according to the type of the bracing secondary structures: αα loops, βα loops, αβ loops and ββ loops;

    (ii) at the next level, loops are grouped according to their length, and finally

    (iii) loops are grouped according to the geometry of bracing secondary structures.

    The geometry of a loop is defined by a distance, D, and three angles: a hoist (δ), a packing (θ) and a meridian (ρ)[3] (Figure 1).

ALGORITHM

The prediction algorithm selects a set of candidate loops from the Search Space, then subsequently filters and ranks them by various criteria (Figure 1).
  • Selection

    The Search Space is queried by the length of the loop, the type of secondary structures that span the query loop and the geometry of the motif. Loops with the same length (+/- 1 residue) that belong to the same geometrical bin are selected. Two loops A and B, with geometries G_A = (D_A, δ_A, θ_A, ρ_A) and G_B = (D_B, δ_B, θ_B, ρ_B) respectively, share the same geometry if the difference between G_A and G_B falls inside the semi-open interval [(0,0,0,0),(2,30,30,45)).
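The selection test can be sketched in a few lines. This is an illustration only, not the ArchPRED code: the `Geometry` type and the function names are hypothetical, "difference" is assumed to mean the component-wise absolute difference, and the periodicity of the angles is ignored.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    d: float      # distance D
    delta: float  # hoist angle
    theta: float  # packing angle
    rho: float    # meridian angle


# Open upper bounds of the interval quoted in the text: (2, 30, 30, 45).
BIN_WIDTH = Geometry(2.0, 30.0, 30.0, 45.0)


def same_geometry(a, b, width=BIN_WIDTH):
    """True if every component-wise absolute difference lies in [0, width)."""
    # Assumption: plain absolute differences, with no angle wrap-around handling.
    return all(abs(x - y) < w for x, y, w in zip(a, b, width))


def same_length(len_a, len_b, tolerance=1):
    """True if the loop lengths differ by at most `tolerance` residues."""
    return abs(len_a - len_b) <= tolerance


query = Geometry(10.0, 45.0, 90.0, 180.0)
candidate = Geometry(11.5, 60.0, 100.0, 200.0)
print(same_geometry(query, candidate) and same_length(14, 13))
```

Because the interval is semi-open, a difference of exactly 2 in D (or exactly 30, 30 or 45 in the angles) already places two loops in different bins.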
     
  • Filtering

    The filtering step discards clearly unfavorable candidates by assessing the fit of the stem regions and the steric fit in the new protein framework. Steric violations, or clashes, are computed among main-chain atoms (N, C, Cα and O). Two atoms are in steric clash if the distance between them is smaller than 70% of the sum of their van der Waals radii.
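The clash criterion can be illustrated as follows. This is a sketch under stated assumptions: the radii in `VDW_RADII` are commonly used Bondi values and are not necessarily the ones ArchPRED uses, and the atom representation (a name plus an (x, y, z) tuple) and function names are hypothetical.

```python
import math

# Assumed van der Waals radii in Å (Bondi values); ArchPRED may use other radii.
VDW_RADII = {"N": 1.55, "C": 1.70, "CA": 1.70, "O": 1.52}


def in_clash(atom_a, atom_b, factor=0.70):
    """True if two atoms are closer than `factor` times the sum of their radii."""
    name_a, xyz_a = atom_a
    name_b, xyz_b = atom_b
    limit = factor * (VDW_RADII[name_a] + VDW_RADII[name_b])
    return math.dist(xyz_a, xyz_b) < limit


def count_clashes(loop_atoms, environment_atoms):
    """Count clashing pairs between loop atoms and atoms of the rest of the protein."""
    return sum(in_clash(a, b) for a in loop_atoms for b in environment_atoms)


loop = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))]
environment = [("O", (0.0, 2.0, 0.0)), ("N", (8.0, 0.0, 0.0))]
print(count_clashes(loop, environment))
```

With these radii the limit for a C/O pair is 0.70 × (1.70 + 1.52) = 2.254 Å, so the carbon at the origin and the oxygen 2.0 Å away count as a clash.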
     
  • Ranking

    The final set of candidate loops is ranked by two measures:

    (1) A sequence similarity score between the query and candidate loops, computed with the conformation similarity weight matrix (K3 matrix)[4];

    (2) φ/ψ main-chain dihedral angle propensities. The dihedral angle propensity score measures the compatibility of observed and expected dihedral angles of each residue of the candidate loop in the corresponding position of the query. Main-chain conformation definitions and propensities are taken from the p15 propensities table of D. Shortle[5]. The two components of the scoring scheme, sequence and propensity, are combined into a composite Zscore.
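The text does not give the formula used to combine the two scores, so the following is only one plausible sketch: each score is standardized across the candidate set and the two standardized values are summed. The summation, the function names and the sample scores are all assumptions for illustration, not the ArchPRED implementation.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_zscores(sequence_scores, propensity_scores):
    """Assumed combination: sum of the two standardized scores per candidate."""
    pairs = zip(zscores(sequence_scores), zscores(propensity_scores))
    return [z_seq + z_prop for z_seq, z_prop in pairs]


def rank_candidates(sequence_scores, propensity_scores, cutoff):
    """Return candidate indices with composite score >= cutoff, best first."""
    composite = composite_zscores(sequence_scores, propensity_scores)
    kept = [i for i, z in enumerate(composite) if z >= cutoff]
    return sorted(kept, key=lambda i: composite[i], reverse=True)


# Hypothetical scores for three candidate loops.
sequence_scores = [1.0, 2.0, 3.0]
propensity_scores = [0.5, 1.0, 1.5]
print(rank_candidates(sequence_scores, propensity_scores, cutoff=0.0))
```

Candidates whose composite score falls below the cut-off are dropped, mirroring the Zscore cut-off parameter described earlier.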
REFERENCES
  • 1. A. A. Canutescu, A. A. Shelenkov, and R. L. Dunbrack, Jr. A graph theory algorithm for protein side-chain prediction. Protein Science 12, 2001-2014 (2003).
     
  • 2. A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, (1993).
     
  • 3. B. Oliva, PA. Bates, E. Querol, FX. Aviles, MJ. Sternberg. An automated classification of the structure of protein loops. J. Mol. Biol. 266, 814-830 (1997).
     
  • 4. A.S. Kolaskar and U. Kulkarni-Kale. Sequence alignment approach to pick up conformationally similar protein fragments. Journal of Molecular Biology 223, 1053-1061 (1992).
     
  • 5. D. Shortle. Composite of local structure propensities: evidence for local encoding of long-range structure. Protein Science 11, 18-26 (2002).