Introduction to SWISS-MODEL Workspace
The SWISS-MODEL Workspace is a web-based integrated service dedicated
to protein structure homology modelling. It assists and guides the user
in building protein homology models at different levels of complexity.
Building a homology model comprises four main steps: identification of
structural template(s), alignment of target sequence and template structure(s),
model building, and model quality evaluation. These steps can be repeated
until a satisfying modelling result is achieved. Each of the four steps
requires specialized software and access to up-to-date protein sequence
and structure databases.
Protein sequence and structure databases
necessary for modelling are accessible from the workspace and are updated
in regular intervals. Software tools for template selection,
model building, and structure quality evaluation can be
invoked from within the workspace.
A personal working environment (workspace), where several modelling
projects can be carried out in parallel, is provided for each user.
This help file provides references and illustrates the use of the individual
tools available from within the SWISS-MODEL Workspace.
A tutorial to facilitate the first steps of working with the SWISS-MODEL Workspace,
as well as a list of the most frequently asked questions, is provided here:
Tutorial
Please also take a look at the following published
[Protocol]
Workspace
The SWISS-MODEL Workspace provides a personal web-based
area for each user in which protein homology models can be built and the
results of completed modelling projects are stored and visualized.
In the workspace, a list of the current modeling work units and their
status is displayed: submitted (the job has been submitted to the
pipeline but is still queuing), running (the job is running and programs
are calculating), finished (the job has been completed and final results
are available) or failed/stopped (something went wrong during
the process).
Depending on the type of job the user has submitted, a different tag will
be associated with a work unit: Template Identification for template
identification, Sequence Scanning for secondary structure and disorder
prediction and domain assignment, Structure Assessment for structure
quality assessment, and Modelling Automatic, Modelling Alignment and
Modelling Project for automated, alignment and project mode modelling
requests, respectively.
After completion of the modelling procedure (from a few minutes up to several
hours), the results are stored in the workspace and the user is notified
about the completion. The user can access the results by clicking
on the work unit ID number.
The results are stored for one week on the server. The
remaining time before deletion of a given work unit is also displayed.
The user can decide to either delete a work unit or to prolong its life
span by clicking on the corresponding link.
Beware: Each user can submit up to a maximum of 25 work units.
Domain assignment, Secondary Structure and Disorder Prediction
Many proteins are modular and made up of several structurally distinct
domains, which often reflect evolutionary relationships and may correspond
to units of molecular function. The sensitivity and performance of profile-based
template search methods can often be improved when the template search
is performed on individual domains rather than the whole target sequence.
IprScan (see below) allows the prediction of protein domains and functional sites.
Protein disorder prediction measures and displays the propensity of protein
sequences to be ordered or disordered. The result can aid the assignment
of templates to a specific region of the target protein by complementing
the IprScan approach to globular domains and feature discovery.
Secondary structure prediction methods are especially useful when combined
with other types of analyses: e.g. in cases where only templates with
very low sequence homology can be detected by sequence-based search methods,
predicted secondary structure may help to decide if a putative template
shares structural features of the target protein.
InterPro Domain Scan
The member databases of InterPro
(Mulder et al.) allow for both the identification
of protein domains and the assignment of protein function. Using the InterPro
Domain Scan (IprScan, Zdobnov et al.), protein
domains and functional sites can be assigned to regions of a target sequence.
The following databases are currently part of the InterPro Domain scan
method:
HMMPfam: Pfam
is a large collection of multiple sequence alignments and hidden Markov
models covering many common protein domains and families.
HMMTigr: TIGRFAMs
is a collection of protein families, featuring curated multiple sequence
alignments, hidden Markov models (HMMs) and annotation, which provides
a tool for identifying functionally related proteins based on sequence
homology.
ProfileScan: PROSITE
is a database of protein families and domains. It consists of biologically
significant sites, patterns and profiles that help to reliably identify
to which known protein family (if any) a new sequence belongs. There are
a number of protein families as well as functional or structural domains
that cannot be detected using patterns (see below) due to their extreme
sequence divergence. The use of techniques based on weight matrices (also
known as profiles) allows the detection of such domains.
SuperFamily: SUPERFAMILY
is a library of profile hidden Markov models that represent all proteins
of known structure, based on SCOP.
BlastProDom: The ProDom
protein domain database consists of an automatic compilation of homologous
domains. Current versions of ProDom are built using a novel procedure
based on recursive PSI-BLAST searches. The ProDom database has been designed
as a tool to help analyze domain arrangements of proteins and protein
families.
FPrintScan: PRINTS
is a compendium of protein fingerprints. A fingerprint is a group of conserved
motifs used to characterise a protein family.
HMMSmart: SMART
(a Simple Modular Architecture Research Tool) allows the identification
and annotation of genetically mobile domains and the analysis of domain
architectures.
ScanRegExp: PROSITE
is a database of protein families and domains. It consists of biologically
significant sites, patterns and profiles that help to reliably identify
to which known protein family (if any) a new sequence belongs. Some biologically
significant amino acid patterns can be summarised in the form of regular
expressions.
The results of the InterPro domain mapping are displayed in combination
with the alignment to putative template structures, allowing the user
to identify template structures spanning one or more domains of the target
protein. For low homology templates, the IprScan functional site annotation
of the target sequence can be used to verify that putative templates share
essential functional features. The InterPro functional annotations for
individual template structures are accessible from the workspace as links
to the SMTL library and external resources.
PsiPred Secondary Structure Prediction
PSIPRED is a method for protein secondary structure prediction
(Jones DT et al.).
The plot shows the position in the sequence against the probability of being
part of an alpha helix (H), an extended beta strand (E) or a coil region
(C). The result of the prediction is plotted along the x-axis of the plot.
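As a minimal sketch of how such per-residue probabilities can be read, the following Python snippet picks the most probable state for each position. The input layout (one (H, E, C) probability triple per residue) and the function name are assumptions made for illustration; they do not reproduce the PSIPRED output format.

```python
from typing import List, Tuple

STATES = ("H", "E", "C")  # helix, extended strand, coil


def most_probable_states(probs: List[Tuple[float, float, float]]) -> str:
    """Return one state letter per residue, choosing the highest probability."""
    result = []
    for triple in probs:
        # Index of the largest of the three probabilities (H, E, C order).
        best = max(range(3), key=lambda i: triple[i])
        result.append(STATES[best])
    return "".join(result)


# Made-up probabilities for four residues, for illustration only.
example = [(0.8, 0.1, 0.1), (0.7, 0.1, 0.2), (0.1, 0.2, 0.7), (0.1, 0.8, 0.1)]
print(most_probable_states(example))  # HHCE
```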
DISOPRED Disorder Prediction
DISOPRED
(v 2) is a neural-network based predictor of
disordered regions in proteins (Jones DT et al.).
The majority of water-soluble proteins have structures that are globular
and relatively static. However, some proteins have regions that are natively
disordered. Disordered regions are flexible, dynamic and can be partially
or completely extended in solution. Native disorder also exists in global
structures such as extended random coil proteins with negligible secondary
structure or molten globules, which have regular secondary structure elements
but have not condensed into a stable globular fold. The
primary function of disorder appears to be molecular recognition of proteins
and nucleic acids. It has been speculated that the multiple metastable
conformations adopted by disordered binding sites allow recognition
of several targets with high specificity and low affinity. Order to disorder
transitions also provide a mechanism for controlling protein concentration
via proteolytic degradation.
The plot shows position in the sequence against probability of being disordered
(from 0 to 1). The 'filter' curve represents the outputs from DISOPRED
and the 'output' curve the outputs from a linear SVM classifier (DISOPREDsvm).
The outputs from DISOPREDsvm are included to indicate shorter, low confidence
predictions of disorder.
Asterisks (*) represent disordered predictions
and dots (.) prediction of order.
The DISOPRED predictions are given at a default false positive rate threshold
of 2%, but this value can be changed by the user.
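The asterisk/dot line can be illustrated with a short Python sketch. It assumes a plain score cutoff as a stand-in: how a false positive rate threshold translates into a score cutoff is internal to DISOPRED and is not described here, so the `cutoff` parameter and both function names are hypothetical.

```python
from typing import Sequence


def disorder_string(scores: Sequence[float], cutoff: float) -> str:
    """Mark residues scoring at or above the cutoff as disordered ('*'), others as ordered ('.')."""
    return "".join("*" if s >= cutoff else "." for s in scores)


def fraction_disordered(marks: str) -> float:
    """Fraction of positions marked '*' (0.0 for an empty string)."""
    return marks.count("*") / len(marks) if marks else 0.0


# Made-up scores for five residues, for illustration only.
marks = disorder_string([0.9, 0.8, 0.2, 0.1, 0.6], cutoff=0.5)
print(marks, fraction_disordered(marks))
```

Raising the cutoff in this sketch marks fewer residues as disordered, which mirrors the effect of choosing a stricter threshold.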
MEMSAT
MEMSAT predicts the occurrence of putative transmembrane (TM) segments in the protein.
Central TM helix segments are indicated with 'X' in the output sequence.
Information about the predicted TM topology is also provided.
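A minimal Python sketch for locating the marked segments is given below. It assumes an annotation string of the same length as the sequence in which central TM helix positions carry 'X'; the characters used for all other positions and the function name are assumptions.

```python
import re
from typing import List, Tuple


def tm_segments(annotation: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges for each run of 'X'."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"X+", annotation)]


# Hypothetical annotation line: '-' stands for any non-'X' position.
print(tm_segments("---XXXX---XX"))  # [(4, 7), (11, 12)]
```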
Template Identification
The degree of difficulty in identifying a suitable template
for a target sequence can range from "trivial" for well-characterized
protein families to "impossible" for proteins with an unknown
fold. The SWISS-MODEL Workspace provides access to a set of increasingly
complex and computationally demanding methods to search for templates.
Templates which are close homologues of the target can be identified using
a gapped BLAST (Altschul et al.) query against
the ExPDB template library extracted from PDB.
Options for the BLAST database search are:
E-value cutoff: sets the expectation value threshold for keeping
alignments. The E-value describes how often a given score is expected to occur
by chance;
Matrix: the protein substitution matrix;
SEG Filter: filters the query sequence for low-complexity
subsequences;
Descriptions: sets the number of database sequences for which to
show the one-line summary descriptions at the top of a BLAST report;
Alignments: truncates the report to the selected number of alignments;
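The combined effect of the E-value cutoff and the Alignments option can be sketched in Python as follows. The `Hit` record, the function name and the E-values in the example are made up for illustration and are not taken from a BLAST report.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template: str  # e.g. an ExPDB code such as "1crnA"
    evalue: float  # expectation value reported for the alignment


def filter_hits(hits: List[Hit], evalue_cutoff: float, max_alignments: int) -> List[Hit]:
    """Keep hits with E-value <= cutoff, best (lowest) first, truncated to max_alignments."""
    kept = [h for h in hits if h.evalue <= evalue_cutoff]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_alignments]


hits = [Hit("1akeA", 1e-40), Hit("4akeA", 1e-38), Hit("1crnA", 5.0)]
for hit in filter_hits(hits, evalue_cutoff=0.001, max_alignments=10):
    print(hit.template, hit.evalue)
```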
When no suitable templates are identified, or only parts of the target
sequence are covered, two additional approaches for the sensitive detection
of distant relationships among protein families are provided:
Iterative Profile Blast: the
template library is searched with PSI-BLAST (Altschul
et al.) using an iteratively generated sequence profile based
on NR (Wheeler et al.). This method has been
initially introduced as PDB-Blast by Godzik and coworkers.
- The first run searches the NR database and derives a profile for the
query sequence. The following options are available:
Iterations: number of iterations for the NR database search and
profile (PSSM) generation;
Matrix: the protein substitution matrix;
E-value: the E-value threshold for inclusion in the PSSM. All alignments
better than this threshold are used in constructing the PSSM;
SEG Filter: filters the query sequence for low-complexity
subsequences;
- Then with this profile, the final run searches the SWISS-MODEL template
library (ExPDB). The following options are available:
Database to search: clustered versions of ExPDB (e.g. ExPDB90,
sequences clustered at 90% sequence identity) which combine closely related
sequences into a single record;
E-value cutoff: sets the expectation value threshold for keeping
alignments. The E-value describes how often a given score is expected to occur
by chance;
Matrix: the protein substitution matrix;
SEG Filter: filters the query sequence for low-complexity subsequences;
Descriptions: sets the number of database sequences for which to
show the one-line summary descriptions at the top of a BLAST report;
Alignments: truncates the report to the selected number of alignments;
HHSearch: To detect distantly related template
structures, a target sequence can be searched against a Hidden Markov Model (HMM)
based template library. Each HMM of the library is based on a multiple sequence
alignment of the template sequence built by PSI-BLAST search (against nr90 & nr70)
enriched with secondary structure assignment.
Analogously, an HMM is built for the target sequence and subsequently used to search
the template library. Only alignments which score better than a given P-value
cut-off are reported. HMM building and library searches are performed using the
HHSEARCH (v. 1.5.01) software package (Söding et al.)
with default parameters.
For detailed documentation, please visit the official HHSEARCH
site [http://toolkit.tuebingen.mpg.de/hhpred]
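The two-stage Iterative Profile Blast search described above can be sketched as a pair of command lines. This sketch assumes the NCBI BLAST+ `psiblast` program, which is not necessarily what the server runs; file names, database names and default values are placeholders. The functions only build argument lists and do not execute anything.

```python
from typing import List


def build_profile_command(query_fasta: str, nr_db: str, pssm_out: str,
                          iterations: int = 3, inclusion_evalue: float = 0.001,
                          matrix: str = "BLOSUM62", seg: bool = False) -> List[str]:
    """Stage 1: search NR iteratively and save the resulting PSSM."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", nr_db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-matrix", matrix,
        "-seg", "yes" if seg else "no",
        "-out_pssm", pssm_out,
    ]


def build_template_search_command(pssm_in: str, template_db: str,
                                  evalue_cutoff: float = 0.001,
                                  descriptions: int = 50,
                                  alignments: int = 50) -> List[str]:
    """Stage 2: search the template library with the saved PSSM instead of the query."""
    return [
        "psiblast",
        "-in_pssm", pssm_in,
        "-db", template_db,
        "-evalue", str(evalue_cutoff),
        "-num_descriptions", str(descriptions),
        "-num_alignments", str(alignments),
    ]


# Placeholder file and database names.
print(" ".join(build_profile_command("target.fasta", "nr", "target.pssm")))
print(" ".join(build_template_search_command("target.pssm", "expdb90")))
```

Keeping the two stages as separate commands makes it explicit that the profile is derived from NR once and then reused unchanged for the template library search.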
Display of template identification results
A condensed graphical view of the modeling task is provided containing
the target sequence, the template matches sorted and colored according
to the associated E-value, and the InterPro mappings. Clickable bars indicate
the matched regions and guide the user to the underlying original program
output.
In the InterPro output a link leads to the detailed
InterPro page for this entry.
In the output of the different template identification programs the template
annotations (via the link to the SWISS-MODEL Template
library) and target-template alignment can be retrieved.
Alignments can be obtained as a DeepView project file. The latter allows
the user to visualize the different alignments in the structural context
of the template, to correct misplaced insertions and deletions, and to
manually adjust misaligned regions. The modified project can then be saved
to disk and submitted as "project mode" to the workspace for
model building by the SWISS-MODEL pipeline.
When searching a clustered version of the SWISS-MODEL
Template library (e.g. ExPDB90) only the alignment between the target
sequence and the sequence of the representative of the cluster is shown.
Information about the members of the cluster is presented in the detailed
output of the different template search programs. For each template, the
SWISS-MODEL workspace provides a summary showing a small ribbon representation,
experimental details, information about bound molecules, as well as links
to PDB (Westbrook et al.), SCOP
(Andreeva et al.),
CATH (Pearl et al.),
PDBsum (Laskowski et al.), and
MSD (Velankar et al.).
Model building
Depending on the difficulty of the modelling task, three
different types of modelling requests (automated mode, alignment
mode, project mode) are provided, which differ in the amount
of user intervention.
Modelling requests are computed by the SWISS-MODEL server
homology modelling pipeline (Schwede et al.).
Automated Mode
The "automated mode" is suited for cases where
the target-template similarity is sufficiently high to allow for fully
automated modelling. As a rule of thumb, automated sequence alignments
are sufficiently reliable when target and template share more than 50%
sequence identity.
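To check a target-template pair against this rule of thumb, sequence identity can be computed from an alignment. The sketch below assumes one common convention (identical residues divided by the number of columns in which neither sequence has a gap); other definitions exist, and the convention used by the pipeline is not stated here. The function name and the accepted gap characters are assumptions.

```python
GAP_CHARS = "-.~"  # gap symbols as they appear in FASTA/CLUSTAL and MSF alignments


def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns containing a gap
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    return identical / aligned if aligned else 0.0


target = "KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-"
template = "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-"
print(f"{sequence_identity(target, template):.0%}")
```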
This submission requires only the amino acid sequence
or the UniProt accession code of the target protein as input data. The
pipeline will automatically select suitable templates based on a Blast
(Altschul et al.) E-value limit (which can
be adjusted upon submission), experimental quality, bound substrate molecules,
or different conformational states of the template.
Depending on the planned model application, it can be
necessary to select a different structural template than the one ranked
first in the automated process. Typical examples are proteins in
different conformational states, e.g. 1ake vs. 4ake. It is possible
to specify the structure to be used as modelling template either by
identifying an entry in the SWISS-MODEL
Template library by PDB-ID + ChainID e.g. "1ake" chain "A", or by
uploading a file in PDB format (*) with coordinates of the template
structure. Please make sure that this file contains only a single
protein chain, and does not contain chemically modified amino acids,
hetero atoms, ligands, etc.
(*) A simple PDB-like file containing the coordinates of the template
structure. For more information about PDB file format please see link:
http://www.wwpdb.org/docs.html
Alignment Mode
Multiple sequence alignments are a common tool in many
molecular biology projects. If the three-dimensional structure is known
for at least one of the members, this alignment can be used as starting
point for comparative modelling using the "alignment mode".
The "alignment mode" allows the user to test several alternative
alignments and evaluate the quality of the resulting models in order to
achieve an optimal result.
In order to facilitate the use of alignments in different
formats, the submission is implemented as a four-step procedure:
1. Prepare a multiple sequence alignment.
- It must contain at least your target sequence and
the template sequence
- Use any of your favorite alignment tools. We recommend
T_COFFEE
by Cedric Notredame
- Make sure the sequence names are "reasonable"
2. Submit your alignment to the Workspace Alignment
Mode.
- Possible formats are: FASTA, MSF,
CLUSTALW, PFAM and SELEX
- You may either upload your file or cut & paste
- Don't forget to specify the correct alignment format
- Here is a small example for testing (cut & paste):
CLUSTAL W (1.82) multiple sequence alignment
THN_DENCL KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH- 46
THNX_TEST KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK 46
1crnA TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN- 46
.:*** ..* : **: * .. :** :** **..: ** *
3. Select Target and Template
- The alignment (as it was interpreted by the server)
should now be displayed in the bottom part of the page.
- The script will try to make a good guess for the correct
names based on your submission.
- Select the sequence name of the target sequence (e.g.
THN_DENCL)
- Select the sequence of the template structure (e.g.
1crnA). You don't need to use PDB IDs, you may use any name you like.
- Specify the template structure to which this sequence
belongs. This template MUST be part of the ExPDB template library. Please
use the SWISS-MODEL Template library tool to check...
- Don't forget to specify the correct CHAIN ID. Note
that PDB's chain IDs are normally in capital letters.
4. Check Alignment and Submit
- The alignment at the bottom of the page should represent
the correct mapping of the template structure on the target sequence.
Please check carefully before submission.
- As usual, please provide name and e-mail for the SWISS-MODEL
submission.
- Good Luck with your model ....
The server pipeline will build the model purely based on
this alignment. During the modelling process, implemented as rigid fragment
assembly in the SWISS-MODEL (Schwede et al.)
pipeline, the modelling engine might introduce minor heuristic modifications
to the placement of insertions and deletions.
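Because the model is built purely from the submitted alignment, it can be worth checking the alignment locally before submission. The following Python sketch parses a FASTA-formatted alignment and verifies that the target and template names are present and that all aligned sequences have the same length. Function names and the checks themselves are assumptions and do not reproduce the validation performed by the server.

```python
from typing import Dict


def parse_fasta_alignment(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: aligned sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty sequence name")
            name = fields[0]  # first word of the description line
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            records[name] += line
    return records


def check_alignment(records: Dict[str, str], target: str, template: str) -> None:
    """Raise an error if a name is missing or the aligned lengths differ."""
    for name in (target, template):
        if name not in records:
            raise KeyError(f"sequence not found in alignment: {name}")
    if len({len(seq) for seq in records.values()}) != 1:
        raise ValueError("aligned sequences differ in length")


alignment = """>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
"""
records = parse_fasta_alignment(alignment)
check_alignment(records, target="THN_DENCL", template="1crnA")
print(sorted(records))
```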
Supported Alignment formats
The following formats are currently supported: FASTA,
MSF, CLUSTALW, PFAM and SELEX;
Examples:
fasta:
>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>THNX_TEST
KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
clustal:
CLUSTAL W (1.82) multiple sequence alignment
THN_DENCL KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH- 46
THNX_TEST KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK 46
1crnA TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN- 46
.:*** ..* : **: * .. :** :** **..: ** *
msf:
!!AA_MULTIPLE_ALIGNMENT 1.0
thn_dencl.msf MSF: 47 Type: P 08/08/05 CompCheck: 427 ..
Name: THN_DENCL Len: 47 Check: 8212 Weight: 1.00
Name: THNX_TEST Len: 47 Check: 5295 Weight: 1.00
Name: 1crnA Len: 47 Check: 6920 Weight: 1.00
//
1 47
THN_DENCL KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH~
THNX_TEST KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS.YPNK
1crnA TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN~
Project Mode
In difficult modeling situations, where the correct alignment between
target and template cannot be clearly determined by sequence based methods,
visual inspection and manual manipulation of the alignment can significantly
help improving the quality of the resulting model.
Project files contain the superposed template structures, and the alignment
between the target and template. Project files can be generated inside
the program DeepView (Swiss-PdbViewer, Guex et al.),
by the workspace template selection tools, and are also the default output
format of the modeling pipeline. This allows analyzing and iteratively
improving the models generated by the "Automated mode" and
"Alignment mode" modeling approaches.
The program DeepView can be downloaded freely from the tools section or
from the ExPASy web site.
DeepView
The program DeepView (Swiss-PdbViewer, Guex
et al.) can be used to generate, display, analyze and manipulate
modeling project files for the SWISS-MODEL workspace.
Project files contain the superposed template structures, and the alignment
between the target and template. The user therefore has full control over
essential modelling parameters, i.e. the choice of template structures,
the correct alignment of residues, and the placement of insertions and
deletions in the context of the three-dimensional structure.
Project files can be generated inside DeepView, by the workspace template
identification tools, and are also the default output format of the modeling
pipeline. This allows analyzing and iteratively improving the output of
the different modeling tools.
DeepView allows the user to visualize the model and the templates, and to analyse
certain structural features, e.g. Ramachandran plots or electrostatic properties.
Moreover, it allows manual adjustment of the placement of insertions and
deletions in the alignment on which the initial modelling process was
based. The project with the modified alignment can then be re-submitted
to the SWISS-MODEL workspace for model building.
DeepView can be downloaded at:
http://www.expasy.org/spdbv/
DeepView does not require administrator privileges for
installation. E.g. under MS Windows, simply uncompress the distributed
archive at any location you like (e.g. c:\spdbv or on your desktop) and
start working by starting the spdbv.exe application.
Input target sequence and UniProt AC code
The amino acid sequence of a protein to be modeled or analyzed can be
submitted in FASTA or raw format. If the protein sequence is deposited in
the UniProt (Bairoch et al.) knowledgebase, the AC (accession number)
for the entry can also be specified.
Examples:
- raw format: the amino acid sequence of the protein in plain text:
MVEIVYWSGTGNTEAMANEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLGCPAMGSE
ELEDSVVEPFFTDLAPKLKGKKVGLFGSYGWGSGEWMDAWKQRTEDTGATVIGTAIVNEM
PDNAPECKELGEAAAKA
- FASTA format consists of a single-line description, followed by lines
of sequence data. The first character of the description line is a greater-than
(">") symbol:
>sp|P00321|FLAV_MEGEL Flavodoxin - Megasphaera elsdenii.
MVEIVYWSGTGNTEAMANEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLGCPAMGSE
ELEDSVVEPFFTDLAPKLKGKKVGLFGSYGWGSGEWMDAWKQRTEDTGATVIGTAIVNEM
PDNAPECKELGEAAAKA
- UniProt Accession number: P00321
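The difference between the raw and FASTA formats can be illustrated with a small Python sketch that reduces either form to a description and a contiguous sequence. The function name and the simple alphabetic check are assumptions; the validation actually applied by the server is not described here.

```python
from typing import Tuple


def normalize_sequence_input(text: str) -> Tuple[str, str]:
    """Return (description, sequence) from raw or FASTA-formatted input."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    description = ""
    if lines[0].startswith(">"):
        # FASTA: the first line is the single-line description.
        description = lines[0][1:].strip()
        lines = lines[1:]
    # Remove internal whitespace and join all sequence lines.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    if not sequence or not sequence.isalpha():
        raise ValueError("sequence must consist of letters only")
    return description, sequence


fasta = """>sp|P00321|FLAV_MEGEL Flavodoxin - Megasphaera elsdenii.
MVEIVYWSGTGNTEAMANEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLGCPAMGSE
ELEDSVVEPFFTDLAPKLKGKKVGLFGSYGWGSGEWMDAWKQRTEDTGATVIGTAIVNEM
PDNAPECKELGEAAAKA
"""
description, sequence = normalize_sequence_input(fasta)
print(description)
print(len(sequence))
```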
Display of modeling results
Coordinates of the model, the underlying alignment, log
files, and quality evaluations can be accessed and downloaded via web-browser
from the workspace.
Model Details
This section allows the user to display the model and download its coordinates.
The model coordinates are available in two different formats:
- DeepView project files (recommended).
- PDB format
PDB formatted protein models can be displayed by any
molecular visualization tool or browser-plugin. Here is a short list of
freely available software:
- DeepView (MS Windows, Macintosh, Linux)
- DINO (Linux, IRIX, OSF,SUN)
- Rasmol (MS Windows, Mac, Unix)
- CHIME Plugin (requires registration)
If the model has been built using the Automated Mode,
information about the template(s) used for modeling is provided with cross
references to structural information databases via the link to the SWISS-MODEL
Template library.
Alignment Output
Displays the target template sequence alignment used in the modeling procedure
and the assigned secondary structure.
Modeling Log
The modeling log gives a detailed description of the individual modeling
steps. The models are built using the SWISS-MODEL server pipeline
(Schwede et al.). The modelling log shows
the individual steps during model building (Guex
et al.), especially which parts of the
model have been built ab initio (i.e. insertions / deletions).
Template Selection Log
The logfile provides information about the template selection
step to search the SWISS-MODEL Template library for
suitable templates.
Protein Structure & Model Assessment Tools
Evaluation of model quality is a crucial step in homology modeling.
While the performance of the automated SWISS-MODEL (Schwede
et al.) pipeline in general is continuously evaluated by the EVA
project (Koh et al.), the quality of individual
models can vary significantly.
Therefore, graphical plots of Anolea mean force potential (Melo
et al.) and GROMOS empirical force field energy (van
Gunsteren et al.)
are provided to enable the user to estimate the local quality of the predicted structure.
The stereochemistry of protein models and template structures can be analysed with
Whatcheck (Hooft et al.) and Procheck
(Laskowski et al.). In order to be able to rank alternative models
of the same target, pseudo energies for the entire model as calculated by
QMEAN (Benkert et al.) and DFIRE (Zhou
et al.) are provided as well. To facilitate the description of template and
model structures, DSSP (Kabsch et al.) and Promotif
(Hutchinson et al.) can be invoked to classify
structural features.
Anolea
The atomic empirical mean force potential
ANOLEA
(Melo et al.) is used to assess packing quality
of the models. The program performs energy calculations on a protein chain,
evaluating the "Non-Local Environment" (NLE) of each heavy
atom in the molecule.
The y-axis of the plot represents the energy for each amino acid of the
protein chain. Negative energy values (in green) represent a favourable
energy environment, whereas positive values (in red) represent an unfavourable
energy environment for a given amino acid.
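Reading such a per-residue energy profile can be sketched in Python as follows. The input is assumed to be a plain list with one energy value per amino acid; the function name and the example values are made up for illustration.

```python
from typing import List, Sequence


def unfavourable_positions(energies: Sequence[float]) -> List[int]:
    """Return 1-based positions whose energy is positive (unfavourable environment)."""
    return [i for i, energy in enumerate(energies, start=1) if energy > 0]


# Made-up per-residue energies for six residues.
profile = [-2.1, -0.5, 1.3, 0.7, -1.0, 0.0]
print(unfavourable_positions(profile))  # [3, 4]
```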
QMEAN
QMEAN
(Benkert et al.)
is a composite scoring function for the estimation of the global model
quality. QMEAN consists of five structural descriptors: The local geometry
is analysed by a torsion angle potential over three consecutive amino acids.
A distance-dependent pairwise residue-level potential is used to assess
long-range interactions. A solvation potential describes the burial status of
the residues. Two simple terms describing the agreement of predicted and
calculated secondary structure and solvent accessibility, respectively, are
also included.
QMEAN returns a pseudo energy of the whole model, which can be used to
compare and rank alternative models of the same target. The lower the
predicted energy, the better the model. Additionally, the pseudo energies of the
five contributing terms are provided.
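Ranking alternative models of the same target by pseudo energy amounts to an ascending sort, as the following Python sketch shows. The model names and energy values are made up, and the function name is an assumption.

```python
from typing import Dict, List, Tuple


def rank_models(pseudo_energies: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort models by pseudo energy, lowest (best) first."""
    return sorted(pseudo_energies.items(), key=lambda item: item[1])


# Hypothetical models of the same target with made-up pseudo energies.
models = {"model_a": -3.2, "model_b": -7.5, "model_c": 1.0}
for name, energy in rank_models(models):
    print(name, energy)
```

Such a ranking is only meaningful between models of the same target sequence, as stated above.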
DFire
DFIRE
(Zhou et al.)
is an all-atom statistical potential based on a distance-scaled finite
ideal-gas reference state. DFIRE is used to assess non-bonded atomic
interactions in the protein model.
A pseudo energy for the entire model is provided which reflects the quality of
the model and can be used for ranking alternative predictions of the same
target. A lower energy indicates that a model is closer to the native
conformation.
Gromos
GROMOS (van
Gunsteren et al.) is a general-purpose molecular dynamics computer
simulation package for the study of biomolecular systems and can be applied
to the analysis of conformations obtained by experiment or by computer
simulation.
The y-axis of the plot represents the energy for each amino acid of the
protein chain. Negative energy values (in green) represent a favourable
energy environment, whereas positive values (in red) represent an unfavourable
energy environment for a given amino acid.
What Check
What Check comprises several tools for protein structure verification
(Hooft et al.).
Procheck
The PROCHECK
suite of programs (Laskowski et al.) assesses
the "stereochemical quality" of a given protein structure.
The aim of PROCHECK is to assess how normal, or conversely how unusual,
the geometry of the residues in a given protein structure is, as compared
with stereochemical parameters derived from well-refined, high-resolution
structures.
PROMOTIF
PROMOTIF
(Hutchinson et al.) automatically identifies,
classifies and analyses a number of supersecondary structural motifs in proteins.
Any resulting patterns will be useful in prediction of protein structure
from amino acid sequence. Motifs analyzed include beta turns, gamma turns,
Greek keys, beta hairpins and beta bulges. Data from PROMOTIF analyses
are included in the PDBsum
(Laskowski et al.) web site, which provides
information derived from all currently available protein coordinate files.
DSSP
The DSSP
(Kabsch et al.) program defines secondary structure,
geometrical features and solvent exposure of proteins, given atomic coordinates
in Protein Data Bank format. The program does NOT PREDICT protein structure.
The DSSP code
H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend
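The code table above can be used to summarise a per-residue assignment, as in this Python sketch. It assumes the assignment is available as a string with one code letter per residue; any character not listed in the table is counted under a catch-all label. The function name and the example string are made up.

```python
from collections import Counter
from typing import Dict

# One-letter codes as listed in the table above.
DSSP_CODES = {
    "H": "alpha helix",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "G": "3-helix (3/10 helix)",
    "I": "5 helix (pi helix)",
    "T": "hydrogen bonded turn",
    "S": "bend",
}


def summarize_dssp(states: str) -> Dict[str, int]:
    """Count residues per structural class; unlisted characters go to 'other'."""
    counts: Counter = Counter()
    for code in states:
        counts[DSSP_CODES.get(code, "other")] += 1
    return dict(counts)


print(summarize_dssp("  EEEETTHHHHHH S"))
```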
SwissModel Template Library (ExPDB)
The template structure database used by SWISS-MODEL
(SMTL or ExPDB library) is derived from the Protein Data Bank (Westbrook
et al.). In order to allow sequence-based template searches, each
PDB entry is split into individual chains. The separated template chains
are annotated with information about experimental method, resolution (if
applicable), ANOLEA mean force potential (Melo et
al.), Gromos96 energy (van Gunsteren et al.)
and PQS (Henrick et al.) quaternary state assignment
to allow for rapid retrieval of the relevant structural information during
template selection. Theoretical models, structures only consisting of
C alpha atoms and irregularly formatted database entries are removed.
In order to speed up the sequence database search step of the template
identification algorithms and to provide a clear and concise overview
of the results, templates sharing 100% sequence identity are grouped into
a SMTL100 library using the program CD-HIT, a fast clustering method for
sequences at high identity thresholds (Li et al.).
Clusters of sequences having 90%, 70% and 50% sequence identity are derived
from the RCSB non-redundant PDB lists.
The ExPDB codes are constructed according to the following rule: PDBCODE+ChainID
Examples:
- Light harvesting protein: 1cpc contains two chains (with IDs
A & B).
The corresponding ExPDB entries are respectively:
- Chain A: 1cpcA
- Chain B: 1cpcB
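The naming rule can be expressed as a pair of small Python helpers. The length checks assume four-character PDB codes and one-character chain IDs, as in the examples above; these checks and the function names are assumptions made for illustration.

```python
from typing import Tuple


def make_expdb_code(pdb_code: str, chain_id: str) -> str:
    """Build an ExPDB code as PDBCODE+ChainID."""
    if len(pdb_code) != 4:
        raise ValueError("expected a four-character PDB code")
    if len(chain_id) != 1:
        raise ValueError("expected a one-character chain ID")
    return pdb_code + chain_id


def split_expdb_code(code: str) -> Tuple[str, str]:
    """Split an ExPDB code into (PDB code, chain ID)."""
    if len(code) != 5:
        raise ValueError("expected a five-character ExPDB code")
    return code[:4], code[4]


print(make_expdb_code("1cpc", "A"))  # 1cpcA
print(split_expdb_code("1cpcB"))     # ('1cpc', 'B')
```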
User specified template:
Depending on the planned model application, it can be
necessary to select a different structural template than the one ranked
first in the automated process. Typical examples are proteins in
different conformational states, e.g. 1ake vs. 4ake. It is possible to
specify the structure to be used as modelling template either by
identifying an entry in the SWISS-MODEL template library
by PDB-ID + ChainID e.g. "1ake" chain "A", or by uploading a file in
PDB format (*) with coordinates of the template structure. Please make sure
that this file contains only a single protein chain, and does not
contain chemically modified amino acids, hetero atoms, ligands, etc.
(*) A simple PDB-like file containing the coordinates of the template
structure. For more information about PDB file format please see link:
http://www.wwpdb.org/docs.html