Introduction to SWISS-MODEL
SWISS-MODEL is a web-based integrated service dedicated to protein structure homology modelling. It guides the user in building protein homology models at different levels of complexity.
Building a homology model comprises four main steps: (i) identification of structural template(s), (ii) alignment of target sequence and template structure(s), (iii) model-building, and (iv) model quality evaluation. These steps require specialised software and integrate up-to-date protein sequence and structure databases. Each of the above steps can be repeated interactively until a satisfying modelling result is achieved.
The SWISS-MODEL Workspace
The SWISS-MODEL Workspace (Waterhouse et al.) is a personal web-based working environment, where several modelling projects can be carried out in parallel. Protein sequence and structure databases necessary for modelling are accessible from the workspace and are updated at regular intervals. Tools for template selection, model building, and structure quality evaluation can be invoked from within the workspace directly or via the web page menu.
From the workspace, the user accesses the current modelling projects, inspects their status and visualises the results upon job completion. Project names can be changed retroactively by clicking on the symbol next to the project title. Alternatively, the project title can also be changed by double-clicking on the title when the project results are displayed. By default, projects are stored for two weeks on the server with an option to extend the project lifetime. The remaining time until a given project is deleted from the server is displayed accordingly.
If you have built a model which you would like to maintain indefinitely and the model will be cited in a journal, you may consider depositing your model at the ModelArchive where it will receive a DOI once the journal citation is available.
Model Building
Models are computed by the SWISS-MODEL server homology modelling pipeline (Waterhouse et al.) which relies on ProMod3 (Studer et al.), an in-house comparative modelling engine based on OpenStructure (Biasini et al.).
ProMod3 extracts initial structural information from the template structure. Insertions and deletions, as defined by the sequence alignment, are resolved by first searching for viable candidates in a structural database. Final candidates are then selected using statistical potentials of mean force scoring methods. If no candidates can be found, a conformational space search is performed using Monte Carlo techniques. Non-conserved side chains are modelled using an in-house backbone-dependent rotamer library. The optimal configuration of rotamers is estimated using the graph-based TreePack algorithm (Xu et al.) by minimising the SCWRL4 energy function (Krivov et al.). As a final step, small structural distortions, unfavourable interactions or clashes introduced during the modelling process are resolved by energy minimisation. ProMod3 uses the OpenMM library (Eastman et al.) to perform the computations and the CHARMM22/CMAP force field (Mackerell et al.) for parameterisation.
Modelling Modes
Depending on the difficulty of the modelling task, three different types of modelling modes are provided, which differ in the amount of user intervention: automated mode, alignment mode, and project mode.
Automated Mode
The Automated Mode only requires the amino acid sequence or the UniProtKB accession code of the target protein as input.
The automatic pipeline identifies suitable templates based on BLAST (Camacho et al.) and HHblits (Steinegger et al.). The resulting templates are ranked according to the expected quality of the resulting models (see the Template Ranking section for more details). Top-ranked templates and alignments are compared to verify whether they represent alternative conformational states or cover different regions of the target protein. In this case, multiple templates are selected automatically and different models are built accordingly.
This mode is subject to continuous evaluation within the Continuous Automated Model Evaluation (CAMEO) platform (Haas et al.).
Please note that there is no need to run the automated mode by pressing "Build Model" and then start the project again with "Search for Templates" only. Both options start the same template search, and in the first case its results are also accessible once the models are built.
Alignment Mode
If the desired template for modelling is known and available in the SWISS-MODEL Template Library (SMTL), a target–template alignment in either FASTA or Clustal format may be used to start the modelling process, thereby skipping the template search.
The template sequence(s) should be named using the PDB ID format (e.g. “1CNR” or “1CNR_A”). The user will be asked to specify which sequence in the alignment corresponds to the target and/or the template protein from a drop-down list.
The Alignment mode allows the advanced user to invoke the modelling step starting from alternative alignments and to evaluate the quality of these alternative models.
>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
It is possible to edit your alignment further in the input window by clicking on the edit icon to the left of the validated input alignment. This starts edit mode, and a cursor appears in the first row of the alignment. Use the arrow keys to move the cursor, then press the spacebar to insert a gap or the Del key to delete a character. The sequence identity of the new alignment is displayed, and non-identical residues in a column are faded to light grey. Use Ctrl-Z to undo any editing, or click the reset button to go back to the start. Click the edit icon again to exit alignment editing mode.
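The sequence identity shown while editing can also be estimated offline before uploading an alignment. The following is a minimal sketch, assuming that identity is counted over the columns in which both sequences carry a residue; the exact definition used by the server is not stated here, and the function names are purely illustrative.

```python
def parse_fasta_alignment(text):
    """Parse a FASTA alignment into an ordered list of (name, sequence) pairs."""
    records = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            records.append([name, ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def sequence_identity(target, template, gap="-"):
    """Fraction of identical residues among gap-free columns (assumed definition)."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(target.upper(), template.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


alignment = """>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
"""
(target_name, target_seq), (template_name, template_seq) = parse_fasta_alignment(alignment)
identity = sequence_identity(target_seq, template_seq)
print(f"{target_name} vs {template_name}: {identity:.1%} identical")
```

Checking that both rows have the same length before submission catches the most common formatting mistake in hand-edited alignments.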
Project Mode
In difficult modelling situations, where the correct alignment between target and template cannot be clearly determined by sequence-based methods, visual inspection and manual manipulation of the alignment can help improve the quality of the resulting model significantly.
The program DeepView - Swiss-PdbViewer (Guex et al.) can be used to generate, display, analyse, and manipulate modelling project files in the SWISS-MODEL workspace. Project files contain the superposed template structures and the alignment between the target and the template. In this mode, the user has full control over essential modelling parameters, i.e. the choice of template structures, the correct alignment, and the placement of insertions and deletions in the context of the 3D structure. Project files can also be generated by the workspace template selection tools.
Download the program from the DeepView website. SWISS-MODEL supports DeepView legacy projects by relying on the previous version of the PROMOD modelling pipeline.
Ligand Modelling
Biologically relevant ligands and cofactors are modelled using a conservative homology transfer approach from the templates identified in the SMTL. Ligands in the SMTL are annotated either as: (i) relevant, non-covalently bound ligand, (ii) covalent modifications, or (iii) non-functional binders (e.g. buffer or solvent). A non-covalently bound ligand is considered for the model if it has at least three coordinating residues in the protein and those residues are conserved in the target–template alignment. The relative coordinates of the ligand(s) are transferred from the template, if the resulting atomic interactions in the model are within the expected range for van der Waals interactions and water-mediated contacts.
In PDB format, the ligands are all stored in a separate chain named '_' with different residue numbers distinguishing different ligands.
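As an illustration, the ligands of a downloaded model could be listed by collecting the HETATM records of chain '_'. This is a minimal sketch that relies only on the standard fixed-width PDB column layout; the helper name and the example records are hypothetical and not taken from an actual SWISS-MODEL model.

```python
# Fixed-width PDB layout: residue name in columns 18-20, chain ID in column 22,
# residue number in columns 23-26. The format string builds toy example records.
LINE_FMT = "HETATM{serial:5d} {atom:<4s} {res:>3s} {chain}{num:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


def ligands_in_model(pdb_text, ligand_chain="_"):
    """Return {residue_number: residue_name} for HETATM records in the ligand chain."""
    ligands = {}
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        if line[21] != ligand_chain:
            continue
        res_name = line[17:20].strip()
        res_num = int(line[22:26])
        # Each residue number in the ligand chain denotes one ligand.
        ligands.setdefault(res_num, res_name)
    return ligands


example = "\n".join([
    LINE_FMT.format(serial=1, atom="ZN", res="ZN", chain="_", num=1, x=10.0, y=12.0, z=8.0),
    LINE_FMT.format(serial=2, atom="FE", res="HEM", chain="_", num=2, x=1.0, y=2.0, z=3.0),
    LINE_FMT.format(serial=3, atom="O", res="HOH", chain="A", num=201, x=4.0, y=5.0, z=6.0),
])
print(ligands_in_model(example))
```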
Protein-ligand interactions
When ligands are present in the model, non-covalent protein-ligand interactions are annotated with PLIP (Salentin et al.). Seven types of interactions are covered: hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges and halogen bonds.
Oligomeric Modelling
In SWISS-MODEL, the quaternary structure annotation of the template is used to model the target sequence in its oligomeric form. The method (Bertoni et al.) is based on a supervised machine learning algorithm, Support Vector Machines (SVM), which combines interface conservation, structural clustering, and other template features to provide a quaternary structure quality estimate (QSQE). The QSQE score is a number between 0 and 1, reflecting the expected accuracy of the interchain contacts for a model built based on a given alignment and template. In general, a higher QSQE is "better", and a value above 0.7 indicates that the predicted quaternary structure can reliably be followed in the modelling process. This complements the GMQE score, which estimates the accuracy of the tertiary structure of the resulting model. QSQE is only computed if it is possible to build an oligomer and only for the top-ranked templates.
The SWISS-MODEL Template Library (SMTL)
The SWISS-MODEL template library is a large structural database of experimentally determined protein structures derived from the Protein Data Bank (Berman et al.).
It serves as the main repository of structural information for the modelling pipeline. It provides atomic coordinates of protein structures and maintains sequence and profile databases which can be searched by BLAST and HHblits. Alignment-independent properties of the templates are precalculated and stored in the database, e.g. a mapping between residues resolved in the experiment and corresponding residues in the full protein sequence, predicted solvent accessibility and secondary structure information.
Individual entries of the SMTL can be inspected using the web interface. The sequence features are linked to a 3D structure viewer and can be interactively explored. SMTL IDs consist of the PDB ID, an integer representing the biounit and a capital letter for the chain ID. The SMTL chain ID is not necessarily the same as the PDB chain ID. The mapping is shown in "SMTL:PDB".
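The three components of an SMTL ID can be separated programmatically. The sketch below assumes that the parts are joined by dots, as in "1crn.1.A"; the separator is an assumption that is not spelled out in the text above, and the class and function names are illustrative.

```python
import re
from typing import NamedTuple


class SmtlId(NamedTuple):
    pdb_id: str    # four-character PDB ID
    biounit: int   # integer identifying the biounit
    chain: str     # SMTL chain ID (may differ from the PDB chain ID)


# Assumed layout: <PDB ID>.<biounit>.<chain>, e.g. "1crn.1.A".
_SMTL_ID = re.compile(r"^([0-9][A-Za-z0-9]{3})\.([0-9]+)\.([A-Z])$")


def parse_smtl_id(text):
    """Split an SMTL ID into PDB ID, biounit number and chain letter."""
    match = _SMTL_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised SMTL ID: {text!r}")
    pdb_id, biounit, chain = match.groups()
    return SmtlId(pdb_id.lower(), int(biounit), chain)


print(parse_smtl_id("1crn.1.A"))
```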
Ligands can be marked as synthetic, natural or part of crystallisation buffer. This information is used by the modelling pipeline to determine whether a ligand is considered for inclusion into the final model.
Biological Assemblies (Biounit) of Templates
The biological assembly (biounit) describes the oligomeric state, or quaternary assembly, which is thought of as the biologically most relevant form of the molecule. For a detailed description see Biological Assemblies on PDB-101.
The biological assembly reported in the SMTL is retrieved from the PDB entry.
SMTL entries are organised (if more than one assembly is available) by likely quaternary structure assemblies, which are created according to the author and software-annotated oligomeric states listed in the PDB deposition. If some chains of the asymmetric unit are not included in any biounit of a PDB entry, the asymmetric unit is included as a template.
Input Data
Protein amino acid sequence or UniProtKB identifier
The amino acid sequence of the target protein can be submitted either as plain text, or in FASTA format.
Example of plain text sequence:
MVEIVYWSGTGNTEAMANEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLGCPAMGSE
ELEDSVVEPFFTDLAPKLKGKKVGLFGSYGWGSGEWMDAWKQRTEDTGATVIGTAIVNEM
PDNAPECKELGEAAAKA
Example of FASTA sequence:
>sp|P00321|FLAV_MEGEL Flavodoxin - Megasphaera elsdenii.
MVEIVYWSGTGNTEAMANEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLGCPAMGSE
ELEDSVVEPFFTDLAPKLKGKKVGLFGSYGWGSGEWMDAWKQRTEDTGATVIGTAIVNEM
PDNAPECKELGEAAAKA
If the protein sequence is deposited in the UniProtKB (The UniProt Consortium) database, the UniProtKB identifier of the entry can be provided as input (e.g. P00321). In this case, the identifier is immediately validated and replaced with the corresponding sequence.
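The three accepted input forms can be told apart with a few simple checks. The following is a minimal sketch rather than the validation performed by the server: the accession pattern is the one published in the UniProt documentation, while the restriction to the 20 standard amino acid letters and the function name are assumptions made for illustration.

```python
import re

# Accession number pattern as published in the UniProt documentation.
UNIPROT_AC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumption: only the 20 standard one-letter codes are accepted.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def classify_target_input(text):
    """Return ('uniprot', accession), ('fasta', sequence) or ('plain', sequence)."""
    stripped = text.strip()
    if UNIPROT_AC.match(stripped.upper()):
        return "uniprot", stripped.upper()
    lines = stripped.splitlines()
    kind = "plain"
    if lines and lines[0].startswith(">"):
        # FASTA input: drop the header line and keep the sequence lines.
        kind, lines = "fasta", lines[1:]
    sequence = "".join("".join(line.split()) for line in lines).upper()
    unknown = set(sequence) - AMINO_ACIDS
    if not sequence or unknown:
        raise ValueError(f"empty sequence or unexpected characters: {sorted(unknown)}")
    return kind, sequence


print(classify_target_input("P00321"))
print(classify_target_input(">sp|P00321|FLAV_MEGEL Flavodoxin\nMVEIVYWSGTGNTEAMANEIEAAVKAAGADV"))
```

An accession always contains digits, so it cannot be confused with a short plain-text sequence.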
The "Add Hetero Target" button is provided to input multiple target sequences representing different subunits of a hetero-oligomer. The target sequences must be unique and can be submitted as plain text, FASTA sequences, or UniProtKB ACs. If a hetero-oligomer is requested, we only look for biounits of templates that contain connected chains with all desired subunits.
Target–template alignment
The following formats are currently supported: FASTA and Clustal.
Example for FASTA:
>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
Example for Clustal:
CLUSTAL W (1.82) multiple sequence alignment
THN_DENCL KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH- 46
1crnA TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN- 46
.:*** ..* : **: * .. :** :** **..: ** *
User Template
If the user knows the structure of the template to use for modelling, the coordinates can be uploaded in PDB format(*) together with the target protein sequence. Oligomeric templates are accepted, and it is also possible to build heteromers by adding multiple target sequences to the input. To start a modelling job with your own template:
- Press the "User Template" button
- Enter the target sequence as normal.
- Optional: to start a hetero project, you can now click "Add Hetero Target" to add another target sequence
- Click the "Add Template File..." button
- Click "Build Model"
Important: Make sure that there are no chemically modified amino acids!
If the file is not accepted, you may first try removing non-standard residues (HETATM records).
(*) A PDB-like file containing the coordinates of the template structure. For more information about PDB file format please see this link.
Please note that the mmCIF format is currently not supported.
DeepView Project
Project files containing the superposed template structures and the alignment between the target and template can be directly uploaded into the SWISS-MODEL Workspace. See the “Project Mode” section for further details. An example of a DeepView project and its application in the modelling of oligomeric proteins can be found here.
Template search
The degree of difficulty in identifying a suitable template for a target sequence can range from "trivial" for well-characterised protein families to "impossible" for proteins with an unknown fold. The SWISS-MODEL server provides access to a set of increasingly sophisticated methods to search for templates.
The SWISS-MODEL Template Library is searched in parallel both with BLAST (Camacho et al.) and HHblits (Steinegger et al.) to identify templates and to obtain target–template alignments. The combined usage of these two approaches guarantees good alignments at high and low sequence identity levels. In addition to the PDB-based SMTL, SWISS-MODEL also searches the AlphaFold DB (Varadi et al.) for templates with high sequence identity (≥70%).
By using the “Template Search” option, templates are searched in the SMTL using BLAST and HHblits. For the latter we build a profile for the target sequence as outlined in (Steinegger et al.) using 1 iteration of HHblits against Uniclust30 (Mirdita et al.) and use it to search all profiles of the SMTL. The AlphaFold DB is searched by SWISS-MODEL using a k-mer search algorithm (k=5), heavily inspired by (Goddard). Results for the AlphaFold DB search are currently limited to 1 template. The templates found are listed together with relevant structural information that can be readily used to select templates according to user-defined criteria.
Ranking of template results
When the template search is complete, templates and alignments are first filtered to remove redundancy. A set of maximally 50 top-ranked templates is then chosen from the full list of templates according to a simple score which combines sequence coverage and sequence similarity. The top-ranked templates and alignments are further analysed and sorted according to the expected quality of the resulting models, as estimated by GMQE and, if the target model is predicted to be an oligomer, QSQE. In detail, the default template ranking is according to the descending lexicographic order of (is_full_biounit, bin, gmqe + qs_value), where: is_full_biounit is only used for heteromers and is set to 1, if all chains from the template biounit are included for modelling, or 0 otherwise; bin is computed as ceil((gmqe - max_gmqe) / 0.1), where max_gmqe is the best gmqe observed in the templates; gmqe is the GMQE of the template; qs_value is set to QSQE of the template, if the target model is predicted to be an oligomer, or 0 otherwise.
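The default ranking key can be written down compactly. The sketch below follows the lexicographic order described above; the `Template` class, its field names and the example values are illustrative assumptions and not part of the SWISS-MODEL code base.

```python
import math
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    gmqe: float
    qsqe: float = 0.0
    oligomer_predicted: bool = False  # is the target model predicted to be an oligomer?
    is_full_biounit: int = 0          # heteromers only: 1 if all biounit chains are used


def rank_templates(templates):
    """Sort by (is_full_biounit, bin, gmqe + qs_value) in descending order."""
    if not templates:
        return []
    max_gmqe = max(t.gmqe for t in templates)

    def key(t):
        # Templates within 0.1 GMQE of the best one share bin 0; lower bins are negative.
        gmqe_bin = math.ceil((t.gmqe - max_gmqe) / 0.1)
        qs_value = t.qsqe if t.oligomer_predicted else 0.0
        return (t.is_full_biounit, gmqe_bin, t.gmqe + qs_value)

    return sorted(templates, key=key, reverse=True)


candidates = [
    Template("A", gmqe=0.80, qsqe=0.10, oligomer_predicted=True),
    Template("B", gmqe=0.75, qsqe=0.50, oligomer_predicted=True),
    Template("C", gmqe=0.62, qsqe=0.90, oligomer_predicted=True),
]
print([t.name for t in rank_templates(candidates)])
```

In this example, B overtakes A because both fall into the same GMQE bin and B has the larger sum of GMQE and QSQE, whereas C stays last despite its high QSQE because its GMQE places it in a lower bin.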
Display of template identification results
The Template Results page serves both as an overview of available templates as well as an interactive template selection tool. The top part of the screen contains a summary of the top-ranking templates identified by the template search methods. The identified templates and the default template ranking correspond to the ones used in the Automated Mode. Please note that in the Automated Mode, additional templates, apart from the top-ranked one, may be chosen for modelling if they represent alternative conformational states or cover different regions of the target protein.
Four types of views can be available (based on the data input): (i) a Templates summary table, listing all templates in tabular form and providing an overview of relevant attributes of each template, (ii) the Quaternary Structure, (iii) an interactive chart showing the templates in relation to each other in Sequence Similarity space, and (iv) the sequence Alignment of Selected Templates.
Templates can be selected in any of these views for the subsequent modelling step. Selected templates are automatically shown in the 3D viewer. If multiple templates are selected, their structural superposition is shown, allowing instant visualisation of structural differences between them.
The complete list of all identified templates can be accessed at the bottom of the Template Results page.
In the Templates summary table, template annotations and target–template alignments can be retrieved by clicking on the arrows at the left end of the table rows to expand the box with the description of the individual templates.
For each SMTL template, the following information is provided: the SMTL ID, the title of the structure, the target sequence coverage, GMQE, QSQE, the sequence identity to the target, the experimental method used to obtain the structure (and the resolution, if applicable), the oligomeric state, the ligands (if any), the sequence similarity to the target, and the template search method used. Templates originating from AlphaFold DB carry different data from SMTL templates: The ID is the AlphaFold DB ID (UniProtKB AC) with the fragment number and the chain name. Instead of the title of the PDB structure, the UniProtKB protein name is used, along with the entry name, gene and organism from UniProtKB annotation. The method field in the expanded view is set to AlphaFold v2, the tool used to produce models in AlphaFold DB. The template search method is marked as AFDB search. If the template search hits an AlphaFold DB entry that is obsolete in UniProtKB, the template description changes to "UniProtKB entry unknown, maybe obsolete" and no gene name or organism can be retrieved. This can happen since AlphaFold DB is not always up-to-date with UniProtKB, but such model coordinates are still valid as templates for homology modelling, as the quality of the model does not suffer from an obsolete UniProtKB entry.
For each template, the oligomeric state of the model is predicted. If the predicted oligomeric state of the model differs from the one of the template biounit or not all chains from the biounit are included, a warning symbol is shown (exclamation mark in a triangle). Whenever possible, the user can choose the oligomeric state manually by expanding the template view under the point "Target Prediction".
Several methods are currently used to determine the structure of a protein. In homology modelling, it is generally preferable to use structures determined by X-ray crystallography with high resolution as templates. We generally discourage the use of averaged NMR structures. In individual cases, taking into account the ensemble of structures determined by NMR spectroscopy might provide useful insights. Special care is required when using structures determined by electron microscopy, as they range from low resolution "blobology" to structures at atomic resolution. The only non-experimentally determined templates available in SWISS-MODEL are derived from AlphaFold DB (method: AlphaFold v2, found by: AFDB search), which usually show quality high enough to be used in homology modelling.
Target–template sequence similarity is calculated from a normalised BLOSUM62 (Henikoff et al.) substitution matrix (i.e. the largest and smallest values in the BLOSUM62 are 1 and 0, respectively). The sequence similarity of the alignment is calculated as the sum of the substitution scores divided by the number of aligned residue pairs. Gaps are not taken into account.
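The calculation can be illustrated with a short sketch. It assumes a min-max rescaling of the substitution matrix, which maps the largest value to 1 and the smallest to 0. Only a handful of BLOSUM62 entries are included here for brevity (including the matrix maximum of 11 and minimum of -4); a real calculation would need the full matrix, and the function names are illustrative.

```python
# Small excerpt of BLOSUM62; it contains the matrix maximum (W/W) and minimum (C/E)
# so that the rescaling matches the one obtained from the full matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "S"): 1, ("S", "S"): 4,
    ("W", "W"): 11, ("C", "E"): -4,
}


def normalise_matrix(matrix):
    """Rescale scores linearly so that the smallest is 0 and the largest is 1."""
    lo, hi = min(matrix.values()), max(matrix.values())
    return {pair: (score - lo) / (hi - lo) for pair, score in matrix.items()}


def sequence_similarity(target, template, norm_matrix, gap="-"):
    """Mean normalised substitution score over aligned residue pairs; gaps are skipped."""
    scores = []
    for a, b in zip(target, template):
        if a == gap or b == gap:
            continue
        # The matrix is symmetric, so look up the pair in either order.
        pair = (a, b) if (a, b) in norm_matrix else (b, a)
        scores.append(norm_matrix[pair])
    return sum(scores) / len(scores) if scores else 0.0


norm = normalise_matrix(BLOSUM62_EXCERPT)
print(round(sequence_similarity("AAW", "ASW", norm), 3))
```

Because the scores are rescaled rather than counted as matches, two fully identical sequences do not necessarily reach a similarity of 1: an alanine pair contributes (4 + 4) / 15, while a tryptophan pair contributes 1.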
The Quaternary Structure view provides information on the quaternary structure analysis. Templates are clustered and displayed in a decision tree according to their oligomeric state, stoichiometry, topology and interface similarity. On the level of the oligomeric state, the templates are grouped in either monomeric, homomeric or heteromeric clusters. Stoichiometry considers only the number of chains in the structure, while on the topology level the templates are grouped according to the interactions between the subunits. The interface similarity quantifies the similarity between interfaces as a function of shared interfacial contacts between the chains and thus makes it possible to distinguish between different quaternary structures and binding modes. Each leaf of the tree corresponds to a template labelled with the PDB code and a bar indicating sequence identity to the target and coverage. Templates from the AlphaFold DB are excluded from the Quaternary Structure view, as all UniProtKB entries are modelled by AlphaFold DB as monomers without consideration of the oligomeric state of the represented protein.
Protein–protein interaction (PPI) Fingerprint curves inform about the conservation of template interfaces. Residues participating in interfaces are subject to different evolutionary constraints than residues at the protein surface, e.g. those interacting with the solvent. A value of interface conservation (y-axis) below 0 indicates that interface residues are less prone to mutate when compared to surface residues. An estimate of conservation is typically derived from a multiple sequence alignment (MSA) of homologous proteins. The alignment is sliced using different sequence identity cut-offs (x-axis) to filter the MSA of the target protein (e.g. with a sequence identity cut-off at 50%, only sequences with > 50% sequence identity to the target are retained). In this way it can be observed how the various template interfaces "adapt" to the target protein family. Considering the full set of homologues, alternative quaternary structures can have similar interface conservation, making the selection of a template harder. Considering closer homologues, the PPI fingerprints of the various templates will diverge, allowing an easier selection, as better-adapted interfaces will reach lower values of interface conservation.
In the Sequence Similarity chart each template is shown as a circle. The distance between templates in the plot reflects the sequence identity between them, so that similar sequences cluster together.
In the Alignment of Selected Templates view the alignments of the selected templates to the target are visualised.
DeepView project files can be accessed from the drop-down menu, using the 'More' button. This allows the user to visualise different alignments in the structural context of the template, helping to correct misplaced insertions and deletions, and manually adjust misaligned regions. The modified project can then be saved to disk and submitted as "Project mode" to the workspace for model building by the SWISS-MODEL pipeline.
Display of modelling results
Coordinates of the model, the corresponding alignment and quality evaluations can be accessed and downloaded via web browser from the workspace.
Model details
This section allows you to display the 3D structure of models and their target–template sequence alignment as well as to download the model coordinates. For better assistance, many sequence features/scoring schemes are synchronised with the 3D molecular view.
The colouring of the alignment can be changed by clicking on the "Options" button (cog icon) and selecting the desired colouring scheme.
Model coordinates are available in two different formats:
- DeepView project files
- PDB format
If the model has been built using the Automated Mode, the information about the selected template(s) is provided with cross-references to structural databases via the link to the SWISS-MODEL Template Library.
By default, the final model is presented in colours based on the QMEAN model quality. This allows instant visualisation of regions of the model that are well or poorly modelled. Information about the oligomeric state, as well as bound ligands and cofactors are provided. The user can alternatively choose to see the results in a well formatted report page which shows all the results in a readable format that can be copied and pasted to other documents. The user can download an archive file containing all the models and reports for the given target sequence.
There are very rare cases where modelling fails because the template structure does not contain enough backbone atoms in the aligned region (we need at least N-CA-C to be available and we skip d-peptides). In such a case, we do not return any model structure.
Currently, there are three very rare cases where major modelling issues appear. These issues are displayed with a prominent warning sign and a potentially sub-optimal model is displayed. The models may locally still be valid and useful. We suggest looking carefully at the local QMEAN scores to judge the model. The three issues are:
- If the target–template alignment contains very large deletions mixed with small aligned patches, we may return an incomplete model as we are unable to cleanly resolve the deletion. Apart from the unresolved deletion, these could still be high quality models.
- A "ring punch" is defined by a bond passing through the carbon ring of another amino acid (His, Pro, Phe, Tyr, Trp). This is an unfortunate and unpredictable effect of the final energy minimisation in the modelling process. Apart from the residues involved in the "ring punch", these could still be high quality models.
- If a very bad template structure is used (so far we have only seen this with user-uploaded structures), the energy minimisation may fail. This is usually caused by coordinates of different atoms occupying almost the same position. In such a case, we return the model without any energy minimisation applied to it. Unless the failure was caused by a local problem in the template structure, this is expected to lead to very low quality models.
Future versions of ProMod3 may resolve the issues above.
Model evaluation
Global model evaluation
GMQE and QMEANDisCo global give an overall model quality measurement between 0 and 1, with higher numbers indicating higher expected quality. GMQE is coverage dependent, i.e. a model covering only half of the target sequence is unlikely to get a score above 0.5. QMEANDisCo on the other hand evaluates the model 'as is' without explicit coverage dependency.
GMQE (Global Model Quality Estimate) is a quality estimate which combines properties from the target–template alignment and the template structure. They are combined using a multilayer perceptron trained to predict the lDDT score of the resulting model. The GMQE is available before building an actual model and is thus helpful in selecting optimal templates for the modelling problem at hand. Once a model is built, the GMQE ((1) in the figure above) gets updated for this specific case by also taking into account the QMEANDisCo global score of the obtained model in order to increase the reliability of the quality estimation. If a template structure originates from AlphaFold DB, GMQE is a heuristic that sums per-residue plDDT values of aligned template residues and normalises by target sequence length. Once a model is built, the GMQE is updated and represents the summed per-residue quality estimates normalised by target sequence length. Again, per-residue quality estimates are obtained with an AlphaFold DB specific heuristic.
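The heuristic for AlphaFold DB templates, and the coverage dependency it implies, can be illustrated as follows. This is a minimal sketch that assumes plDDT values on the 0 to 100 scale used by AlphaFold DB and rescales them to the 0 to 1 range of GMQE; the rescaling and the function name are assumptions.

```python
def afdb_template_gmqe(aligned_plddt, target_length):
    """Sum the plDDT of aligned template residues and normalise by target length.

    aligned_plddt: plDDT values (0-100) of the template residues aligned to the target.
    target_length: number of residues in the full target sequence.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Dividing by 100 maps plDDT onto the 0-1 range (assumed rescaling).
    return sum(aligned_plddt) / (100.0 * target_length)


# A confident template (plDDT 90) covering only half of a 100-residue target.
print(afdb_template_gmqe([90] * 50, target_length=100))
```

Residues of the target that are not covered by the template contribute nothing to the sum, so a template covering half of the target cannot score above 0.5 however confident its residues are.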
QMEANDisCo global score (Studer et al., (2) in the figure above) is the average per-residue QMEANDisCo score (see below) which has been found to correlate well with the lDDT score (Mariani et al.). The provided error estimate is based on QMEANDisCo global scores estimated for a large set of models and represents the root mean squared difference (i.e. standard deviation) between QMEANDisCo global score and lDDT (the ground truth). As the reliability of the prediction depends on model size, the provided error estimate is calculated based on models of similar size to the input. The QMEANDisCo global score is not computed for models that use AlphaFold DB templates.
QMEAN Z-score analysis (Benkert et al.) is deprecated and the GMQE and QMEANDisCo global scores should be consulted for global model quality estimates instead. It is based on 4 statistical potentials of mean force and their linear combination: the "QMEAN" score. All scores, 5 in total, are compared with what one would expect from experimentally determined structures of similar size using Z-scores ((4) in the figure above). In other words: How many standard deviations from the mean is my model score, given a score distribution from a large set of experimentally determined structures? Z-scores around 0.0 therefore reflect a "native-like" structure and, as a rule of thumb, a "QMEAN" Z-score below -4.0 indicates a model with low quality. This is illustrated by the "Comparison" plot ((5) in the figure above). The x-axis shows protein length (number of residues). The y-axis is the "QMEAN" score. Every dot represents one experimental protein structure. Black dots are experimental structures with a "QMEAN" score within 1 standard deviation of the mean (|Z-score| between 0 and 1), while experimental structures with a |Z-score| between 1 and 2 are grey. Experimental structures that are even further from the mean are light grey. The actual model is represented as a red star. The QMEAN Z-score analysis is not computed for models that use AlphaFold DB templates.
Local model evaluation
Per residue scores are estimated with the QMEANDisCo scoring function (Studer et al.). QMEANDisCo is a composite score for single model quality estimation. It employs single model scores suitable for assessing individual models, extended with a consensus component by additionally leveraging information from experimentally determined protein structures that are homologous to the model being assessed. The "Local Quality" plot ((3) in the figure above) shows, for each residue of the model (reported on the x-axis), the expected similarity to the native structure (y-axis). Typically, residues showing a score below 0.6 are expected to be of low quality. Different model chains are shown in different colours. If the model is downloaded, the local score is reported in the B-factor column of the PDB file. The local quality can also be visualised by choosing the colour scheme "Confidence". If the model is built using an AlphaFold DB template, per-residue scores are transferred plDDT values from the underlying template. The ProMod3 modelling engine resolves insertions/deletions by remodelling stretches that may be longer than the ones defined in the alignment. The assigned local qualities in these stretches linearly decrease from the anchors as a function of distance.
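Since the local score is stored in the B-factor column, low-scoring residues can be extracted from a downloaded model with a few lines of code. The sketch below relies on the standard fixed-width PDB columns and assumes that the stored values use the same 0 to 1 scale as the "Local Quality" plot; the helper name and the example records are hypothetical.

```python
# Fixed-width PDB layout: chain ID in column 22, residue number in columns 23-26,
# B-factor in columns 61-66. The format string builds toy example records.
ATOM_FMT = ("ATOM  {serial:5d} {atom:<4s} {res:>3s} {chain}{num:4d}    "
            "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def low_quality_residues(pdb_text, threshold=0.6):
    """Return (chain, residue_number) pairs whose local score is below the threshold."""
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        key = (line[21], int(line[22:26]))
        # All atoms of a residue carry the same score; keep the first one seen.
        scores.setdefault(key, float(line[60:66]))
    return [key for key, score in scores.items() if score < threshold]


model = "\n".join([
    ATOM_FMT.format(serial=1, atom=" CA", res="THR", chain="A", num=1,
                    x=1.0, y=2.0, z=3.0, occ=1.0, b=0.85),
    ATOM_FMT.format(serial=2, atom=" CA", res="CYS", chain="A", num=2,
                    x=2.0, y=3.0, z=4.0, occ=1.0, b=0.42),
    ATOM_FMT.format(serial=3, atom=" CA", res="PRO", chain="B", num=1,
                    x=3.0, y=4.0, z=5.0, occ=1.0, b=0.59),
])
print(low_quality_residues(model))
```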
Modelling report
The SWISS-MODEL Homology Modelling Report offers a summary of all Models built in the project.
Note: The report is accessible (i) per model via a drop-down menu, next to the model in the Models view or (ii) for all models in report.html in the downloaded file when choosing to download the project by pressing the download button below the project title.
It is structured in the following sections:
- Model building Report: Contains project name, project date and references. The target sequence is in Table T1 of the Report.
- Results: Version of the SWISS-MODEL template library and PDB release. All identified templates are listed in Table T2.
- Models: Models are listed sequentially, with each entry showing a picture of the model, a link to the PDB file, the version of the modelling engine, the oligomeric state, the ligands (if any), the global model quality estimate, and the QMEAN score. A graphical representation of the QMEAN score and its four terms, the local quality estimate plot, and the comparison with a non-redundant set of PDB structures are also provided. For the template, a link to the template itself is provided together with the following information: the title of the structure, the target sequence coverage, the sequence identity to the target, the experimental method used to obtain the structure (and the resolution, if applicable), the oligomeric state, the ligands (if any), the sequence similarity to the target, and the template search method used.
- Save Project Locally: Allows the project to be downloaded as a zip file.
The main folder contains the Model report (report.html), images folder (banner for the Report), and the model folder. Each model has its own subfolder.
Model Annotations
To add annotations, click the small pen icon below the 3D view of the model to open the input textarea, in which you can freely paste or type and amend your input.
To get started with a new annotation you can also click directly on a residue in the 3D view, or select a region of residues in the target-template alignments.
MOTIF=EHFG[DL]+ST | Find a sequence of residues to annotate. Regular expressions are allowed. |
MOTIF=(EHFG)(.+)(ST) | Tip: to identify a region defined by flanking residues describe the leading, target and trailing residues using round brackets. The second (middle) group will be the annotated region. |
TARGET=1 | For heteromeric protein models, specify the target sequence. (1 based index) |
CHAIN=B | Be aware that chain names depend on the template the model was based on. For heteromeric models, check the chain names per target in the expanded model-template alignment. |
START=35 | Starting residue number, 1 based. Can be combined with END and MOTIF to define the annotated region. |
END=50 | Ending residue number, 1 based. Can be combined with START and MOTIF to define the annotated region. |
COLOR=olive COLOR=#808000 COLOR=rgb(128,128,0) | Accepted colour format is common name, hex code or rgb. If the input colour cannot be parsed, the colour will be 'white'. |
LABEL=default | Default label is residue name, chain name, residue number eg "ALA C45". Every residue in the region will have a label. |
LABEL="this is the second helical region" | For non-default labels with spaces, double quotes must be used. The label will appear on the central residue in the region annotated |
LABEL_COLOR=olive LABEL_COLOR=#808000 LABEL_COLOR=rgb(128,128,0) | Colour of the label |
LABEL_SCALE=1.5 | Scale the label against the default size for the viewer. |
SIDECHAINS=on | Sidechains will be shown in ball and stick representation. |
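To see which residues a MOTIF expression would select, the pattern can be tried locally first. The sketch below uses Python's `re` module. The regular expression flavour used by the server is not stated here, so treat this as an approximation. The function name and the example sequence are hypothetical.

```python
import re


def motif_region(sequence, motif):
    """Return the 1-based (start, end) of the region a motif selects, or None.

    If the motif has exactly three groups (leading, target, trailing), the
    middle group is taken as the region; otherwise the whole match is used.
    """
    match = re.search(motif, sequence)
    if match is None:
        return None
    group = 2 if match.re.groups == 3 else 0
    start, end = match.span(group)
    return start + 1, end  # 0-based half-open span -> 1-based inclusive range


sequence = "MKEHFGDLLSTAA"  # made-up example sequence
print(motif_region(sequence, "EHFG[DL]+ST"))
print(motif_region(sequence, "(EHFG)(.+)(ST)"))
```

The first call covers the whole matched stretch, while the second covers only the residues between the flanking EHFG and ST groups.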
If you are logged in as the owner of the project, the annotations will be saved to the modelling project. If you are not the owner of the project, you are still free to edit and view annotations in your own browser window.
Colour Schemes
Score Schemes
SOA (Solvent Accessibility) | Low SOA -> High SOA | |
B-factor | <10< <15< <20< <25< <30< <35< <40 | Low disorder -> High Disorder |
B-factor range | Low disorder -> High Disorder. Range is between the minimum and maximum B-factor values present in the structure. | |
Entropy | Low Entropy -> High Entropy; High Conservation -> Low Conservation |
Model Schemes
Confidence gradient | Low Confidence -> High Confidence | |
Confidence class | Very high ( > .9) Confident (.9 > score > .7) Low (.7 > score > .5) Very low ( score < .5) | For both confidence colour schemes, residues are coloured by their local quality value. For SWISS-MODEL models, QMEANDisCo is used (range 0-1). For AlphaFold models, the score will be pLDDT (range 0-100). If a model is known to be an experimental structure, the B-factor range colour scheme will be used. |
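The confidence classes can be expressed as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, the table does not say how values exactly on a boundary (0.9, 0.7, 0.5) are treated, so they fall into the lower class here, and pLDDT values are divided by 100 to put them on the QMEANDisCo scale.

```python
def confidence_class(score, scale=1.0):
    """Map a local quality value to a confidence class.

    Use scale=1.0 for QMEANDisCo (range 0-1) and scale=100.0 for pLDDT (range 0-100).
    Boundary handling is an assumption; the documented classes use strict inequalities.
    """
    value = score / scale
    if value > 0.9:
        return "Very high"
    if value > 0.7:
        return "Confident"
    if value > 0.5:
        return "Low"
    return "Very low"


print(confidence_class(0.95))             # QMEANDisCo value
print(confidence_class(85, scale=100.0))  # pLDDT value
```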
Indels | MODEL AAAAAAAA---AAAAAA-AA | Highlights insertions / deletions in model |
Alignment Index Schemes
Chain | Cycle of colours | |
Rainbow | N-terminus -> C-terminus |
Residue Schemes
Hydrophobic | RKDENQHPYWSTGAMCFLVI Least hydrophobic -> Most hydrophobic |
Size | GASPVTCLINDKQEMHFRYW Smallest -> Largest |
Charged | ED (Negative) HKR (Positive) |
Polar | STNQ |
Proline | P |
Ser/Thr | ST |
Cysteine | C |
Aliphatic | ILV |
Aromatic | FYWH |
Clustal Scheme
This is an emulation of the default colour scheme used for alignments in Clustal X, a graphical interface for the ClustalW multiple sequence alignment program. Each residue in the alignment is assigned a colour if the amino acid profile of the alignment at that position meets some minimum criteria specific for the residue type.
The table below gives these criteria as clauses: { > X% xx,y }, where X is the threshold percentage presence for any of the xx (or y) residue types. For example, K or R is coloured red if the column includes more than 60% K or R (combined), or more than 80% of either K or R or Q (individually).
Category | Colour | Residue at Position | {Threshold, Residue group} |
---|---|---|---|
Hydrophobic | Blue | A I L M F W V | { 60% WLVIMAFCYHP } |
Hydrophobic | Blue | C | { 60% WLVIMAFCYHP } |
Positive charge | Red | K R | { 60% KR }, { 80% K,R,Q } |
Negative charge | Magenta | E | { 50% ED }, { 50% QE }, { 60% KR }, { 85% D,E,Q } |
Negative charge | Magenta | D | { 50% ED }, { 60% KR }, { 85% D,E,N } |
Polar | Green | N | { 50% N }, { 85% N,Y } |
Polar | Green | Q | { 50% QE }, { 60% KR }, { 85% Q,E,K,R } |
Polar | Green | S T | { 50% TS }, { 60% WLVIMAFCYHP }, { 85% S,T } |
Cysteine | Pink | C | { 85% C } |
Glycine | Orange | G | { 0% G } |
Proline | Yellow | P | { 0% P } |
Aromatic | Cyan | H Y | { 60% WLVIMAFCYHP }, { 85% W,Y,A,C,P,Q,F,H,I,L,M,V } |
Unconserved | White | any / gap | If none of the above criteria are met |
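The clause notation can be made concrete with the K/R example from the text. The sketch below evaluates the two clauses { 60% KR } and { 80% K,R,Q } for a single alignment column. The function names are hypothetical, and counting gaps in the column total is an assumption, as the table does not say how gaps are handled.

```python
from collections import Counter


def percent(column, residues):
    """Percentage of symbols in the column that belong to `residues`."""
    counts = Counter(column)
    return 100.0 * sum(counts[r] for r in residues) / len(column)


def positive_charge_coloured(column):
    """Decide whether K or R in this column meets its colouring criteria.

    { 60% KR }: more than 60% K or R combined.
    { 80% K,R,Q }: more than 80% of K, of R, or of Q individually.
    """
    combined = percent(column, "KR") > 60
    individual = any(percent(column, r) > 80 for r in "KRQ")
    return combined or individual


print(positive_charge_coloured("KKRKQEKRKK"))  # 80% K or R combined
print(positive_charge_coloured("KQQQQAAAAA"))  # neither clause is met
```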
Membrane Prediction
Biounits of transmembrane proteins are identified in the SMTL solely based on structural information. The most likely membrane location is computed based on structural information using the membrane finding algorithm of the QMEANBrane tool (Studer et al.) which is based on the solvation model described for the Orientations of Proteins in Membranes database (Lomize et al.). The results serve as input to classify each biounit based on energetic and geometric criteria.
The membrane annotation is transferred to a model if at least 80% of all biounit transmembrane residues are aligned with the target sequence(s).
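The 80% transfer criterion amounts to a simple coverage check. The following is a minimal sketch with hypothetical names, assuming residues are identified by template residue numbers. How the pipeline represents the alignment internally is not described here.

```python
def transfer_membrane_annotation(tm_residues, aligned_residues, min_fraction=0.8):
    """Return True if enough transmembrane residues are covered by the alignment.

    tm_residues: residue numbers of the biounit's transmembrane residues.
    aligned_residues: template residue numbers aligned with the target sequence(s).
    """
    tm = set(tm_residues)
    if not tm:
        return False  # nothing to transfer
    covered = len(tm & set(aligned_residues))
    return covered / len(tm) >= min_fraction


# 20 transmembrane residues (10-29), of which 16 are aligned: exactly 80%
print(transfer_membrane_annotation(range(10, 30), range(1, 26)))
```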
Modelling API
The Modelling API is intended to be used programmatically for submitting many modelling jobs, where clicking through a website to submit jobs and view the results is not practical.
SWISS-MODEL projects can be started from the command line, or using an interactive user-interface such as Swagger UI or Core API.
If you are using the coreapi auto-generated code snippets, you will need to add authentication to start a modelling project. You will find detailed help here.
Job submission and status checks are rate limited; if you send too many requests, you will receive a 429 response. The results should indicate the current submission rate. The submission rate may change at any time, depending on demand for the service. Currently, the rapid submission rate is 100/m (100 requests per minute) and the prolonged rate is 2000/6h (2000 requests per 6 hours).
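A client that submits many jobs should be prepared for 429 responses. One way to handle them is to retry with an increasing delay, as sketched below. The function name, the number of attempts and the delays are illustrative choices and not values recommended by the service. `send` is any zero-argument callable returning an object with a `status_code` attribute, for example a lambda wrapping a `requests.post(...)` call.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until the response status is not 429 or attempts run out.

    The delay doubles after every rate-limited attempt (exponential backoff).
    The last response is returned even if it is still a 429.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response


# Demo with a stand-in response object instead of a real HTTP call
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


codes = iter([429, 429, 202])
delays = []
result = send_with_backoff(lambda: FakeResponse(next(codes)), sleep=delays.append)
print(result.status_code, delays)
```

Passing `sleep` as a parameter keeps the function easy to test without waiting. In normal use the default `time.sleep` applies.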
The API uses a token based authentication system, so the first step is to retrieve a token for your SWISS-MODEL user account. This token is to be placed in the header of subsequent API calls.
1: Obtain a token
This can be done from the command line, but the recommended method to discover (and regenerate) your API token is to visit your SWISS-MODEL account page.
2: Start an Automodel project
    import requests

    # token is the API token obtained in step 1
    response = requests.post(
        "https://swissmodel.expasy.org/automodel",
        headers={"Authorization": f"Token {token}"},
        json={
            "target_sequences": [
                "VLSPADKTNVKAAWAKVGNHAADFGAEALERMFMSFPSTKTYFSHFDLGHNSTQVKGHGKKVADALTKAVGHLDTLPDALSDLSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPGDFTPSVHASLDKFLASVSTVLTSKYR",
                "VHLTGEEKSGLTALWAKVNVEEIGGEALGRLLVVYPWTQRFFEHFGDLSTADAVMKNPKVKKHGQKVLASFGEGLKHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVVVLARHFGKEFTPELQTAYQKVVAGVANALAHKYH"
            ],
            "project_title": "This is an example using multiple targets for hemoglobin"
        })
2: Start an Alignment project
    response = requests.post(
        "https://swissmodel.expasy.org/alignment",
        headers={"Authorization": f"Token {token}"},
        json={
            "target_sequences": "KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH",
            "template_sequence": "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
            "template_seqres_offset": 0,
            "pdb_id": "1crn",
            "auth_asym_id": "A",
            "assembly_id": 1,
            "project_title": "This is an example of Alignment mode based on 1crn"
        })
2: Start a User Template project
    # Read the template coordinates from a local PDB file
    with open("3l9y.1.A.pdb") as f:
        template_coordinates = f.read()

    response = requests.post(
        "https://swissmodel.expasy.org/user_template",
        headers={"Authorization": f"Token {token}"},
        json={
            "target_sequences": "MVVKAVCVINGDAKGTVFFEQESSGTPVKVSGEVCGLAKGLHGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLGDLGNIEATGDCPTKVNITDSKITLFGADSIIGRTVVVHADADDLGQGGHELSKSTGNAGARIGCGVIGIAKV",
            "template_coordinates": template_coordinates,
            "project_title": "This is an example of User Template based on SODC_DROME"
        })
At this point, it is worth checking the status code of the response. A 202 means that valid input was received and a new modelling job will be started when resources become available.
A 200 response means that the valid input has been seen before with the same SMTL version and so the project is already completed / failed.
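The status codes mentioned in this section can be collected in one helper before moving on to fetching results. This is only a sketch: the function name and the wording of the messages are hypothetical, and codes other than 200, 202 and 429 are not described here, so they are reported as unexpected.

```python
def submission_outcome(status_code):
    """Translate the status code of a submission response into a short message."""
    if status_code == 202:
        return "accepted: a new modelling job will start when resources become available"
    if status_code == 200:
        return "seen before: the project is already completed or failed"
    if status_code == 429:
        return "rate limited: too many requests were sent"
    return "unexpected status code"


for code in (202, 200, 429):
    print(code, submission_outcome(code))
```

With the `requests` examples above, this would be called as `submission_outcome(response.status_code)`.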
3: Fetch the results
    import time

    # Obtain the project_id from the response created above
    project_id = response.json()["project_id"]

    # Loop until the project completes
    while True:
        # Wait for some time before polling again
        time.sleep(10)

        # Fetch the current project summary from the server
        response = requests.get(
            f"https://swissmodel.expasy.org/project/{project_id}/models/summary/",
            headers={"Authorization": f"Token {token}"})

        # Update the status
        status = response.json()["status"]
        print("Job status is now", status)

        if status in ["COMPLETED", "FAILED"]:
            break
4: Check if the job is COMPLETED and fetch the model coordinates
    response_object = response.json()
    if response_object["status"] == "COMPLETED":
        # Print the download URL of the coordinates for every model
        for model in response_object["models"]:
            print(model["coordinates_url"])
Bulk download of coordinates, metadata and overall summary file.
By default, ALL projects created using the API will be considered for the bulk download. This can be filtered by creation date, using the parameters "from_datetime" and / or "to_datetime".

    import sys
    import time

    # Start a new job which will package all modelling jobs in a single zip archive.
    # If any jobs are still running, a download_id will not be available and the
    # status code will be 400.
    response = requests.post(
        "https://swissmodel.expasy.org/projects/download/",
        headers={"Authorization": f"Token {token}"})

    # Check that the status_code of the response is either 200 or 202 before proceeding
    if response.status_code not in [200, 202]:
        print(response.text)
        sys.exit()

    # Obtain the download_id for the packaged file
    download_id = response.json()["download_id"]

    # Wait for the response status to be COMPLETED (or FAILED)
    while True:
        time.sleep(5)
        response = requests.get(
            f"https://swissmodel.expasy.org/projects/download/{download_id}/",
            headers={"Authorization": f"Token {token}"})
        if response.json()["status"] in ["COMPLETED", "FAILED"]:
            break

    # Fetch the bulk download of results from the parameter "download_url"
    print("Fetch the results from", response.json()["download_url"])