Introduction

The SWISS-MODEL Workspace is a web-based integrated service dedicated to protein structure homology modelling. It assists and guides the user in building protein homology models at different levels of complexity.

Successful model building requires at least one experimentally determined 3D structure (template) that shows significant amino acid sequence similarity with the target sequence. Building a homology model comprises four main steps: identification of structural template(s), alignment of target sequence and template structure(s), model building, and model quality evaluation. These steps can be repeated until a satisfying modelling result is achieved. Each of the four steps requires specialized software and access to up-to-date protein sequence and structure databases.

Protein sequence and structure databases necessary for modelling are accessible from the workspace and are updated at regular intervals. Software tools for template selection, model building, and structure quality evaluation can be invoked from within the workspace. A personal working environment (workspace), where several modelling projects can be carried out in parallel, is provided for each user.

The following tutorial aims to facilitate the first steps of working with SWISS-MODEL Workspace. Please let us know if you would like to see other features explained in this tutorial (help-swissmodel@unibas.ch).

How do I work with SWISS-MODEL Workspace?

How can I build a homology model using SWISS-MODEL Workspace?

How can I assess protein structure model quality?

 



1. How do I work with SWISS-MODEL Workspace? => How can I create an account?

The SWISS-MODEL Workspace provides a personal web-based area for each user in which protein homology models can be built and the results of completed modelling projects are stored and visualized. It is not necessary to create an account; you may continue to use SWISS-MODEL as before by just providing an email address in the submission form, or by bookmarking the submission window. However, you will then not be able to manage your projects inside the Workspace, and we therefore strongly recommend creating your own account:




1. How do I work with SWISS-MODEL Workspace? => How can I manage my projects?

In the workspace, a list of the current modelling work units is displayed, including the work unit type, a title provided by the user, and the status of the work unit:

The current status of a work unit is indicated by a graphical symbol: submitted (the job has been submitted to the queuing system and is waiting for execution), running (job is currently running and programs are calculating), finished (job has been completed, results are available) or failed/stopped (something went wrong during the process).


After completion of the modelling procedure (from a few minutes up to several hours), the results are stored in the workspace and the user is notified about the completion. The user can access the results by clicking on the work unit ID number.
The results are stored on the server for one week. The remaining time before deletion of a given work unit is also displayed. The user can decide either to delete a work unit or to prolong its life span by clicking on the corresponding link.

Beware: Work units are kept on the server for one week before they are deleted automatically. You may postpone deletion by one week by pressing the green "refresh arrow". Please download the modelling results to your local system within this time frame. Each user has a quota of at most 25 work units that can be stored simultaneously.
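
The retention rules above amount to simple date arithmetic. The following Python sketch is only an illustration of that arithmetic, not the server's implementation: the function names are hypothetical, and it assumes that every press of the refresh arrow adds exactly one further week to the original deletion date.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(weeks=1)  # from the text: work units are kept for one week
MAX_WORK_UNITS = 25            # from the text: quota of stored work units per user


def deletion_date(created, refreshes=0):
    """Deletion date, assuming each refresh postpones deletion by one week."""
    return created + LIFETIME * (1 + refreshes)


def remaining_time(created, now, refreshes=0):
    """Time left before automatic deletion (never negative)."""
    return max(deletion_date(created, refreshes) - now, timedelta(0))


def can_store_another(stored_work_units):
    """True if one more work unit still fits into the quota."""
    return stored_work_units < MAX_WORK_UNITS


created = datetime(2011, 3, 1, 9, 0)   # made-up submission time
print(deletion_date(created))
print(remaining_time(created, datetime(2011, 3, 5, 9, 0)))
print(can_store_another(25))
```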




2. My protein is quite large, and I would like to identify individual domains I could model separately.

Many proteins are modular and made up of several structurally distinct domains, which often reflect evolutionary relationships and may correspond to units of molecular function. The sensitivity and performance of profile-based template search methods can often be improved when the template search is performed on individual domains rather than on the whole target sequence. The member databases of InterPro (Mulder et al.) allow for both the identification of protein domains and the assignment of protein function. Using the InterPro Domain Scan (IprScan, Zdobnov et al.), protein domains and functional sites can be assigned to regions of a target sequence.

See: [Tools] [ Secondary Structure Prediction and Domain Assignment ]

Let's use the example of Collagen alpha 3(VI) chain (UniProt accession code: P12111) to identify individual domains in the target sequence. The result looks like this:

 

The location of the individual domains is provided in tabular form below the graphics. Links to the motif definition in InterPro are provided.

Interpro Scan has finished. Here are the results:

IPR002035: von Willebrand factor, type A, Domain
  PF00092: 39 - 213     VWA
  PF00092: 242 - 415    VWA
  PF00092: 445 - 620    VWA
  PF00092: 639 - 812    VWA
  PF00092: 837 - 1009   VWA
  PF00092: 1029 - 1201  VWA
  PF00092: 1233 - 1404  VWA
  PF00092: 1436 - 1609  VWA
  PF00092: 1639 - 1812  VWA
  PF00092: 2402 - 2581  VWA
  PF00092: 2619 - 2810  VWA
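
To model the domains separately, the residue ranges in this listing have to be turned into sub-sequences of the target. The Python sketch below shows one way to do that for text copied from the listing. The regular expression and the function names are assumptions made for this illustration, and the sequence in the usage line is a placeholder, not the real P12111 sequence.

```python
import re

# A few lines copied from the result listing above.
IPRSCAN_OUTPUT = """\
IPR002035: von Willebrand factor, type A, Domain
  PF00092: 39 - 213 VWA
  PF00092: 242 - 415    VWA
  PF00092: 2619 - 2810  VWA
"""

# accession, start, end and short name of one hit line
HIT = re.compile(r"^\s*(\S+):\s*(\d+)\s*-\s*(\d+)\s+(\S+)\s*$")


def parse_hits(text):
    """Return (accession, start, end, name) for every line that looks like a hit."""
    hits = []
    for line in text.splitlines():
        match = HIT.match(line)
        if match:
            accession, start, end, name = match.groups()
            hits.append((accession, int(start), int(end), name))
    return hits


def domain_sequence(sequence, start, end):
    """Cut residues start..end (1-based, inclusive) out of the full sequence."""
    return sequence[start - 1:end]


for accession, start, end, name in parse_hits(IPRSCAN_OUTPUT):
    print(accession, name, start, end, end - start + 1)

# Placeholder sequence, only to show the 1-based slicing.
print(domain_sequence("ABCDEFGHIJ", 3, 5))
```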

 



I have no idea about possible templates for my target, and I want to identify possible template structures.

The degree of difficulty in identifying a suitable template for a target sequence can range from "trivial" for well-characterized protein families to "impossible" for proteins with an unknown fold. The SWISS-MODEL Workspace provides access to a set of increasingly complex and computationally demanding methods to search for templates within the SWISS-MODEL Template library.

SwissModel Template Library (ExPDB)

The template structure database used by SWISS-MODEL (SMTL or ExPDB library) is derived from the Protein Data Bank (Westbrook et al.). In order to allow sequence-based template searches, each PDB entry is split into individual chains. The separated template chains are annotated with information about experimental method, resolution (if applicable), ANOLEA mean force potential (Melo et al.), Gromos96 energy (van Gunsteren et al.) and PQS (Henrick et al.) quaternary state assignment to allow for rapid retrieval of the relevant structural information during template selection. Theoretical models, structures only consisting of C alpha atoms and irregularly formatted database entries are removed. Templates sharing 100% sequence identity are grouped into a SMTL100 library using the program CD-HIT (Li et al.). Clusters of sequences having 90%, 70% and 50% sequence identity are derived from the RCSB non-redundant PDB lists.
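
The chain splitting and the filter for C-alpha-only structures can be pictured with a few lines of Python. This is a minimal sketch and not the procedure used to build the library: it relies only on the fixed-column layout of PDB ATOM records (atom name in columns 13-16, chain identifier in column 22), the helper atom_record is hypothetical, and the demo records are truncated toy lines without coordinates.

```python
from collections import defaultdict


def atom_record(serial, atom, residue, chain, number):
    """Build a truncated ATOM record (no coordinates) with PDB column positions."""
    return f"ATOM  {serial:>5}  {atom:<3} {residue:>3} {chain}{number:>4}"


def split_chains(pdb_text):
    """Group ATOM records by chain identifier (column 22 of the record)."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chains[line[21]].append(line)
    return dict(chains)


def is_ca_only(atom_lines):
    """True if every atom of the chain is a C-alpha atom (columns 13-16)."""
    return all(line[12:16].strip() == "CA" for line in atom_lines)


# Toy entry: chain A with backbone atoms, chain B with C-alpha atoms only.
demo = "\n".join([
    atom_record(1, "N", "PRO", "A", 1),
    atom_record(2, "CA", "PRO", "A", 1),
    atom_record(3, "C", "PRO", "A", 1),
    atom_record(4, "CA", "PRO", "B", 1),
    atom_record(5, "CA", "GLN", "B", 2),
])

for chain_id, atoms in split_chains(demo).items():
    label = "C-alpha only" if is_ca_only(atoms) else "full atoms"
    print(chain_id, len(atoms), label)
```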

[Tools] [ SwissModel Template Library ]

You may query if a certain PDB entry is part of SMTL. In this example, we search for chains of PDB entry "1HIV". SMTL provides information about the experimental methods used for structure determination, resolution (if applicable), and links to the original PDB entry as well as protein structure classification by SCOP and CATH.

Caveat: A significant fraction of proteins is multimeric in the biologically active state. Single chains or raw PDB entries often do not represent the biologically correct assembly. The PQS Protein Quaternary Structure Server (Henrick et al.) allows searching the list of likely quaternary structures generated at the EBI. In our example, HIV-1 protease is known to be active as a dimer. Multimeric proteins can be modelled in SWISS-MODEL Workspace using the Project Mode.

 

The target sequence can be used to query the SMTL for suitable template structures using "Template identification" in the Tools menu:

[Tools] [ Template Identification]

A condensed graphical view of the modelling task is provided, containing the target sequence and the template matches, sorted and colored according to the associated E-value. Clickable bars indicate the matched regions and guide the user to the underlying original program output.

[Display Alignment in DeepView]

Target-template alignments from the search tools (BLAST or SAM) can be visualized in DeepView to correct misplaced insertions and deletions in the structural context of the template, and to manually adjust misaligned regions. The modified project can then be saved to disk and submitted as "project mode" to the workspace for model building by the SWISS-MODEL pipeline.




How do I use the fully automatic mode of SWISS-MODEL workspace?

The "automated mode" is suited for cases where the target-template similarity is sufficiently high to allow for fully automated modelling. As a rule of thumb, automated sequence alignments are sufficiently reliable when target and template share more than 50% sequence identity.
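
To check the 50% rule of thumb for a given target-template alignment, the sequence identity can be computed from the two aligned rows. The Python sketch below is one possible definition, stated as an assumption: identical residues are counted over the columns in which both sequences have a residue. Other definitions divide by the alignment length or by the length of the shorter sequence and give somewhat different numbers. The aligned strings are toy data.

```python
GAP_CHARACTERS = "-.~"  # gap symbols used by common alignment formats


def percent_identity(target, template):
    """Percent identical residues over columns where neither row has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(target.upper(), template.upper())
        if a not in GAP_CHARACTERS and b not in GAP_CHARACTERS
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


identity = percent_identity("ACDE-", "ACDFG")  # toy alignment rows
print(f"{identity:.1f}% identity")
print("automated mode is a reasonable choice" if identity > 50 else "inspect the alignment manually")
```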

 

This submission requires only the amino acid sequence (FASTA format or raw single-letter sequence) or the UniProt accession code of the target protein as input data. The modelling pipeline automatically selects suitable templates based on a BLAST E-value limit (Altschul et al.), which can be adjusted upon submission. The automated template selection will favour high-resolution template structures with reasonable stereochemical properties as assessed by ANOLEA mean force potential (Melo et al.) and Gromos96 force field energy (van Gunsteren et al.).
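
Before pasting a sequence into the submission form, it can be helpful to bring it into a clean single-letter form locally. The Python sketch below accepts either FASTA text or a raw sequence. It is an illustration only: restricting the alphabet to the 20 standard amino acids is an assumption made here, and the server may accept or handle other symbols differently.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard amino acids only


def read_target_sequence(text):
    """Return an upper-case one-letter sequence from FASTA or raw input."""
    lines = [line.strip() for line in text.strip().splitlines()]
    residues = "".join(line for line in lines if line and not line.startswith(">"))
    sequence = "".join(residues.split()).upper()
    unknown = sorted(set(sequence) - STANDARD_RESIDUES)
    if unknown:
        raise ValueError("unexpected characters: " + "".join(unknown))
    return sequence


print(read_target_sequence(">my_target\nkscc\nPTT\n"))  # FASTA input
print(read_target_sequence("KSCC PTT"))                 # raw input
```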

Example: Modelling the catalytic domain of Cyclodextrin glucanotransferase from Bacillus stearothermophilus (UniProt AC code: Q9ZAQ0).

[ Modelling ] [ Automated Mode ]

Note: Work units are automatically deleted from the server after one week. When the modelling project is finished, please download the results and save them locally:

  • Model coordinate file (PDB format or DeepView project format): PDB files can be displayed using DeepView, Dino, or other tools for molecular visualization.
  • Logfile PDF: The content of the web page (including images and logfiles, but not the model coordinates) can be downloaded and saved as PDF. See "Print this page [pdf]" at the top of the page. PDF files can be displayed using Acrobat Reader.

 



Alignment Mode

Multiple sequence alignments are a common tool in many molecular biology projects. If the three-dimensional structure is known for at least one of the members, this alignment can be used as starting point for comparative modelling using the "alignment mode".
The "alignment mode" allows the user to test several alternative alignments and evaluate the quality of the resulting models in order to achieve an optimal result.

In order to facilitate the use of alignments in different formats, the submission is implemented as a four-step procedure:

1. Prepare a multiple sequence alignment.

  • It must contain at least your target sequence and the template sequence
  • Use any of your favorite alignment tools. We recommend T_COFFEE by Cedric Notredame
  • Make sure the sequence names are "reasonable"

2. Submit your alignment to the Workspace Alignment Mode.

  • Possible formats are: FASTA, MSF, CLUSTALW, PFAM and SELEX
  • You may either upload your file or cut & paste
  • Don't forget to specify the correct alignment format
  • Here is a small example for testing (cut & paste):
CLUSTAL W (1.82) multiple sequence alignment
THN_DENCL       KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH- 46
THNX_TEST       KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK 46
1crnA           TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN- 46
                .:***  ..*  :  **: * .. :**  :** **..: **  *   

3. Select Target and Template

  • The alignment (as it was interpreted by the server) should now be displayed in the bottom part of the page.
  • The script will try to make a good guess for the correct names based on your submission.
  • Select the sequence name of the target sequence (e.g. THN_DENCL)
  • Select the sequence of the template structure (e.g. 1crnA). You don't need to use PDB IDs, you may use any name you like.
  • Specify the template structure to which this sequence belongs. This template MUST be part of the ExPDB template library. Please use the SWISS-MODEL Template library tool to check...
  • Don't forget to specify the correct CHAIN ID. Note that PDB's chain IDs are normally in capital letters.
Target  sequence:       
Template sequence:PDB-Code:Chain-ID: 

4. Check Alignment and Submit

  • The alignment at the bottom of the page should represent the correct mapping of the template structure on the target sequence. Please check carefully before submission.
  • As usual, please provide name and e-mail for the SWISS-MODEL submission.
  • Good luck with your model!
The server pipeline will build the model purely based on this alignment. During the modelling process, implemented as rigid fragment assembly in the SWISS-MODEL (Schwede et al.) pipeline, the modelling engine might introduce minor heuristic modifications to the placement of insertions and deletions.


Supported Alignment formats

The following formats are currently supported: FASTA, MSF, CLUSTALW, PFAM and SELEX.

Examples:

fasta:

>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>THNX_TEST
KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-

clustal:

CLUSTAL W (1.82) multiple sequence alignment
THN_DENCL       KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH- 46
THNX_TEST       KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK 46
1crnA           TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN- 46
                .:***  ..*  :  **: * .. :**  :** **..: **  *   


msf:

 !!AA_MULTIPLE_ALIGNMENT 1.0

  thn_dencl.msf MSF:  47 Type: P 08/08/05 CompCheck:  427 ..

  Name: THN_DENCL  Len: 47  Check: 8212 Weight: 1.00
  Name: THNX_TEST  Len: 47  Check: 5295 Weight: 1.00
  Name: 1crnA      Len: 47  Check: 6920 Weight: 1.00

//

           1                                            47
THN_DENCL  KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH~
THNX_TEST  KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS.YPNK
1crnA      TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN~
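
Whatever format is used, every row of the alignment must have the same length and both the target and the template must be present. The Python sketch below runs these two checks on the FASTA example shown above before submission. The parser and the function names are written for this illustration and are not part of the workspace.

```python
FASTA_ALIGNMENT = """\
>THN_DENCL
KSCCPTTAARNQYNICRLPGTPRPVCAALSGCKIISGTGCPPGYRH-
>THNX_TEST
KSCCPDTTGRDIYNTCRFGGGSRQVCARISGCKIISASTCPS-YPNK
>1crnA
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN-
"""


def parse_aligned_fasta(text):
    """Return {sequence name: aligned sequence} from a FASTA alignment."""
    alignment = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            alignment[name] = ""
        elif name is not None:
            alignment[name] += line
    return alignment


def check_alignment(alignment, target, template):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for name in (target, template):
        if name not in alignment:
            problems.append(f"sequence '{name}' not found in alignment")
    if len({len(row) for row in alignment.values()}) > 1:
        problems.append("aligned sequences differ in length")
    return problems


alignment = parse_aligned_fasta(FASTA_ALIGNMENT)
print({name: len(row) for name, row in alignment.items()})
print(check_alignment(alignment, target="THN_DENCL", template="1crnA"))
```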



How do I use the Project Mode of SWISS-MODEL Workspace?

Main application: Visual inspection of alignments; modelling of oligomeric proteins.

In difficult modelling situations, where the correct alignment between target and template cannot be clearly determined by sequence-based methods, visual inspection and manual manipulation of the alignment can significantly help to improve the quality of the resulting model. Project files containing the superposed template structures and the alignment between the target and the template can be generated using the program DeepView (Swiss-PdbViewer; Guex et al.). The user therefore has full control over essential modelling parameters, i.e. the choice of template structures, the correct alignment of residues, and the placement of insertions and deletions in the context of the three-dimensional structure.
Modelling of oligomeric proteins with SWISS-MODEL Workspace can be done using the Project Mode.

The program DeepView can be downloaded freely from the ExPASy web site. DeepView does not require administrator privileges for installation. For example, under MS Windows, simply uncompress the distributed archive at any location you like (e.g. c:\spdbv or on your desktop) and start the spdbv.exe application. Tutorials, manuals and a discussion group for DeepView can be found on the DeepView web site.

Example: Modelling a dimeric protein

In order to demonstrate oligomer modelling, we are going to build a model of the protease of murine leukemia virus based on the structure of Nelfinavir-resistant HIV-1 protease (D30N/N88D) in complex with Darunavir [3HVP]. (Please keep in mind that this is just an example to illustrate the workflow. Using this template will most likely not make much scientific sense.)

  1. Get the template in the correct quaternary state
    First, check the correct biological assembly of your template protein. Copies of the asymmetric unit of the PDB files can be generated by applying the correct crystallographic symmetry operators. The PDB download page will allow you to download a “biological assembly” file. If you are unsure which assembly to use, the [PISA] server helps to visualize alternative oligomeric states. Download and save the template coordinates as a PDB file to your local disk. [3lzv_dimerAB.pdb].
  2. Remove all non-amino-acid residues
    Open the file in DeepView and remove all non-amino-acid groups such as ions, ligands, OXT, etc. from the template (unless they are at the very end of the file). You can do this by selecting the groups in the control panel of DeepView and removing the selected residues ("Build" menu).
  3. Ensure Unique Chain IDs
    Make sure each chain has a unique name, e.g. "A","B", etc. Coloring the molecule by chain helps to check. Here is an example file for download for this tutorial [3lzv_dimerAB.pdb].
  4. Target Sequence
    In our example, we will model the protease domain of murine leukemia virus (UniProt AC: P03356). As you can see, the virus-encoded polyprotein consists of several domains. Before modelling, it makes things easier to focus on the segment of interest. You may use e.g. the IprScan utility to identify the individual domains. In our case, we will use residues 3-100.

    Create a FASTA file with your target sequences for each chain in the SAME order as in the template, i.e. "A", then "B", etc., separated by semicolons. [target.txt]

    >TARGET
    QGQEPPPEPRITLTVGGQPVTFLVDTGAQH
    SVLTQNPGPLSDRSAWVQGATGGKRYRWTT
    DRKVHLATGKVTHSFLHVPDCPYPLLGR
    DL
    LTKLKAQI
    ;
    QGQEPPPEPRITLTVGGQPVTFLVDTGAQH
    SVLTQNPGPLSDRSAWVQGATGGKRYRWTT
    DRKVHLATGKVTHSFLHVPDCPYPLLGR
    DL
    LTKLKAQI


  5. Load the target sequence into DeepView
    Please make sure to start with loading the amino acid sequence of your target protein *first* using the "SWISS-MODEL" menu - before loading any template structures.
  6. Load the template structure into DeepView
    and generate a preliminary target-template alignment using Menu: Fit - Fit raw sequence.
  7. Adjust target-template Alignment in DeepView
    Open the alignment window and adjust alignment. Make sure NOT to align residues of different chains ("color by chains" helps to see the chain boundaries in both sequences). Do not align to "non aminoacid residues" like het groups, OXT. Make sure all insertions & deletions are correctly positioned in the structural context.



  8. SWISS-MODEL Submission
    Save the project to your local disk [e.g. tutorial_dimerAB.pdb] and submit the file to the project mode of SWISS-MODEL Workspace for model building. A new work unit will be created, containing the modelling results, including log file, ANOLEA evaluation, and model project file of the modelled dimer.
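
The multi-chain target file from step 4 can also be written by a short script, which avoids copy-and-paste errors when the same sequence has to be repeated for every chain. The Python sketch below assembles the text for a homo-dimer from the tutorial sequence: one header line, then one block per chain in template order, separated by a line holding a single semicolon. The function name and the line width of 60 are choices made for this illustration.

```python
# Target sequence of the tutorial example, joined into one string.
PROTEASE = (
    "QGQEPPPEPRITLTVGGQPVTFLVDTGAQH"
    "SVLTQNPGPLSDRSAWVQGATGGKRYRWTT"
    "DRKVHLATGKVTHSFLHVPDCPYPLLGR"
    "DL"
    "LTKLKAQI"
)


def multichain_target(chain_sequences, name="TARGET", width=60):
    """Return FASTA-like text with one ';' line between consecutive chains."""
    blocks = []
    for sequence in chain_sequences:
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        blocks.append("\n".join(lines))
    return f">{name}\n" + "\n;\n".join(blocks) + "\n"


# Homo-dimer: the same sequence for chain "A" and chain "B".
target_text = multichain_target([PROTEASE, PROTEASE])
print(target_text)
```

The resulting text can be saved as target.txt and loaded into DeepView as described in step 5.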

[ Modelling ] [ Project Mode ]

Model of the dimeric protease.



What accuracy can I expect for a model built by the automated mode of SWISS-MODEL?

Evaluation of template structure and model quality is a crucial step in homology modelling. The reliability of different protein modeling methods can be assessed by evaluating the results of blind predictions after the corresponding protein structures have been determined experimentally. The overall performance of the SWISS-MODEL pipeline is evaluated by the EVA project. SWISS-MODEL was the first comparative modelling server to join the EVA project in May 2000, and has since then been continuously evaluated. As of Summer 2005, EVA-CM is based on the assessment of 261 weekly releases of the PDB database, resulting in 48098 protein models for 19698 protein target chains for five different prediction servers, among these 18314 from SWISS-MODEL. All models generated by SWISS-MODEL server, evaluation results, score definitions and detailed statistics are available from the EVA project website.

The C-alpha RMSD after global superimposition of the model and the experimental target structure was computed and plotted against the percentage of sequence identity between target and best template. This gives an estimate of the overall accuracy of the different modelling servers at different levels of target-template sequence identity:

In general, major differences between the individual prediction methods are only observed for target-template pairs sharing sequence identities of less than 40%, where methods favouring higher coverage of the target sequences are more likely to generate models with a higher RMSD. As expected, model RMSD increases with decreasing alignment accuracy, defined as the percentage of equivalent C-alpha positions (within 3.5 Angstroms) between the optimally superimposed target and model structures:
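
The two measures used in this comparison are easy to state in code. The Python sketch below computes the C-alpha RMSD and the percentage of equivalent positions within 3.5 Angstroms for two coordinate lists that are assumed to be already superimposed and to list equivalent residues in the same order. It does not perform the superposition itself and is not the EVA implementation. The coordinates are toy values.

```python
import math


def ca_distances(model, target):
    """Distances between equivalent C-alpha positions of superimposed structures."""
    if len(model) != len(target):
        raise ValueError("both structures need the same number of C-alpha atoms")
    return [math.dist(a, b) for a, b in zip(model, target)]


def rmsd(distances):
    """Root mean square deviation over all equivalent positions."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def percent_equivalent(distances, cutoff=3.5):
    """Percentage of positions that lie within the cutoff (in Angstroms)."""
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)


# Toy coordinates in Angstroms: three identical positions, one displaced by 4 A.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 4.0)]

distances = ca_distances(model, target)
print(rmsd(distances))
print(percent_equivalent(distances))
```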



How can I assess a structure or model with empirical force-field and Mean Force Potential methods?

Evaluation of model quality is a crucial step in homology modeling. While the performance of the automated SWISS-MODEL (Schwede et al.) pipeline in general is continuously evaluated by the EVA project (Koh et al.), the quality of individual models can vary significantly.

Therefore, graphical plots of Anolea mean force potential (Melo et al.), GROMOS empirical force field energy (van Gunsteren et al.) and Verify3D profile evaluation (Eisenberg et al.) are provided to enable the user to estimate the quality of protein models and template structures.

Anolea: The atomic empirical mean force potential ANOLEA (Melo et al.) is used to assess the packing quality of the models. The program performs energy calculations on a protein chain, evaluating the "Non-Local Environment" (NLE) of each heavy atom in the molecule. The y-axis of the plot represents the energy for each amino acid of the protein chain. Negative energy values (in green) represent a favourable energy environment, whereas positive values (in red) represent an unfavourable energy environment for a given amino acid.

Verify3D:
The Verify3D (Eisenberg et al.) method assesses protein structures using three-dimensional profiles. This program analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D). Each residue is assigned a structural class based on its location and environment (alpha, beta, loop, polar, apolar etc.). Then a database generated from good structures is used to obtain a score for each of the 20 amino acids in this structural class. The vertical axis in the plot represents the average 3D-1D profile score for each residue in a 21-residue sliding window. The scores range from -1 (bad score) to +1 (good score).
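
The smoothing behind such a profile plot is a plain sliding-window average. The Python sketch below shows the idea for a list of per-residue scores. It is not the Verify3D code: in particular, shortening the window at the chain ends is an assumption made here, and the scores are toy values.

```python
def sliding_window_average(scores, window=21):
    """Average each score over a centred window, shortened at the chain ends."""
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        segment = scores[max(0, i - half):i + half + 1]
        averaged.append(sum(segment) / len(segment))
    return averaged


raw_scores = [1.0, 2.0, 3.0]  # toy per-residue scores
print(sliding_window_average(raw_scores, window=3))
```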

 

Gromos: The y-axis of the plot represents the GROMOS (van Gunsteren et al.) empirical force field energy for each amino acid of the protein chain. Negative energy values (in green) represent a favourable energy environment, whereas positive values (in red) represent an unfavourable energy environment for a given amino acid.
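
The colour coding of the ANOLEA and GROMOS plots follows the sign of the per-residue energy. The Python sketch below applies that rule to a list of values and reports the share of residues in an unfavourable environment. The energies are invented toy numbers, and the "neutral" label for a value of exactly zero is a choice made for this illustration.

```python
def classify_energies(energies):
    """Label each per-residue energy by its sign, as in the plots."""
    labels = []
    for energy in energies:
        if energy < 0:
            labels.append("favourable")    # shown in green
        elif energy > 0:
            labels.append("unfavourable")  # shown in red
        else:
            labels.append("neutral")
    return labels


def percent_unfavourable(energies):
    """Percentage of residues with a positive (unfavourable) energy."""
    labels = classify_energies(energies)
    return 100.0 * labels.count("unfavourable") / len(labels)


toy_energies = [-1.2, -0.4, 0.8, -2.0]  # invented values, one per residue
print(classify_energies(toy_energies))
print(percent_unfavourable(toy_energies))
```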





How can I assess the geometrical accuracy of a structure or model?

Evaluation of model quality is a crucial step in homology modeling. While the performance of the automated SWISS-MODEL (Schwede et al.) pipeline in general is continuously evaluated by the EVA project (Koh et al.), the quality of individual models can vary significantly.

Therefore, graphical plots of Anolea mean force potential (Melo et al.), GROMOS empirical force field energy (van Gunsteren et al.), Verify3D profile evaluation (Eisenberg et al.), Whatcheck (Hooft et al.) and Procheck (Laskowski et al.) reports are provided to enable the user to estimate the quality of protein models and template structures.

Procheck

 

The PROCHECK suite of programs (Laskowski et al.) assesses the "stereochemical quality" of a given protein structure. The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures.

What Check

What Check comprises several tools for protein structure verification (Hooft et al.).
In addition to a detailed report, a summary for "users of a structure" is provided. Detailed documentation for the WHAT_CHECK output is available at the WHAT_CHECK homepage.

Example outputs:


SWISS-MODEL is developed by the Protein Structure Bioinformatics group at the SIB - Swiss Institute of Bioinformatics & the Biozentrum University of Basel. © 2011.