QMEAN


Introduction

Estimating the quality of protein structure models is a vital step in protein structure prediction. Often one ends up with a set of alternative models (e.g. from different modeling servers or based on alternative template structures and alignments) from which the best candidate must be selected. Alternatively, a single model may have been built whose absolute quality needs to be predicted to judge its suitability for subsequent experiments. The QMEAN server provides access to three scoring functions for the quality estimation of protein structure models, which allow ranking a set of models and identifying potentially unreliable regions within them. Both single models and sets of models submitted as tar.gz archives can be analysed. The user can choose between the following three scoring functions:

  • QMEANDisCo is a composite scoring function that derives both global (i.e. for the entire structure) and local (i.e. per residue) absolute quality estimates from a single model. It builds on the individual terms of QMEAN. The main enhancement is a new term for local per-residue quality estimation that assesses the agreement of pairwise residue-residue distances with ensembles of distance constraints (DisCo) extracted from structures homologous to the assessed model. The homologues are identified with HHblits. If no homologues are found, the DisCo scores are not used; since the results are expected to be suboptimal without them, a warning is displayed in these cases. All terms are combined using neural networks trained to predict per-residue lDDT scores in the range [0,1]. The QMEANDisCo global score is the average of the per-residue scores. The provided error estimate is derived from global QMEANDisCo scores computed for a large set of models and represents the root mean squared difference (i.e. standard deviation) between the QMEANDisCo global score and lDDT (the ground truth). Since the reliability of the prediction depends strongly on model size, the error estimate is calculated from models of similar size to the input.
  • QMEAN is a composite scoring function that derives both global (i.e. for the entire structure) and local (i.e. per residue) absolute quality estimates from a single model. There are two global scores, QMEAN4 and QMEAN6. QMEAN4 is a linear combination of four statistical potential terms. QMEAN6 additionally uses two agreement terms that evaluate the consistency of structural features with sequence-based predictions. Both global scores are originally in the range [0,1], with 1 being best. By default, they are transformed into Z-scores to relate them to what one would expect from high-resolution X-ray structures, and these Z-scores are what the result pages display. The raw scores are only available in the downloadable archives. The local scores are a linear combination of the four statistical potential terms and the agreement terms, evaluated on a per-residue basis. They are also in the range [0,1], with 1 being best.
  • QMEANBrane is a version of QMEAN developed to assess the local quality of alpha-helical transmembrane protein models. QMEANBrane employs specifically trained potentials for three different segments (membrane, interface and soluble) of a transmembrane protein model. These potentials are only applied to the local scores, which are again in the range [0,1], with 1 being best. For residues in the membrane and interface segments, the final score is a linear combination of the four statistical potential terms; for the soluble part, the agreement terms are added as well. Global scores are also calculated, and the QMEAN4 score determines the ranking of the models on the result page, but the global calculation uses the default QMEAN scoring function for soluble structures. The global scores are available in the downloadable archives.
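The global-score and error-estimate definitions above come down to simple arithmetic. The Python sketch below illustrates them: a global score as the mean of per-residue scores, an error estimate as the root mean squared difference between predicted global scores and lDDT, and a generic Z-score. The function names are hypothetical, and the reference mean and standard deviation passed to `z_score` are placeholders; the server's actual reference statistics (derived from high-resolution X-ray structures) are not reproduced here.

```python
from math import sqrt
from statistics import mean


def global_score(local_scores: list[float]) -> float:
    """Global score as the average of per-residue scores."""
    if not local_scores:
        raise ValueError("at least one per-residue score is required")
    return mean(local_scores)


def rms_error(predicted: list[float], ground_truth: list[float]) -> float:
    """Root mean squared difference between predicted global scores and lDDT."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    return sqrt(mean((p - t) ** 2 for p, t in zip(predicted, ground_truth)))


def z_score(raw_score: float, reference_mean: float, reference_std: float) -> float:
    """Number of reference standard deviations a raw score lies from the reference mean."""
    if reference_std <= 0:
        raise ValueError("reference_std must be positive")
    return (raw_score - reference_mean) / reference_std
```

For example, `global_score([0.75, 0.82, 0.85])` averages three per-residue estimates into one global value. Note that the server computes its error estimate only over models of similar size to the input; this sketch leaves any such size binning to the caller.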

NOTE: QMEANBrane is only available for local quality estimates. Independent of the chosen scoring function the global quality score will be computed with QMEAN.

Input Form

  1. Look at some example runs
  2. Structural input: either browse your file system or drag and drop your file. See Input Format Requirements below for more information on the input format.
  3. Optionally add the reference sequence (SEQRES) of your model(s). See Input Format Requirements below for more information on the input format.
  4. Choose the scoring function (QMEANDisCo, QMEAN or QMEANBrane)
  5. Optional input: the name of your project
  6. Optional input: if you provide an email address, you will receive the link to your results as soon as they are ready
  7. Submit the job

Input Format Requirements

Structural Data

Either a single model in PDB format or a tar.gz archive containing multiple models in PDB format that share the same reference sequence (SEQRES) can be uploaded.
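A minimal sketch for preparing such an archive with Python's standard `tarfile` module. The helper name `bundle_models` and the choice to store files without directory prefixes are assumptions; the server's exact expectations for the archive layout are not specified here.

```python
import tarfile
from pathlib import Path


def bundle_models(model_paths: list[str], archive_path: str) -> list[str]:
    """Pack PDB model files into a gzip-compressed tar archive and list its members."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in model_paths:
            # Store each model under its bare file name, without directory prefixes
            archive.add(path, arcname=Path(path).name)
    # Re-open the archive to report what was actually stored
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For example, `bundle_models(["model_001.pdb", "model_002.pdb"], "models.tar.gz")` produces an archive that can then be uploaded like a single PDB file.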

SEQRES

The SEQRES input is used to generate sequence profiles for secondary structure and solvent accessibility predictions. The observed sequence in the model must be a subsequence of the SEQRES. If no SEQRES is provided, it is extracted directly from the model itself. If the model is incomplete, this can lead to inaccurate profiles that affect the aforementioned predictions. For single-chain models or homo-oligomers, the SEQRES can be provided as a plain string or in FASTA format. For hetero-oligomers, the SEQRES can be provided in FASTA format, in which case the sequence names must match the chain names in the model.
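The Python sketch below shows two checks one might run before submission. `is_subsequence` tests whether the observed sequence is a subsequence of the SEQRES, interpreted here as the residues appearing in the same order with gaps allowed (an assumption, chosen because incomplete models miss residues). `seqres_fasta` assembles a FASTA SEQRES whose record names are the model's chain names, as required for hetero-oligomers. Both helper names are hypothetical.

```python
def is_subsequence(observed: str, seqres: str) -> bool:
    """True if the residues of `observed` occur in `seqres` in the same order (gaps allowed)."""
    remaining = iter(seqres)
    # `in` advances the iterator, so each residue must be found after the previous match
    return all(residue in remaining for residue in observed)


def seqres_fasta(chains: dict[str, str]) -> str:
    """FASTA text with one record per chain, each record named after its chain."""
    return "".join(f">{chain}\n{sequence}\n" for chain, sequence in chains.items())
```

For a hetero-dimer with chains A and B, `seqres_fasta({"A": "MKV...", "B": "GSH..."})` yields a two-record FASTA string suitable as SEQRES input.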

Input Data Processing

Local qualities are visible as color gradients in the model viewer. They are additionally mapped onto the structures in the downloadable archives as B-factors. The archives contain two alternative versions of each structure, which undergo different processing steps:

<model_name>_raw.pdb

This is your input structure with minimal processing: hydrogens are stripped, modified residues are converted to their parent residue (e.g. phosphotyrosine to tyrosine), and atoms with zero occupancy as well as unknown residues are removed.
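To illustrate two of these steps, the Python sketch below drops hydrogen atoms and zero-occupancy atoms from PDB ATOM/HETATM records using the fixed PDB column positions (occupancy in columns 55-60, element symbol in columns 77-78). It is not the server's implementation: it assumes the element column is filled in, and it does not handle modified or unknown residues. The function name is hypothetical.

```python
def filter_pdb_lines(lines: list[str]) -> list[str]:
    """Drop hydrogen/deuterium atoms and zero-occupancy atoms; keep all other lines."""
    kept = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            element = line[76:78].strip().upper()  # columns 77-78
            occupancy = line[54:60].strip()        # columns 55-60
            if element in ("H", "D"):
                continue  # strip hydrogens (and deuterium)
            if occupancy and float(occupancy) == 0.0:
                continue  # remove atoms with zero occupancy
        kept.append(line)
    return kept
```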

<model_name>_processed.pdb

This is the model displayed on the results page. In addition to the processing steps above, the residues are renumbered to match the SEQRES, chain names are assigned if they are missing, and a transformation may be applied for display in the viewer.
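Since the local scores are stored as B-factors, they can be read back from either file. The sketch below parses the B-factor column (columns 61-66) of ATOM/HETATM records and keeps one value per residue, assuming that all atoms of a residue carry the same score and that the score is written unscaled. The function name is hypothetical.

```python
def local_scores_from_bfactors(pdb_text: str) -> dict[tuple[str, int, str], float]:
    """Map (chain, residue number, insertion code) to the B-factor of the residue's first atom."""
    scores: dict[tuple[str, int, str], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        chain = line[21]                   # column 22
        res_num = int(line[22:26])         # columns 23-26
        insertion_code = line[26].strip()  # column 27
        b_factor = float(line[60:66])      # columns 61-66
        # Assumes every atom of a residue carries the same score; keep the first one
        scores.setdefault((chain, res_num, insertion_code), b_factor)
    return scores
```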

Programmatic Access

The QMEAN server can be accessed programmatically through the provided API. To submit a job, make a POST request to https://swissmodel.expasy.org/qmean/submit/ with the following parameters (the parameters "structure" and "email" are required):

  • ★structure - the structure to evaluate. This can be either an uploaded file (multipart/form-data file content) or a URL pointing to a file in PDB format.
  • ★email - your email address
  • project_name - the name of your project
  • sequence - the SEQRES to use, provided as a plain sequence or in FASTA format
  • method - the scoring function to use: qmean, qmeandisco (default) or qmeanbrane
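For scripted submissions it can help to assemble and validate these form fields in one place. The Python sketch below (the helper name `build_submission_data` is hypothetical) checks that an email address is given and that `method` is one of the three documented values, and includes optional fields only when they are set. A local file would instead be passed through the `files` argument of `requests.post`.

```python
from typing import Optional

VALID_METHODS = ("qmean", "qmeandisco", "qmeanbrane")


def build_submission_data(email: str,
                          structure_url: Optional[str] = None,
                          project_name: Optional[str] = None,
                          sequence: Optional[str] = None,
                          method: str = "qmeandisco") -> dict:
    """Assemble the form fields for a POST to the QMEAN submit endpoint."""
    if not email:
        raise ValueError("'email' is required")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {VALID_METHODS}")
    data = {"email": email, "method": method}
    # Optional fields are only sent when given
    if structure_url is not None:
        data["structure"] = structure_url  # a local file goes into `files=` instead
    if project_name is not None:
        data["project_name"] = project_name
    if sequence is not None:
        data["sequence"] = sequence
    return data
```

Usage: `requests.post(url=qmean_url, data=build_submission_data("<email>", structure_url="https://files.rcsb.org/download/1CRN.pdb"))`.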
The server returns a JSON object with details of the submitted project and the link to the results page. Example using Python:
import json
import requests

qmean_url = "https://swissmodel.expasy.org/qmean/submit/"

#############################################################
# To upload a local file found at /path/to/my_structure.pdb
# (open the file in binary mode 'rb', which is needed for compressed archives)
with open("/path/to/my_structure.pdb", "rb") as structure_file:
    response = requests.post(url=qmean_url,
                             data={
                                 "email": "<email is required>"
                             },
                             files={
                                 "structure": structure_file
                             })
#############################################################


#############################################################
# Or, to submit a file from a URL, pass the URL as the parameter "structure".
response = requests.post(url=qmean_url,
                         data={
                             "structure": "https://files.rcsb.org/download/1CRN.pdb",
                             "email": "<email is required>"
                         })
#############################################################

print(json.dumps(response.json(), indent=4, sort_keys=True))
{
    "created": "2019-03-07T15:33:12.932",
    
    # results_json points to the details of the project in JSON format
    "results_json": "https://swissmodel.expasy.org/qmean/ABCDEF.json",
    "error": null,
    "method": "QMEANDisCo",
    "models": {
        "model_001": {
            "chains": {
                "A": {
                    "atomseq": "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
                    "name": "seq_chain_0",
                    "seqres": "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
                }
            },
            "model_pdb": "https://swissmodel.expasy.org/qmean/ABCDEF/model_001_processed.pdb",
            "original_name": "1CRN.pdb",
            "scores": null
        }
    },

    # QMEAN website URL to view the project
    "results_page": "https://swissmodel.expasy.org/qmean/ABCDEF",
    "seqres_uploaded": null,
    "status": "QUEUEING"
}

After submission, the uploaded structure goes through the pre-processing described above, which may remove atoms and residues. If no valid residues remain after this step, this is reported in the "error" field, which may look like this:

"error": [
          {
              "atomsRemoved": 649,
              "description": "No valid residues after pre-processing",
              "original_name": "unknown_residues.pdb",
              "residuesRemoved": 30
          },
          {
              "atomsRemoved": 713,
              "description": "No valid residues after pre-processing",
              "original_name": "zero_occupancy.pdb",
              "residuesRemoved": 93
          }
      ]
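A small Python sketch for turning such entries into readable messages. It assumes the "error" field is either null, a list of objects with the keys shown above, or (an assumption, not shown in the examples) a plain message string. The function name is hypothetical.

```python
def describe_errors(project: dict) -> list[str]:
    """Readable messages for the "error" field of a QMEAN project JSON."""
    errors = project.get("error") or []
    if isinstance(errors, str):  # assumption: a plain message may also be returned
        return [errors]
    return [
        f"{e.get('original_name', '?')}: {e.get('description', 'unknown error')} "
        f"({e.get('residuesRemoved', 0)} residues, {e.get('atomsRemoved', 0)} atoms removed)"
        for e in errors
    ]
```

For instance, `describe_errors(response.json())` returns an empty list when the submission was accepted without pre-processing errors.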

To fetch the current status, request the URL given in "results_json":

current_status = requests.get(response.json()["results_json"])

print(json.dumps(current_status.json(), indent=4, sort_keys=True))
{
   "status":"COMPLETED",
   "results_page":"https://swissmodel.expasy.org/qmean/ABCDEF",
   "created":"2019-03-07T15:33:12.932",
   "models":{
      "model_001":{
         "chains":{
            "A":{
               "seqres":"TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
               "name":"seq_chain_0",
               "atomseq":"TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
            }
         },
         "original_name":"1CRN.pdb",
         
         # Use the value of model_pdb to download the processed PDB file
         "model_pdb":"https://swissmodel.expasy.org/qmean/ABCDEF/model_001_processed.pdb",
         "scores":{
            "local_scores":{
               "A":[
                  0.7534790648045062,
                  0.8248158943386679,
                  0.8494061747142749,
             ...............................
                  0.6959062897069285,
                  0.7590875228956596,
                  0.7582459226326457
               ]
            },
            "global_scores":{
                    "acc_agreement_norm_score": 0.717391304347826,
                    "acc_agreement_z_score": 0.05215918178976942,
                    "avg_local_score": 0.7915534749937687,
                    "avg_local_score_error": 0.115,
                    "cbeta_norm_score": -0.016832596171930756,
                    "cbeta_z_score": -0.5114071551370111,
                    "interaction_norm_score": -0.03399149868662631,
                    "interaction_z_score": -0.1362756629407729,
                    "packing_norm_score": -0.2602688412222525,
                    "packing_z_score": -0.5068264529038528,
                    "qmean4_norm_score": 0.763869301899829,
                    "qmean4_z_score": -0.15807925940005427,
                    "qmean6_norm_score": 0.737371454701935,
                    "qmean6_z_score": -0.4069132561394536,
                    "ss_agreement_norm_score": 0.2899284981515097,
                    "ss_agreement_z_score": -1.3091550740346993,
                    "torsion_norm_score": -0.3343344484677627,
                    "torsion_z_score": 0.29964082929082386
            }
         }
      }
   },
   "seqres_uploaded":null,
   "method":"QMEANDisCo"
}
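Jobs are processed asynchronously, so a script typically polls "results_json" until the project is finished. The sketch below uses only the standard library (`urllib.request`) for the HTTP GET. Only the status values QUEUEING and COMPLETED appear in the examples above; treating every other status as still pending, stopping as soon as the "error" field is non-empty, and the polling interval and attempt limit are all assumptions. `average_local_scores` (a hypothetical helper) averages each model's per-residue scores across all chains, skipping null entries, and can be compared with "avg_local_score".

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """HTTP GET a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url) as handle:
        return json.load(handle)


def wait_for_results(results_json_url: str,
                     fetch: Callable[[str], dict] = fetch_json,
                     poll_interval: float = 30.0,
                     max_attempts: int = 120) -> dict:
    """Poll results_json until status is COMPLETED or the "error" field is non-empty."""
    for attempt in range(max_attempts):
        project = fetch(results_json_url)
        if project.get("status") == "COMPLETED" or project.get("error"):
            return project
        if attempt < max_attempts - 1:
            time.sleep(poll_interval)  # any other status is treated as still pending
    raise TimeoutError(f"project not finished after {max_attempts} attempts")


def average_local_scores(project: dict) -> dict:
    """Mean per-residue local score of each model across all chains, skipping nulls."""
    averages = {}
    for model_name, model in (project.get("models") or {}).items():
        local = (model.get("scores") or {}).get("local_scores") or {}
        values = [s for chain_scores in local.values() for s in chain_scores if s is not None]
        if values:
            averages[model_name] = sum(values) / len(values)
    return averages
```

Usage: `project = wait_for_results(response.json()["results_json"])`, then `average_local_scores(project)`.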
      
Reference for the QMEAN scoring function:
[1] Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models.

Reference for the QMEANDisCo scoring function:
[2] Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo - distance constraints applied on model quality estimation.

Reference for the QMEANBrane scoring function:
[3] Studer G, Biasini M, Schwede T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane).

QMEAN: a single-model method combining statistical potentials and agreement terms linearly.
QMEANDisCo: a single-model method combining statistical potentials and agreement terms with a distance constraints (DisCo) score. DisCo evaluates the consistency of pairwise CA-CA distances in a model with constraints extracted from homologous structures. All scores are combined using a neural network trained to predict per-residue lDDT scores.
QMEANBrane: a combination of statistical potentials targeted at the local quality estimation of membrane protein models in their naturally occurring oligomeric state. After the transmembrane region is identified using an implicit solvation model, specifically trained statistical potentials are applied to the different regions of a protein model.
Reference sequence (SEQRES) of submitted protein model. This sequence is used for secondary structure and solvent accessibility predictions. If not provided, the sequence gets directly extracted from the model. See the help page for further input information.
The plot relates the obtained global QMEAN4 value to scores calculated from a set of high-resolution X-ray structures.
Local quality is either estimated using the raw QMEAN scoring function or one of the two specialized functions QMEANBrane and QMEANDisCo. They all provide scores in range [0,1] with one being good.
QMEAN4 is a linear combination of four statistical potential terms. It is trained to predict global lDDT score in range [0,1]. The value displayed here is transformed into a Z-score to relate it with what one would expect from high resolution X-ray structures.