QMEAN


Contents


1. Introduction
2. Input Form
3. Input Format Requirements
4. Input Data Processing
5. Programmatic Access
6. References

1. Introduction


Estimating the quality of protein structure models is a vital step in protein structure prediction. Often one ends up with a set of alternative models (e.g. from different modeling servers or based on alternative template structures and alignments) from which the best candidate must be selected. Or a single model has been built and its absolute quality needs to be predicted in order to judge its suitability for subsequent experiments. The QMEAN server provides access to three scoring functions for the quality estimation of protein structure models. They make it possible to rank a set of models and to identify potentially unreliable regions within them. Both single models and sets of models submitted as tar.gz archives can be analysed. The user can choose between the following three scoring functions: QMEAN [1], QMEANDisCo [2] and QMEANBrane [3].

NOTE: QMEANBrane is only available for local quality estimates. Independent of the chosen scoring function, the global quality score will be computed with QMEAN.



2. Input Form


  1. Look at some example runs
  2. Structural input, either browse your file system or drag and drop. More information on the input format is given in Section 3 (Input Format Requirements).
  3. Optionally add the reference sequence (SEQRES) of your model(s). More information on the input format is given in Section 3 (Input Format Requirements).
  4. Choose your method
  5. Optional Input: name of your project
  6. Optional Input: If you provide an email address, you'll get the link to your results as soon as they're ready
  7. Fire the job...


3. Input Format Requirements

Structural Data

Either a single model in PDB format or a tar.gz archive containing multiple models in PDB format that share the same reference sequence (SEQRES) can be uploaded.
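
A multi-model upload can be assembled with Python's standard tarfile module. The sketch below is only an illustration: the helper name make_model_archive is hypothetical, and storing the files flat at the top level of the archive is an assumption rather than a documented requirement of the server.

```python
import tarfile
from pathlib import Path


def make_model_archive(pdb_paths, archive_path):
    """Bundle several PDB files into one gzip-compressed tar archive.

    Hypothetical helper: files are stored flat (no directories), which is
    an assumption about what the server accepts.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in map(Path, pdb_paths):
            # arcname drops the directory part so only the file name is stored.
            archive.add(path, arcname=path.name)
    return archive_path
```

For example, make_model_archive(["model_a.pdb", "model_b.pdb"], "models.tar.gz") would write an archive holding both files, which must share the same SEQRES as described above.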

SEQRES

The SEQRES input is used to generate sequence profiles for secondary structure and solvent accessibility predictions. The observed sequence in the model must be a subsequence of the SEQRES. If no SEQRES is provided, it is extracted directly from the model itself. If the model is incomplete, this can lead to inaccurate profiles, which in turn affect the aforementioned predictions. For single chain models or homo-oligomers, the SEQRES can be provided as a plain string or in FASTA format. For hetero-oligomers, the SEQRES must be provided in FASTA format, and the sequence names in the SEQRES input must match the chain names in the model input.
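
The two requirements above (matching names and a contained observed sequence) can be checked locally before submitting. The following is a minimal sketch under stated assumptions: the function names are hypothetical, "subsequence" is interpreted as "same residues in the same order, possibly with gaps", and the server's own check may be stricter or alignment-based.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA text (name = first word of header)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def is_subsequence(observed, seqres):
    """True if `observed` can be obtained from `seqres` by deleting residues."""
    remaining = iter(seqres)
    # `in` on an iterator consumes it, so matches must occur in order.
    return all(residue in remaining for residue in observed)


def check_seqres(model_chains, fasta_text):
    """List problems found; an empty list means none were detected.

    `model_chains` maps chain name -> observed sequence (hypothetical input).
    """
    seqres = parse_fasta(fasta_text)
    problems = []
    for chain, observed in model_chains.items():
        if chain not in seqres:
            problems.append(f"chain {chain}: no FASTA entry with that name")
        elif not is_subsequence(observed, seqres[chain]):
            problems.append(f"chain {chain}: observed sequence not contained in SEQRES")
    return problems
```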



4. Input Data Processing


Local qualities are visible as color gradients in the model viewer. They are additionally mapped onto the structures in the downloadable archives as B-factors. The archives contain two alternative structures that have undergone different processing steps.

<model_name>_raw.pdb

This is your input structure after light processing: hydrogens are removed, modified residues are reduced to their base residue (e.g. phosphotyrosine to tyrosine), and atoms with zero occupancy as well as unknown residues are removed.
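
To get a feeling for how much of a structure would survive this step, two of the filters can be imitated locally. This is a rough sketch and not the server's implementation: the function name is hypothetical, it relies on the element field (columns 77-78) being filled in, treats a missing occupancy as 1.0 (an assumption), and does not handle modified or unknown residues.

```python
def strip_hydrogens_and_zero_occupancy(pdb_text):
    """Drop hydrogen atoms and atoms with zero occupancy from PDB text.

    Illustrative only: uses the fixed-column PDB layout, with occupancy in
    columns 55-60 and the element symbol in columns 77-78.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()
            occupancy_field = line[54:60].strip()
            # Assumption: a blank occupancy field counts as fully occupied.
            occupancy = float(occupancy_field) if occupancy_field else 1.0
            if element == "H" or occupancy == 0.0:
                continue
        kept.append(line)
    return "\n".join(kept)
```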

<model_name>_processed.pdb

This is the model displayed on the results page. In addition to the aforementioned processing steps, the residues are renumbered so that they match the SEQRES, chain names are assigned if they are missing, and a transformation may be applied for display in the viewer.
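
Because the local qualities are stored in the B-factor column of the downloadable structures, they can be read back with a few lines of Python. The sketch below parses the fixed-column PDB format directly. The function name local_scores_from_pdb is hypothetical, reading only CA atoms assumes that every atom of a residue carries the same value, and insertion codes are ignored.

```python
def local_scores_from_pdb(pdb_text):
    """Map (chain, residue number) -> B-factor read from CA atoms.

    Uses the fixed-column PDB layout: atom name in columns 13-16, chain in
    column 22, residue number in columns 23-26, B-factor in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores
```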



5. Programmatic Access


The QMEAN server can be accessed programmatically through the provided API. To submit a job, make a POST request to https://swissmodel.expasy.org/qmean/submit/. The parameters "structure" and "email" are required. The server returns JSON with the details of the submitted project and the link to the results page. Example using Python:
import json
import requests

qmean_url = "https://swissmodel.expasy.org/qmean/submit/"

# To upload a file from a URL, add the URL as the parameter "structure".
response = requests.post(url=qmean_url, data={"structure": "https://files.rcsb.org/download/1CRN.pdb", "email": "your@email.com"})

# When using Python requests - to upload from a local file, put the file in files.
# 'rb' is recommended to allow zip file upload
# The with-block closes the file once the request has been sent.
with open("my_structure.pdb", "rb") as structure_file:
    response = requests.post(url=qmean_url, data={"email": "your@email.com"}, files={"structure": structure_file})

print(json.dumps(response.json(), indent=4, sort_keys=True))
{
    "created": "2019-03-07T15:33:12.932",
    
    # results_json returns the details of the project in JSON format
    "results_json": "https://swissmodel.expasy.org/qmean/ABCDEF.json",
    "error": null,
    "method": "QMEANDisCo",
    "models": {
        "model_001": {
            "chains": {
                "A": {
                    "atomseq": "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
                    "name": "seq_chain_0",
                    "seqres": "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
                }
            },
            "model_pdb": "https://swissmodel.expasy.org/qmean/ABCDEF/model_001_processed.pdb",
            "original_name": "1CRN.pdb",
            "scores": null
        }
    },

    # QMEAN website URL to view the project
    "results_page": "https://swissmodel.expasy.org/qmean/ABCDEF",
    "seqres_uploaded": null,
    "status": "QUEUEING"
}

After submission, a pre-processing step takes place as described above, which may remove atoms and residues from the uploaded structure. If the resulting structure is no longer readable after this step, the problem is reported in the "error" field, which may look like this:

"error": [
        {
            "atomsRemoved": 649,
            "description": "No valid residues after pre-processing",
            "original_name": "unknown_residues.pdb",
            "residuesRemoved": 30
        },
        {
            "atomsRemoved": 713,
            "description": "No valid residues after pre-processing",
            "original_name": "zero_occupancy.pdb",
            "residuesRemoved": 93
        }
    ]
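
A client can turn such an "error" list into readable messages. The helper below is a hypothetical sketch that assumes only the field names visible in the example above; entries with other fields would fall back to the placeholder values.

```python
def summarize_errors(project):
    """Return one readable line per entry in the project's "error" field."""
    errors = project.get("error") or []  # "error" is null when nothing failed
    return [
        f'{e.get("original_name", "?")}: {e.get("description", "unknown error")} '
        f'({e.get("atomsRemoved", 0)} atoms, {e.get("residuesRemoved", 0)} residues removed)'
        for e in errors
    ]
```

It could be called as summarize_errors(response.json()) right after submission.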

To fetch the current status, read the "results_json" URL:

current_status = requests.get(response.json()["results_json"])

print(json.dumps(current_status.json(), indent=4, sort_keys=True))
{
   "status":"COMPLETED",
   "results_page":"https://swissmodel.expasy.org/qmean/ABCDEF",
   "created":"2019-03-07T15:33:12.932",
   "models":{
      "model_001":{
         "chains":{
            "A":{
               "seqres":"TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
               "name":"seq_chain_0",
               "atomseq":"TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
            }
         },
         "original_name":"1CRN.pdb",
         
         # Use the value of model_pdb to download the processed PDB file
         "model_pdb":"https://swissmodel.expasy.org/qmean/ABCDEF/model_001_processed.pdb",
         "scores":{
            "local_scores":{
               "A":[
                  0.7534790648045062,
                  0.8248158943386679,
                  0.8494061747142749,
             ...............................
                  0.6959062897069285,
                  0.7590875228956596,
                  0.7582459226326457
               ]
            },
            "global_scores":{
                    "acc_agreement_norm_score": 0.717391304347826,
                    "acc_agreement_z_score": 0.05215918178976942,
                    "avg_local_score": 0.7915534749937687,
                    "avg_local_score_error": 0.115,
                    "cbeta_norm_score": -0.016832596171930756,
                    "cbeta_z_score": -0.5114071551370111,
                    "interaction_norm_score": -0.03399149868662631,
                    "interaction_z_score": -0.1362756629407729,
                    "packing_norm_score": -0.2602688412222525,
                    "packing_z_score": -0.5068264529038528,
                    "qmean4_norm_score": 0.763869301899829,
                    "qmean4_z_score": -0.15807925940005427,
                    "qmean6_norm_score": 0.737371454701935,
                    "qmean6_z_score": -0.4069132561394536,
                    "ss_agreement_norm_score": 0.2899284981515097,
                    "ss_agreement_z_score": -1.3091550740346993,
                    "torsion_norm_score": -0.3343344484677627,
                    "torsion_z_score": 0.29964082929082386
            }
         }
      }
   },
   "seqres_uploaded":null,
   "method":"QMEANDisCo"
}
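
Since a job starts in the "QUEUEING" state and ends in "COMPLETED", a client usually has to poll "results_json" until the scores are available. The following is a minimal sketch, not an official client: the function name, the 10 second interval and the one hour timeout are arbitrary choices, and treating a non-null "error" as a reason to stop is an assumption about how failures are reported.

```python
import time


def wait_for_results(fetch_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the project is completed, reports an error, or times out.

    `fetch_status` is any callable returning the parsed results JSON, e.g.
    lambda: requests.get(results_json_url).json()
    """
    waited = 0.0
    while True:
        project = fetch_status()
        # "COMPLETED" is taken from the example above; stopping on a
        # non-null "error" is an assumption.
        if project.get("status") == "COMPLETED" or project.get("error"):
            return project
        if waited >= timeout:
            raise TimeoutError(f"no result after {waited:.0f} s")
        sleep(interval)
        waited += interval
```

Passing the fetch step as a callable keeps the loop independent of the HTTP library and makes it easy to try out without contacting the server.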

6. References


Reference for the QMEAN scoring function:
[1] Benkert, P., Biasini, M., Schwede, T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 27, 343-350 (2011).

Reference for the QMEANDisCo scoring function:
[2] Studer, G., Rempfer, C., Waterhouse, A.M., Gumienny, G., Haas, J., Schwede, T. QMEANDisCo - distance constraints applied on model quality estimation. Bioinformatics 36, 1765-1771 (2020).

Reference for the QMEANBrane scoring function:
[3] Studer, G., Biasini, M., Schwede, T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics 30, i505-i511 (2014).

QMEAN: A single model method combining statistical potentials and agreement terms in a linear manner.
QMEANDisCo: A single model method combining statistical potentials and agreement terms with a distance constraints (DisCo) score. DisCo evaluates the consistency of pairwise CA-CA distances in a model with constraints extracted from homologous structures. All scores are combined using a neural network trained to predict per-residue lDDT scores.
QMEANBrane: A combination of statistical potentials targeted at local quality estimation of membrane protein models in their naturally occurring oligomeric state. After the transmembrane region has been identified using an implicit solvation model, specifically trained statistical potentials are applied to the different regions of a protein model.
The plot relates the obtained global QMEAN4 value to scores calculated from a set of high-resolution X-ray structures.
Local quality is estimated either with the raw QMEAN scoring function or with one of the two specialized functions, QMEANBrane and QMEANDisCo. They all provide scores in the range [0, 1], where higher values indicate better quality.
QMEAN4 is a linear combination of four statistical potential terms. It is trained to predict the global lDDT score in the range [0, 1]. The value displayed here is transformed into a Z-score to relate it to what one would expect from high-resolution X-ray structures.
The QMEANDisCo global score is the average per-residue score and the provided error estimate is based on global QMEANDisCo scores estimated for a large set of models and represents the root mean squared difference (i.e. standard deviation) between QMEANDisCo global score and lDDT (the ground truth). As the reliability of the prediction heavily depends on model size, the provided error estimate is calculated based on models of similar size to the input.
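
The relation between local and global scores can be explored from the "scores" object of a model in the results JSON. The sketch below is illustrative only: both function names are hypothetical, pooling the residues of all chains and skipping null entries are assumptions, and the 0.6 threshold is an arbitrary example value rather than a recommended cut-off.

```python
def average_local_score(scores):
    """Mean of all per-residue scores, pooled over every chain.

    `scores` is the "scores" object of one model in the results JSON.
    Entries that are None are skipped (an assumption about missing values).
    """
    values = [
        s
        for chain_scores in scores["local_scores"].values()
        for s in chain_scores
        if s is not None
    ]
    return sum(values) / len(values) if values else None


def low_scoring_residues(scores, threshold=0.6):
    """Return (chain, index) pairs whose local score is below `threshold`.

    The index is the 0-based position in the chain's score list.
    """
    return [
        (chain, i)
        for chain, chain_scores in scores["local_scores"].items()
        for i, s in enumerate(chain_scores)
        if s is not None and s < threshold
    ]
```

The result of average_local_score can be compared with the "avg_local_score" entry under "global_scores" as a sanity check.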