QMEAN Server - Quick Help

Introduction

Estimating the quality of protein structure models is a vital step in protein structure prediction. Often one ends up with a set of alternative models (e.g. from different modeling servers or based on alternative template structures and alignments) from which the best candidate must be selected. Alternatively, a single model has been built and its absolute quality needs to be estimated in order to judge its suitability for subsequent experiments. The QMEAN server provides access to two scoring functions for the quality estimation of protein structure models, which make it possible to rank a set of models and to identify potentially unreliable regions within them. Both single models and sets of models submitted as tar.gz archives can be analysed. The user can choose between the following two scoring functions:
The accuracy of the QMEANclust quality estimation improves with the size, the diversity (models from different servers, models based on different templates etc.) and the quality (fraction of near-native structures) of the model ensemble to be analysed. In order to obtain meaningful results, a minimum number of models should be provided (e.g. > 30 models).

Input format requirements

Either single models (PDB format) or tar.gz archives with multiple models of the same protein can be uploaded. If more than one model is submitted, the full-length sequence of the protein must also be provided (as a sequence string or in FASTA format). In the case of multiple models, the sequences of the models are mapped onto the target sequence and the models are automatically renumbered if necessary. A flag can be set in order to penalise incomplete models. In this case, the model score is additionally multiplied by the fraction of modelled residues with respect to the input sequence, thereby punishing short models.

Results page

As a demonstration, pre-calculated example results from the CASP7 blind test experiment are shown below and are accessible interactively on the following page. The results page begins with a summary of the input data and provides compact tables concerning the model ensemble. For example, one table lists for each model the values of the 6 scoring function terms contributing to QMEAN. A short description of all terms can be found in the table below:
Inspecting how the terms differ between the models may help in understanding which terms contributed most to the low quality estimate of a certain model. For the four statistical potential terms, lower pseudo-energies mean higher reliability. The QMEAN score as well as the two agreement terms range from 0 to 1, with higher values for more reliable candidates. In the case of QMEANclust, "local conformational diversity" plots showing the median QMEANclust score per position are provided, which help to analyse the diversity within the ensemble of models.

Since the QMEAN score is protein-size dependent (i.e. larger proteins tend to have higher scores), the QMEANnorm score has been introduced (manuscript in preparation). In QMEANnorm, the four statistical potential terms are normalized: the interaction energies are divided by the total number of interactions and the other two (single-body) terms are normalized by the protein size. The QMEAN detail table provided in the summary section contains both the original statistical potential terms and their normalized counterparts, together with the QMEAN and the QMEANnorm scores. Unless specified otherwise, the term QMEAN as used on the webpage refers to the QMEANnorm score described above, which is the primary quality score used to calculate the Z-score and rank the models.

For each model the following data and plots are provided in separate columns of the output table (depending on the quality estimation method, the ranking of the models is based on the QMEAN or QMEANclust score):
The range of local error estimates varies considerably between the local versions of QMEAN and QMEANclust. QMEANlocal is, as a consequence of the statistical potential terms used, unable to discriminate between serious and very serious deviations (e.g. between 5 Å and 15 Å). QMEANclust, on the other hand, can provide error estimates even for large errors, depending on the quality and size of the ensemble.

Estimated absolute model quality

The QMEAN Z-score [2] provides an estimate of the absolute quality of a model by relating it to reference structures solved by X-ray crystallography. The QMEAN Z-score is an estimate of the "degree of nativeness" of the structural features observed in a model: it describes the likelihood that a model is of comparable quality to high-resolution experimental structures.

Plot 1: The area formed by the circles colored in different shades of grey in the plot on the left-hand side represents the QMEAN scores of the reference structures from the PDB. The model's QMEAN score is compared to the scores obtained for experimental structures of similar size (model size +/- 10%) and a Z-score is calculated. A Z-score (or standard score) is a score which is normalised to mean 0 and standard deviation 1. Thus the QMEAN Z-score directly indicates by how many standard deviations the model's QMEAN score differs from the values expected for experimental structures. Analogously, Z-scores are calculated for all four statistical potential terms as well as for the agreement terms that are part of the QMEAN score (see also Plot 3).

Plot 2: The plot in the middle shows the density plot (based on the QMEAN score) of all reference models used in the Z-score calculation. The location of the query model with respect to the background distribution is marked in red. This plot is essentially a "projection" of the first plot for the given protein size. The number of reference models used in the calculation is shown at the bottom of the plot.

Plot 3: Analysing the Z-scores of the individual terms can help to identify the geometrical features responsible for an observed large negative QMEAN Z-score. Models of low quality are expected to have strongly negative Z-scores for QMEAN but also for most of the contributing terms. Large negative values correspond to red regions in the color gradient. "Good structures" are expected to have all sliders in the light red to blue region.
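
The Z-score calculation described for Plot 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the server's implementation: the function name `qmean_z_score`, the `(size, score)` tuple layout and the example numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def qmean_z_score(model_score, model_size, references, window=0.10):
    """Relate a model's score to reference structures of similar size.

    references: iterable of (size, score) pairs for experimental
    structures (hypothetical layout). Only references whose size lies
    within model_size +/- window (10% by default) are used.
    Returns the Z-score and the number of references used.
    """
    lower = model_size * (1 - window)
    upper = model_size * (1 + window)
    scores = [score for size, score in references if lower <= size <= upper]
    if len(scores) < 2:
        raise ValueError("need at least two reference structures of similar size")

    mu = mean(scores)
    sigma = stdev(scores)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero spread")

    # Number of standard deviations the model lies from the reference mean.
    return (model_score - mu) / sigma, len(scores)


# Made-up reference data: (number of residues, QMEAN score).
example_references = [(95, 0.6), (100, 0.7), (105, 0.8), (300, 0.9)]

z, n_used = qmean_z_score(0.5, 100, example_references)
print(f"Z-score: {z:.2f} (based on {n_used} reference structures)")
```

In this made-up example the 300-residue reference falls outside the size window and is ignored, and the model scores about two standard deviations below the mean of the remaining references, giving a Z-score of roughly -2. The same function could be applied to each individual term to obtain the per-term Z-scores discussed for Plot 3.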
References

Reference for the QMEAN scoring function:
Reference for the QMEAN Z-scores:
Reference for the QMEANclust scoring function:
Reference for the QMEAN server: