TBvar3D

Help

Introduction

The TBvar3D web service enables the user to analyze protein variants in their structural context. It automatically maps each variant to a suitable protein structure and calculates conservation scores, mutational impact, the chemical difference introduced by the mutation and the surface accessibility of the mutation site. This information is displayed on the results page and is further integrated with functional annotations of the protein sequence and (if applicable) the variants of the WHO catalog of antibiotic resistance mutations in Mycobacterium tuberculosis.

The Var3D process comprises four steps: (i) processing and validation of the user input, (ii) aggregation of structural and variant data, (iii) annotation of protein sequences, variants and protein structures, and (iv) display of the aggregated data on the web interface.

Input

The following two inputs are needed:
  • The UniProt Knowledgebase (UniProtKB) Accession Code (AC) of the protein carrying your variants:

    TBvar3D requires the user to refer to a protein entry specified by the UniProtKB (UniProt Consortium) and to map their variants of interest to the corresponding UniProtKB protein sequence.

  • A list of variants

    The variants have to be entered one by one using the variant format described below. If variants from multiple proteins are to be analyzed, we ask the user to submit multiple projects, one for each protein.

The format of the variant is:

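reference amino acid(s) + position of the first reference amino acid + alternative amino acid(s)

Positions use 1-based indexing, i.e. the first character in the sequence has position 1. For example, K123D describes the replacement of the lysine (K) at position 123 by an aspartate (D).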

Input validation checks whether the reference amino acid(s) match the UniProtKB sequence at the specified position and parses the variant types.

Note: Positions of insertions are indicated by the previous amino acid. E.g. an insertion of an Alanine after position 543 would be written as F543FA.
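The format and the validation step can be illustrated with a minimal Python sketch. It is not the TBvar3D implementation: the names parse_variant and Variant, the restriction to the 20 standard amino acid letters and the three variant kinds are assumptions made for illustration, and the sequence is a toy example rather than a UniProtKB entry.

```python
import re
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only
VARIANT_RE = re.compile(rf"^([{AMINO_ACIDS}]+)(\d+)([{AMINO_ACIDS}]+)$")


@dataclass(frozen=True)
class Variant:
    ref: str        # reference amino acid(s)
    position: int   # 1-based position of the first reference amino acid
    alt: str        # alternative amino acid(s)
    kind: str       # hypothetical labels, not the categories used by Var3D


def parse_variant(text: str, sequence: str) -> Variant:
    """Parse a variant such as K123D and check it against the sequence."""
    match = VARIANT_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"Cannot parse variant {text!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)

    # Convert the 1-based position to a 0-based string index.
    start = position - 1
    if position < 1 or sequence[start:start + len(ref)] != ref:
        raise ValueError(
            f"Reference {ref!r} does not match the sequence at position {position}"
        )

    if len(ref) == 1 and len(alt) == 1:
        kind = "substitution"
    elif len(alt) > len(ref) and alt.startswith(ref):
        kind = "insertion"  # e.g. F543FA: alanine inserted after F543
    else:
        kind = "other"
    return Variant(ref, position, alt, kind)


toy_sequence = "MAKDF"  # toy sequence for demonstration only
print(parse_variant("K3D", toy_sequence).kind)   # substitution
print(parse_variant("F5FA", toy_sequence).kind)  # insertion
```

A variant whose reference residue does not match the sequence, such as A3D for the toy sequence above, raises a ValueError in this sketch.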

Var3D input: example of an input for Var3D with every variant type Var3D recognizes.

Data Import

Using the UniProtKB AC as a reference, the pipeline fetches the corresponding structure. If the UniProtKB AC is represented in the WHO mutation catalog, the structure is taken from a database of custom protein structures. These were modelled manually to ensure that antibiotic drugs are present in the structures wherever plausible and that the likely oligomeric state of the protein is properly represented. An index of the structure database can be found here.

For every other target, TBvar3D utilizes the SWISS-MODEL Repository (Bienert et al.) as a source of experimental structures and homology models. The Repository provides up-to-date homology models for every protein in the Mycobacterium tuberculosis proteome. TBvar3D additionally retrieves AlphaFold2 models (Jumper et al.), which were calculated for the complete Mycobacterium tuberculosis proteome and are stored in the AlphaFold Protein Structure Database (Varadi et al.).

Data Annotations

After the collection and mapping of variants to their corresponding structures, various annotations are calculated.

Sequence Annotations

  • Shannon Entropy
    Entropy (Shannon) as a measure of evolutionary conservation. A multiple sequence alignment is generated by performing a single-iteration JackHMMER search (Johnson et al.) on UniRef90 (Suzek et al.) using the input UniProtKB sequence as reference. The resulting entropy values are scaled to [0, 1], with low values hinting at evolutionarily conserved residues.
  • ConSurf
    As opposed to Shannon entropy, ConSurf (Ashkenazy et al.) explicitly considers the evolutionary relationships of the homologues found. Its estimates of evolutionary rates, i.e. conservation, can thus be expected to be more accurate and complement the simple information-theoretic entropy analysis. Conservation is expressed as an integer value in [1, 9], with 9 indicating high evolutionary conservation. TBvar3D uses the pipeline of the ConSurf-DB (Ben Chorin et al.), which has been kindly provided by the authors for local execution.
  • UniProtKB Annotations
    Protein site annotations from the UniProtKB. The following annotations are displayed in TBvar3D:
    • Active site
    • Binding site
    • Disulfide bond
    • DNA binding
    • Intramembrane
    • Modified residue
    • Site
    • Transmembrane
    • Zinc finger
    Please consult the UniProtKB sequence annotation page for more information.
  • InterPro Annotations
    Functional and protein domain annotations from InterPro (Blum et al.).
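As a rough illustration of the Shannon entropy annotation, the sketch below computes a per-column entropy from an already aligned set of sequences and scales it to [0, 1]. The scaling by log(20), the handling of gap characters and the function names are assumptions. The source states only that values are scaled to [0, 1], and the JackHMMER search that produces the alignment is not shown.

```python
import math
from collections import Counter

GAP_CHARACTERS = "-."  # assumption: gaps are ignored when counting residues


def column_entropy(column, alphabet_size=20):
    """Shannon entropy of one alignment column, scaled to [0, 1]."""
    residues = [c for c in column if c not in GAP_CHARACTERS]
    if not residues:
        return 0.0  # assumption: columns without residues carry no signal
    total = len(residues)
    counts = Counter(residues)
    # H = sum over residue types of p * log(1 / p), with p = n / total
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    # Dividing by log(alphabet_size) maps the maximum entropy to 1.
    return min(1.0, entropy / math.log(alphabet_size))


def alignment_entropies(alignment):
    """Entropy for every column of equally long, aligned sequences."""
    return [column_entropy(column) for column in zip(*alignment)]


toy_alignment = ["AKD", "AKE", "ARF", "ARG"]  # toy data, not a real search result
for position, score in enumerate(alignment_entropies(toy_alignment), start=1):
    print(position, round(score, 3))
```

In the toy alignment the first column is fully conserved and scores 0, while the third column contains four different residues and scores highest, which matches the interpretation that low values hint at conservation.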

Structure Annotations

  • Accessibility
    Per-residue solvent accessibilities calculated after Lee & Richards. TBvar3D uses an implementation in OpenStructure (Biasini et al.). The accessibility of each residue is scaled by the theoretical maximum accessibility of that particular residue type, resulting in an expected range of [0, 100].
  • Transmembrane prediction
    Residues which were predicted to be located in a cell membrane. An implicit solvation model implemented in OpenStructure (mol.alg.FindMembrane) estimates the optimal membrane location for each structure and identifies transmembrane structures based on energetic and geometric criteria. The original algorithm and the energy function used are described in Lomize et al.
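The scaling of the accessibility can be written as a one-line formula. The following sketch assumes that the absolute accessibility and the theoretical maximum for the residue type are already available. The function name and the numbers in the usage example are hypothetical, and OpenStructure itself is not called here.

```python
def relative_accessibility(absolute_asa: float, max_asa: float) -> float:
    """Scale an absolute solvent accessibility to a percentage of its maximum."""
    if max_asa <= 0:
        raise ValueError("The maximum accessibility must be positive")
    # No clipping is applied, so values slightly outside [0, 100] remain possible.
    return 100.0 * absolute_asa / max_asa


# Hypothetical numbers for illustration only (both in square angstroms).
print(relative_accessibility(absolute_asa=50.0, max_asa=200.0))  # 25.0
```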

Variant Annotations

  • PROVEAN
    The PROVEAN score (Choi et al.) is a mutation impact score which is based on a multiple sequence alignment of the input protein sequence against the non-redundant protein sequence database from August 2011. PROVEAN is a delta alignment score which measures how the similarity of the sequence to its homologous, functional proteins changes when the mutation is introduced. If the mutation reduces the similarity between the input sequence and many functional homologous protein sequences, that mutation is assumed to be damaging. The PROVEAN score can be any rational number; a score lower than -2.282 is considered to be damaging by the authors of the original study.
  • Chemical Distances
    Chemical distances refer to the changing chemical properties in single amino acid substitutions. We report four properties that are extracted from AAindex (Kawashima et al.):
    • Hydrophobicity: Hydrophobic parameter pi (Fauchere-Pliska, 1983) (AAindex ID FAUJ830101)
    • Weight: Molecular weight (Fasman, 1976) (AAindex ID FASG760101)
    • Isoelectric Point: Isoelectric point (Zimmerman et al., 1968) (AAindex ID ZIMJ680104)
    • Size: STERIMOL length of the side chain (Fauchere et al., 1988) (AAindex ID FAUJ880104)
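The two variant annotations can be sketched in a few lines of Python. Several points are assumptions: the chemical distance is taken here as the signed difference "alternative minus reference" (TBvar3D may report it differently), the function names are hypothetical, and the four molecular weights are standard values that should be checked against AAindex entry FASG760101 before use.

```python
# Molecular weights for a few residues (to be verified against FASG760101).
MOLECULAR_WEIGHT = {"A": 89.09, "D": 133.10, "G": 75.07, "K": 146.19}

PROVEAN_CUTOFF = -2.282  # threshold quoted in the text above


def chemical_distance(ref: str, alt: str, scale: dict) -> float:
    """Signed property change of a single amino acid substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("Chemical distances are defined for single substitutions")
    return scale[alt] - scale[ref]


def is_damaging(provean_score: float, cutoff: float = PROVEAN_CUTOFF) -> bool:
    """True if the score lies below the cutoff, i.e. is considered damaging."""
    return provean_score < cutoff


# K123D: lysine replaced by aspartate, a loss of about 13 Da.
print(round(chemical_distance("K", "D", MOLECULAR_WEIGHT), 2))  # -13.09
print(is_damaging(-3.0))  # True
```

The same chemical_distance function applies to the other three properties once their AAindex values are loaded into a dictionary of the same shape.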

Output

Var3D output

Variant Overview

Variants mapped to the current protein are displayed here. Var3D groups them into the following categories:
  • User Variants: User-submitted variants for the current protein.
  • Resistance Variants: Variants annotated as resistant by the WHO mutation catalog. These variants are thought to have an impact on drug resistance.
  • Neutral Variants: Variants annotated as neutral by the WHO mutation catalog. These variants are thought to NOT have an impact on drug resistance.
  • Uncertain Variants: Variants annotated as uncertain by the WHO mutation catalog. The role of these variants is still not determined; more data is needed.
Holding CTRL and scrolling up and down allows the user to zoom in and out of the sequence space. Every ellipsoid in the Variant Overview corresponds to a variant. Clicking on it will show the corresponding Variant Annotations and zoom in on the corresponding spot in the Structure View. Clicking on the name of a group will show all the variants in the group on the structure. While holding CTRL, one can select a region of the sequence.

Sequence Annotations

All the annotations related to the sequence are shown here. This includes:
  • UniProtKB Annotations
  • InterPro Annotations
  • Shannon Entropy
  • ConSurf

Structure Switch

The bars in this region show all the structures available for this specific protein. Each bar indicates which part of the sequence is covered by a structure. Hovering over a bar shows more information about the origin of the structure. Clicking on a bar switches the structure in the Structure View.

Variant Annotations

All annotations which are specific for a variant are displayed here. This includes chemical distances and the PROVEAN score.

Structure View

The structure view allows the user to explore the relationship between variants and structures. An important but easy-to-miss feature is the cogwheel button in the upper left corner, which allows the user to color the current structure according to different features.

Drug View

For variants which are part of the WHO mutation catalog, a drug window appears at the end of the feature display. It contains the WHO assessment of the variant, the mechanism of action and a description of the drug associated with this variant. If the lines around a box are solid, a structure with the drug of interest exists, and clicking on the box switches to that structure. This opens a special Drug View, which only shows the environment around the currently selected drug. The slider in the Structure View adjusts the size of the environment shown.

References

  • UniProtKB
    UniProt Consortium (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res.
  • SWISS-MODEL Repository
    Bienert, S., Waterhouse, A., de Beer, T.A.P., Tauriello, G., Studer, G., Bordoli, L., Schwede, T. (2017). The SWISS-MODEL Repository - new features and functionality. Nucleic Acids Res.
  • AlphaFold
    Jumper et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature
  • AlphaFold Database
    Varadi et al. (2021). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res.
  • Shannon Entropy
    Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal
  • JackHMMER
    Johnson, L.S., Eddy S.R., Portugaly E. (2010). Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics
  • UniRef
    Suzek B.E., Wang Y., Huang H., McGarvey P., Wu C.H., UniProt Consortium (2015). UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics
  • ConSurf
    Ashkenazy H., Abadi S., Martz E., Chay O., Mayrose I., Pupko T., Ben-Tal N. (2016). ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res.
  • ConSurf-DB
    Ben Chorin A., Masrati G., Kessel A., Narunsky A., Sprinzak J., Lahav S., Ashkenazy H. and Ben-Tal N. (2020). ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins. Protein Science
  • InterPro
    Blum M. et al. (2021). The InterPro protein families and domains database: 20 years on. Nucleic Acids Res.
  • Solvent Accessibility
    Lee B., Richards F.M. (1971). The interpretation of protein structures: Estimation of static accessibility. J Mol Biol.
  • OpenStructure
    Biasini M., Schmidt T., Bienert S., Mariani V., Studer G., Haas J., Johner N., Schenk A.D., Philippsen A., Schwede T. (2013). OpenStructure: an integrated software framework for computational structural biology. Acta Crystallogr D Biol Crystallogr.
  • Membrane Prediction
    Lomize, A.L., Pogozheva I.D., Lomize M.A., Mosberg H.I. (2006). Positioning of proteins in membranes: A computational approach. Protein Sci.
  • PROVEAN
    Choi, Y., Sims G.E., Murphy S., Miller J.R., Chan A.P. (2012). Predicting the functional effect of amino acid substitutions and indels. PLoS One
  • AAindex
    Kawashima, S., Pokarowski, P., Pokarowska, M., Kolinski, A., Katayama, T., Kanehisa, M. (2008). AAindex: amino acid index database, progress report 2008. Nucleic Acids Res.