The TBvar3D web service enables the user to analyze protein variants in their structural context. It automatically maps each variant to a suitable protein structure and calculates conservation scores, mutational impact, the chemical differences introduced by the mutation and the surface accessibility of the mutation site. This information is displayed on the results page and is further integrated with functional annotations of the protein sequence and, where applicable, with the variants listed in the WHO catalog of antibiotic resistance mutations in Mycobacterium tuberculosis.
The Var3D process comprises four steps: (i) processing and validation of the user input, (ii) aggregation of structural and variant data, (iii) annotation of protein sequences, variants and protein structures, and (iv) display of the aggregated data on the web interface.
Input
The following two inputs are required:
- The UniProt Knowledgebase (UniProtKB) Accession Code (AC) of the protein carrying your variants
- A list of variants
The variants have to be entered one by one using the variant format described below. If variants from multiple proteins are to be analyzed, please submit a separate project for each protein.
| Reference amino acid(s) | Position of the first reference amino acid | Alternative amino acid(s) |
|---|---|---|

Positions use 1-based indexing, i.e. the first residue in the sequence has position 1.
Input validation checks whether the reference amino acid(s) match the UniProtKB sequence at the specified position and determines the type of each variant.
Note: An insertion is specified at the position of the amino acid preceding it, and that amino acid is repeated as the first character of the alternative. For example, an insertion of an alanine after phenylalanine 543 is written as F543FA.
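The format rules above can be illustrated with a small parser. This is a minimal sketch, not TBvar3D's actual validator: the regular expression, the `Variant` class, the `parse_variant` and `classify` helpers, and the rule used to label insertions and deletions (the alternative extends, or is a prefix of, the reference) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: reference residue(s), a 1-based position, alternative residue(s).
VARIANT_RE = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY]+)([1-9][0-9]*)([ACDEFGHIKLMNPQRSTVWY]+)$"
)


@dataclass
class Variant:
    ref: str
    pos: int  # 1-based position of the first reference residue
    alt: str
    kind: str


def classify(ref: str, alt: str) -> str:
    """Assumed classification rules; the real service may use different ones."""
    if len(ref) == len(alt):
        return "substitution" if len(ref) == 1 else "multiple substitution"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"  # anchored at the preceding residue, e.g. F543FA
    if len(alt) < len(ref) and ref.startswith(alt):
        return "deletion"
    return "complex"


def parse_variant(text: str, sequence: str) -> Variant:
    """Parse one variant string and check its reference against the sequence."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"malformed variant: {text!r}")
    ref, pos_str, alt = match.groups()
    pos = int(pos_str)
    # Convert the 1-based position into a 0-based slice of the sequence.
    observed = sequence[pos - 1 : pos - 1 + len(ref)]
    if observed != ref:
        raise ValueError(
            f"{text}: expected {ref!r} at position {pos}, sequence has {observed!r}"
        )
    return Variant(ref, pos, alt, classify(ref, alt))
```

With a UniProtKB sequence whose residue 543 is phenylalanine, `parse_variant("F543FA", sequence)` would return an insertion; a mismatching reference residue raises `ValueError`, mirroring the validation described above.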
Using the UniProtKB AC as a reference, the pipeline fetches the corresponding structures. If the UniProtKB AC is part of the WHO mutation catalog, the structure is taken from a database of custom protein structures. These were modelled manually to ensure that antibiotic drugs are present in the structures wherever plausible and that the likely oligomeric state of the protein is properly represented. An index of the structure database can be found here.
For all other targets, TBvar3D uses the SWISS-MODEL Repository (Bienert et al.) as a source of experimental structures and homology models. The Repository provides up-to-date homology models for every protein in the Mycobacterium tuberculosis proteome. In addition, TBvar3D retrieves AlphaFold2 models (Jumper et al.), which were computed for the complete Mycobacterium tuberculosis proteome and are stored in the AlphaFold Protein Structure Database (Varadi et al.).
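The source selection described above, together with the coverage check behind the structure switch, might be sketched as follows. This is an illustration under stated assumptions, not the TBvar3D code: the `StructureEntry` class and the `structure_sources` and `covering_structures` helpers are hypothetical, and the sketch assumes that SWISS-MODEL and AlphaFold entries are consulted only for proteins outside the WHO catalog.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class StructureEntry:
    source: str      # e.g. "custom", "SWISS-MODEL", "AlphaFold DB"
    identifier: str  # hypothetical identifier of the model or experimental structure
    start: int       # first UniProtKB residue covered (1-based, inclusive)
    end: int         # last UniProtKB residue covered (1-based, inclusive)


def structure_sources(uniprot_ac: str, who_catalog_acs: Set[str]) -> List[str]:
    """Return the structure sources to query for a protein (assumed selection rule)."""
    if uniprot_ac in who_catalog_acs:
        # Proteins in the WHO catalog use the manually modelled custom structures.
        return ["custom"]
    # All other proteins: SWISS-MODEL Repository plus AlphaFold DB models.
    return ["SWISS-MODEL", "AlphaFold DB"]


def covering_structures(
    structures: Iterable[StructureEntry], position: int
) -> List[StructureEntry]:
    """Keep only structures whose sequence coverage includes the variant position."""
    return [s for s in structures if s.start <= position <= s.end]
```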
Data Annotations
After the collection and mapping of variants to their corresponding structures, various annotations are calculated.
- Shannon Entropy
Shannon entropy (Shannon) as a measure of evolutionary conservation. A multiple sequence alignment is generated by performing a single-iteration JackHMMER search (Johnson et al.) on UniRef90 (Suzek et al.) with the input UniProtKB sequence as the query. The resulting entropy values are scaled to [0, 1], with low values indicating evolutionarily conserved residues.
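The per-column computation can be sketched as below. The exact scaling used by TBvar3D is not stated; this minimal sketch assumes that gaps and non-standard symbols are ignored and that the entropy is divided by log(20), its maximum over the 20 standard amino acids. Sequence weighting and the jackhmmer search itself are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column, scaled to [0, 1] by log(20)."""
    # Assumption: gaps and non-standard symbols are ignored.
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return float("nan")  # all-gap column: entropy is undefined
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Divide by the maximum possible entropy (all 20 residues equally frequent).
    return entropy / math.log(len(AMINO_ACIDS))


def alignment_entropies(alignment: Sequence[str]) -> List[float]:
    """Per-column scaled entropies for an MSA given as equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all alignment rows must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]
```

A fully conserved column yields 0 and a column using all 20 amino acids equally often yields 1, matching the convention that low values point to conserved residues.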
- ConSurf
As opposed to Shannon entropy, ConSurf (Ashkenazy et al.) explicitly takes the evolutionary relationships between the homologues found into account. Its estimates of evolutionary rates, i.e. conservation, can therefore be expected to be more accurate and to complement the simpler information-theoretic entropy analysis. Conservation is expressed as an integer grade from 1 to 9, with 9 indicating the highest evolutionary conservation. TBvar3D uses the ConSurf-DB pipeline (Ben Chorin et al.), which the authors kindly provided for local execution.
- UniProtKB Annotations
Protein site annotations from the UniProtKB (UniProt Consortium). The following annotations are displayed in TBvar3D:
- Active site
- Binding site
- Disulfide bond
- DNA binding
- Modified residue
- Zinc finger
Please consult the UniProtKB sequence annotation page for more information.
- InterPro Annotations
Functional and protein domain annotations from InterPro (Blum et al.).
- Solvent Accessibility
Per-residue solvent accessibilities are calculated with the method of Lee & Richards, using the implementation in OpenStructure (Biasini et al.). The accessibility of each residue is divided by the theoretical maximum accessibility of that residue type, resulting in an expected range of [0, 100].
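TBvar3D computes the absolute accessibilities with OpenStructure; the sketch below only illustrates the scaling step. The function names are hypothetical, and the table of maximum accessibilities has to be supplied by the caller because the reference values used by TBvar3D are not given here. Values are deliberately not clipped, which is consistent with [0, 100] being only the expected range.

```python
from typing import Iterable, List, Mapping, Tuple


def relative_accessibility(sasa: float, max_sasa: float) -> float:
    """Scale an absolute residue accessibility (in A^2) to a percentage of its maximum."""
    if max_sasa <= 0:
        raise ValueError("max_sasa must be positive")
    # No clipping: a residue may slightly exceed the theoretical maximum.
    return 100.0 * sasa / max_sasa


def relative_accessibilities(
    residues: Iterable[Tuple[str, float]], max_table: Mapping[str, float]
) -> List[float]:
    """Apply the scaling to (residue name, absolute SASA) pairs using a caller-supplied table."""
    return [relative_accessibility(sasa, max_table[name]) for name, sasa in residues]
```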
- Transmembrane prediction
Residues predicted to be located within a cell membrane. An implicit solvation model implemented in OpenStructure (mol.alg.FindMembrane) estimates the optimal membrane position for each structure and identifies transmembrane structures based on energetic and geometric criteria. The original algorithm and the energy function used are described in Lomize et al.
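FindMembrane's actual search and energy criteria are described in Lomize et al. and are not reproduced here. As a much simpler illustration of the final geometric step, the sketch below assumes the membrane has already been placed and is described by a center point, a normal axis and a thickness (all hypothetical inputs), and flags a residue as transmembrane when a representative atom lies inside that slab.

```python
import math
from typing import Sequence


def in_membrane_slab(
    coord: Sequence[float],
    center: Sequence[float],
    axis: Sequence[float],
    width: float,
) -> bool:
    """Return True if coord lies inside a slab of the given width centred on center."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    # Signed distance of the point from the membrane mid-plane along the normal axis.
    depth = sum((c - o) * a for c, o, a in zip(coord, center, axis)) / norm
    return abs(depth) <= width / 2.0
```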
- PROVEAN Score
The PROVEAN score (Choi et al.) is a mutation impact score based on a multiple sequence alignment of the input protein sequence against the non-redundant protein sequence database from August 2011. PROVEAN computes a delta alignment score, which measures how the mutation changes the similarity of the input sequence to its homologous, functional proteins. If the introduced mutation reduces the similarity between the input sequence and many functional homologous protein sequences, the mutation is assumed to be damaging. The PROVEAN score can be any real number; the authors of the original study consider scores below -2.282 to be damaging.
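The final aggregation and classification step can be sketched as follows. Based on our reading of Choi et al., the homologues are clustered, a delta score (alignment score of the variant sequence minus that of the reference sequence) is computed per homologue, averaged within each cluster, and the cluster means are averaged again. Homologue retrieval, clustering and alignment are omitted, the per-homologue delta scores are assumed inputs, and the function names are hypothetical. The threshold is the -2.282 cut-off quoted above.

```python
from statistics import mean
from typing import Sequence

DAMAGING_THRESHOLD = -2.282  # cut-off quoted in the text above


def provean_like_score(cluster_deltas: Sequence[Sequence[float]]) -> float:
    """Average per-homologue delta scores within clusters, then across clusters."""
    cluster_means = [mean(deltas) for deltas in cluster_deltas if deltas]
    if not cluster_means:
        raise ValueError("at least one non-empty cluster is required")
    return mean(cluster_means)


def is_damaging(score: float, threshold: float = DAMAGING_THRESHOLD) -> bool:
    """Scores strictly below the threshold are classed as damaging."""
    return score < threshold
```

Averaging per cluster first keeps a large family of near-identical homologues from dominating the score.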
- Chemical Distances
Chemical distances quantify the change in chemical properties caused by a single amino acid substitution. We report four properties extracted from AAindex (Kawashima et al.):
- Hydrophobicity: Hydrophobic parameter pi (Fauchere-Pliska, 1983) (AAindex ID FAUJ830101)
- Weight: Molecular weight (Fasman, 1976) (AAindex ID FASG760101)
- Isoelectric Point: Isoelectric point (Zimmerman et al., 1968) (AAindex ID ZIMJ680104)
- Size: STERIMOL length of the side chain (Fauchere et al., 1988) (AAindex ID FAUJ880104)
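How each distance is derived from the property scales is not spelled out above. A minimal sketch, assuming each distance is the signed difference between the AAindex value of the alternative and of the reference amino acid; the property tables are hypothetical inputs that would be filled from the AAindex entries listed above, and no values are included here.

```python
from typing import Dict, Mapping


def chemical_distance(ref: str, alt: str, scale: Mapping[str, float]) -> float:
    """Signed change of one property (alt minus ref) for a single substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single amino acid substitutions are supported")
    return scale[alt] - scale[ref]


def chemical_distances(
    ref: str, alt: str, scales: Mapping[str, Mapping[str, float]]
) -> Dict[str, float]:
    """Compute the distance for every property scale, keyed by property name."""
    return {name: chemical_distance(ref, alt, scale) for name, scale in scales.items()}
```

In practice `scales` would map names such as "Hydrophobicity" or "Weight" to one-letter-code tables built from FAUJ830101, FASG760101 and the other entries.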
Variant Overview
Variants mapped to the current protein are displayed here. Var3D distinguishes the following categories:
- User Variants: User submitted variants for the current protein.
- Resistance Variants: Variants annotated as resistant by the WHO mutation catalog. These variants are thought to have an impact on drug resistance.
- Neutral Variants: Variants annotated as neutral by the WHO mutation catalog. These variants are thought to NOT have an impact on drug resistance.
- Uncertain Variants: Variants annotated as uncertain by the WHO mutation catalog. The role of these variants has not yet been determined; more data is needed.
Sequence Annotations
All sequence-related annotations are shown here, including:
- UniProtKB Annotations
- InterPro Annotations
- Shannon Entropy
Structure Switch
The bars in this region show all structures available for the current protein. Each bar indicates which part of the sequence is covered by that structure. Hovering over a bar shows more information on the origin of the structure, and clicking on it loads that structure in the structure view.
Variant Annotations
All variant-specific annotations are displayed here, including the chemical distances and the PROVEAN score.
Structure View
The structure view allows the user to explore the relationship between variants and structures. An important but easy-to-miss feature is the cogwheel button in the upper left corner, which lets the user color the current structure by different features.
For variants that are part of the WHO mutation catalog, a drug window appears at the end of the feature display. It contains the WHO assessment of the variant as well as the mechanism of action and a description of the drug associated with the variant. If a box has a solid outline, a structure containing the drug of interest exists, and clicking on the box switches to that structure. This opens a special Drug View, which only shows the environment around the currently selected drug. The slider in the Structure View adjusts the size of the displayed environment.
UniProt Consortium (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res.
Bienert S., Waterhouse A., de Beer T.A.P., Tauriello G., Studer G., Bordoli L., Schwede T. (2017). The SWISS-MODEL Repository: new features and functionality. Nucleic Acids Res.
Jumper J. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature.
Varadi M. et al. (2021). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res.
Shannon C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal.
Johnson L.S., Eddy S.R., Portugaly E. (2010). Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics.
Suzek B.E., Wang Y., Huang H., McGarvey P.B., Wu C.H., UniProt Consortium (2015). UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics.
Ashkenazy H., Abadi S., Martz E., Chay O., Mayrose I., Pupko T., Ben-Tal N. (2016). ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res.
Ben Chorin A., Masrati G., Kessel A., Narunsky A., Sprinzak J., Lahav S., Ashkenazy H., Ben-Tal N. (2020). ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins. Protein Sci.
Blum M. et al. (2021). The InterPro protein families and domains database: 20 years on. Nucleic Acids Res.
Lee B., Richards F.M. (1971). The interpretation of protein structures: Estimation of static accessibility. J Mol Biol.
Biasini M., Schmidt T., Bienert S., Mariani V., Studer G., Haas J., Johner N., Schenk A.D., Philippsen A., Schwede T. (2013). OpenStructure: an integrated software framework for computational structural biology. Acta Crystallogr D Biol Crystallogr.
Lomize A.L., Pogozheva I.D., Lomize M.A., Mosberg H.I. (2006). Positioning of proteins in membranes: A computational approach. Protein Sci.
Choi Y., Sims G.E., Murphy S., Miller J.R., Chan A.P. (2012). Predicting the functional effect of amino acid substitutions and indels. PLoS One.
Kawashima S., Pokarowski P., Pokarowska M., Kolinski A., Katayama T., Kanehisa M. (2008). AAindex: amino acid index database, progress report 2008. Nucleic Acids Res.