Introduction
The TBvar3D web service enables users to analyze protein variants in their structural context. It automatically maps the variants to a suitable protein structure and calculates conservation scores, mutational impact, the chemical difference introduced by each mutation and the surface accessibility of the mutation site. This information is displayed on the results page and is further integrated with functional annotations of the protein sequence and, where applicable, with the variants listed in the WHO catalog of antibiotic resistance mutations in Mycobacterium tuberculosis.
The Var3D process comprises four steps: (i) processing and validation of the user input, (ii) aggregation of structural and variant data, (iii) annotation of protein sequences, variants and protein structures, and (iv) display of the aggregated data on the web interface.
Input
The following two inputs are required:
- The UniProt Knowledgebase (UniProtKB) Accession Code (AC) of the protein carrying your variants
TBvar3D requires the user to refer to a protein entry in UniProtKB (UniProt Consortium) and to map their variants of interest to the corresponding UniProtKB protein sequence.
- A list of variants
The variants have to be entered one by one using the variant format described below. If variants from multiple proteins are to be analyzed, we ask the user to submit multiple projects, one for each protein.
| Reference amino acid(s) | Position of the first reference amino acid (1-based, i.e. the first residue of the sequence has position 1) | Alternative amino acid(s) |
| K | 123 | D |
Written in compact form, this variant is K123D.
Input validation checks whether the reference amino acid(s) match the UniProtKB sequence at the specified position and parses the variant types.
Note: Insertions are positioned by the preceding amino acid. For example, an insertion of an alanine after the phenylalanine at position 543 is written as F543FA.
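The variant format and the validation step above can be illustrated with a short Python sketch. It is not TBvar3D's implementation: the regular expression, the `Variant` class and the functions `parse_variant` and `validate_variant` are hypothetical, only the 20 standard one-letter codes are accepted, and any variant that is neither a single substitution nor an insertion in the F543FA style is simply labelled "other".

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: reference residue(s), 1-based position, alternative residue(s).
VARIANT_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY]+)(\d+)([ACDEFGHIKLMNPQRSTVWY]+)")


@dataclass
class Variant:
    ref: str
    pos: int  # 1-based position of the first reference residue
    alt: str

    @property
    def kind(self) -> str:
        if len(self.ref) == 1 and len(self.alt) == 1:
            return "substitution"
        # Insertions repeat the preceding residue, e.g. F543FA inserts A after F543.
        if len(self.ref) == 1 and len(self.alt) > 1 and self.alt[0] == self.ref:
            return "insertion"
        return "other"


def parse_variant(text: str) -> Variant:
    match = VARIANT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"malformed variant: {text!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1:
        raise ValueError("positions are 1-based")
    return Variant(ref, pos, alt)


def validate_variant(variant: Variant, sequence: str) -> None:
    # Convert the 1-based position into a 0-based slice of the reference sequence.
    start = variant.pos - 1
    observed = sequence[start:start + len(variant.ref)]
    if observed != variant.ref:
        raise ValueError(
            f"reference {variant.ref!r} does not match {observed!r} at position {variant.pos}"
        )


seq = "MKTAYIAKQR"  # toy sequence, not a real UniProtKB entry
variant = parse_variant("K2D")
validate_variant(variant, seq)
print(variant.kind)  # substitution
```

Checking the reference residues against the sequence before anything else catches the most common input mistake: variants numbered against a different isoform or gene model than the UniProtKB sequence.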
Data Import
Using the UniProtKB AC as a reference, the pipeline fetches the corresponding structures. If the UniProtKB AC is represented in the WHO mutation catalog, the protein structure is taken from a database of custom protein structures. These were modelled manually to ensure that antibiotic drugs are present in the structures wherever plausible and that the likely oligomeric state of the protein is properly represented. An index of the structure database can be found here.
For all other targets, TBvar3D uses the SWISS-MODEL Repository (Bienert et al.) as a source of experimental structures and homology models. The Repository provides up-to-date homology models for every protein in the Mycobacterium tuberculosis proteome. In addition, TBvar3D retrieves AlphaFold2 models (Jumper et al.), which were calculated for the complete Mycobacterium tuberculosis proteome and are stored in the AlphaFold Protein Structure Database (Varadi et al.).
Data Annotations
After the collection and mapping of variants to their corresponding structures, various annotations are calculated.
Sequence Annotations
- Shannon Entropy
Shannon entropy as a measure of evolutionary conservation. A multiple sequence alignment is generated by a single-iteration JackHMMER search (Johnson et al.) against UniRef90 (Suzek et al.) with the input UniProtKB sequence as the query. The resulting entropy values are scaled to [0, 1], with low values indicating evolutionarily conserved residues.
- ConSurf
As opposed to Shannon entropy, ConSurf (Ashkenazy et al.) explicitly considers the evolutionary relationships between the homologues found. Its estimates of evolutionary rates, i.e. conservation, can thus be expected to be more accurate and complement the simpler information-theoretic entropy analysis. Conservation is expressed as an integer in [1, 9], with 9 indicating high evolutionary conservation. TBvar3D uses the ConSurf-DB pipeline (Ben Chorin et al.), which the authors kindly provided for local execution.
- UniProtKB Annotations
Protein site annotations from UniProtKB. The following annotations are displayed in TBvar3D:
- Active site
- Binding site
- Disulfide bond
- DNA binding
- Intramembrane
- Modified residue
- Site
- Transmembrane
- Zinc finger
Please consult the UniProtKB sequence annotation page for more information.
- InterPro Annotations
Functional and protein domain annotations from InterPro (Blum et al.).
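The Shannon entropy score can be sketched per alignment column in Python. The exact scaling used by TBvar3D is not stated above; this sketch assumes that gaps count as a 21st symbol, that sequences are not weighted, and that each entropy is divided by ln 21, the maximum for 21 symbols. The function name `column_entropies` is hypothetical.

```python
import math
from collections import Counter


def column_entropies(alignment: list[str], alphabet_size: int = 21) -> list[float]:
    """Per-column Shannon entropy scaled to [0, 1] (assumed scaling: divide by ln(alphabet_size))."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    max_entropy = math.log(alphabet_size)
    scores = []
    for i in range(length):
        # Frequencies of each symbol (amino acid or gap) in column i, unweighted.
        counts = Counter(seq[i] for seq in alignment)
        n = len(alignment)
        entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
        # 0 means a fully conserved column; 1 means all 21 symbols equally frequent.
        scores.append(entropy / max_entropy)
    return scores


alignment = ["MKV-", "MRVA", "MKIA"]  # toy alignment, one string per aligned sequence
for position, score in enumerate(column_entropies(alignment), start=1):
    print(position, round(score, 3))
```

A fully conserved column scores 0, matching the statement that low values hint at conserved residues. Production tools usually also down-weight redundant sequences, which this sketch omits.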
Structure Annotations
- Accessibility
Per-residue solvent accessibilities calculated following Lee & Richards, using the implementation in OpenStructure (Biasini et al.). The accessibility of each residue is scaled by the theoretical maximum accessibility of that residue type, resulting in an expected range of [0, 100].
- Transmembrane prediction
Residues predicted to be located in a cell membrane. An implicit solvation model implemented in OpenStructure (mol.alg.FindMembrane) estimates the optimal membrane position for each structure and identifies transmembrane structures based on energetic and geometric criteria. The original algorithm and the energy function used are described in Lomize et al.
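The scaling of accessibilities described above is a simple ratio: relative accessibility = 100 × absolute accessibility / maximum accessibility of the residue type. The Python sketch below assumes the absolute values (in Å²) have already been computed, e.g. by the Lee & Richards implementation in OpenStructure. The function name is hypothetical, and the maxima in `PLACEHOLDER_MAX_ASA` are made-up numbers for illustration, not the reference values TBvar3D uses.

```python
def relative_accessibility(asa: float, residue: str, max_asa: dict[str, float]) -> float:
    """Return the accessibility of one residue as a percentage of its maximum.

    asa: absolute solvent accessibility of the residue in square angstroms.
    residue: residue type used to look up the maximum, e.g. "LYS".
    max_asa: maximum accessibility per residue type (reference table).
    """
    if asa < 0:
        raise ValueError("accessibility cannot be negative")
    # No clipping: values slightly above 100 can occur because the maximum
    # comes from a reference conformation, hence the "expected" range.
    return 100.0 * asa / max_asa[residue]


# Made-up maxima for illustration only; a real table covers all 20 residue types.
PLACEHOLDER_MAX_ASA = {"LYS": 200.0, "ASP": 150.0}

print(relative_accessibility(50.0, "LYS", PLACEHOLDER_MAX_ASA))  # 25.0
```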
Variant Annotations
- PROVEAN
The PROVEAN score (Choi et al.) is a mutation impact score based on a multiple sequence alignment of the input protein sequence against the non-redundant protein sequence database from August 2011. PROVEAN is a delta alignment score: it measures how the mutation changes the similarity of the input sequence to its functional homologues. If the introduced mutation reduces the similarity between the input sequence and many functional homologous protein sequences, that mutation is assumed to be damaging. The PROVEAN score can take any real value; the authors of the original study consider scores lower than -2.282 to be damaging.
- Chemical Distances
Chemical distances describe how chemical properties change in single amino acid substitutions. We report four properties extracted from AAindex (Kawashima et al.):
- Hydrophobicity: Hydrophobic parameter pi (Fauchere-Pliska, 1983) (AAindex ID FAUJ830101)
- Weight: Molecular weight (Fasman, 1976) (AAindex ID FASG760101)
- Isoelectric Point: Isoelectric point (Zimmerman et al., 1968) (AAindex ID ZIMJ680104)
- Size: STERIMOL length of the side chain (Fauchere et al., 1988) (AAindex ID FAUJ880104)
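Both variant annotations reduce to simple rules, sketched below in Python. The PROVEAN cutoff is the -2.282 quoted above. The chemical distance is assumed here to be the signed difference between the property values of the alternative and the reference residue, which may differ from the definition TBvar3D actually uses. The two molecular weights are standard values for free lysine and aspartic acid and are believed to match AAindex FASG760101, but the full table should be taken from AAindex itself. All names are hypothetical.

```python
# Cutoff quoted in the PROVEAN description above (Choi et al.).
PROVEAN_DAMAGING_THRESHOLD = -2.282


def provean_is_damaging(score: float) -> bool:
    """True if a PROVEAN score falls below the damaging cutoff."""
    return score < PROVEAN_DAMAGING_THRESHOLD


# Molecular weights in g/mol for two residues only; take the full FASG760101
# table from AAindex for real use.
MOLECULAR_WEIGHT = {"K": 146.19, "D": 133.10}


def chemical_distance(ref: str, alt: str, scale: dict[str, float]) -> float:
    """Assumed definition: property value of the alternative minus the reference residue."""
    return scale[alt] - scale[ref]


print(provean_is_damaging(-3.1))  # True
print(round(chemical_distance("K", "D", MOLECULAR_WEIGHT), 2))  # -13.09
```

A signed difference keeps the direction of the change (e.g. a lighter or more hydrophilic residue), which an absolute distance would discard.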
Output
Variant Overview
Variants mapped to the current protein are displayed here, grouped into the following categories:
- User Variants: User-submitted variants for the current protein.
- Resistance Variants: Variants annotated as resistant by the WHO mutation catalog. These variants are thought to have an impact on drug resistance.
- Neutral Variants: Variants annotated as neutral by the WHO mutation catalog. These variants are thought to NOT have an impact on drug resistance.
- Uncertain Variants: Variants annotated as uncertain by the WHO mutation catalog. The role of these variants has not yet been determined; more data are needed.
Sequence Annotations
All annotations related to the sequence are shown here, including:
- UniProtKB Annotations
- InterPro Annotations
- Shannon Entropy
- ConSurf
Structure Switch
The bars in this region show all structures available for this protein. Each bar indicates which part of the sequence is covered by a structure. Hovering over a bar shows more information on the origin of the structure, and clicking on a bar switches the structure in the structure view.
Variant Annotations
All annotations that are specific to a variant are displayed here, including chemical distances and the PROVEAN score.
Structure View
The structure view allows the user to explore the relationship between variants and structures. An important but easy-to-miss feature is the cogwheel button in the upper left corner, which allows the user to color the current structure according to different features.
Drug View
For variants that are part of the WHO mutation catalog, a drug window appears at the end of the feature display. It contains the WHO assessment of the variant as well as the mechanism of action and a description of the drug associated with this variant. If the lines around a box are solid, a structure containing the drug of interest exists, and clicking on the box switches to that structure. This opens a special Drug View, which only shows the environment around the currently selected drug. The slider in the Structure View adjusts the size of the displayed environment.
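The environment shown in the Drug View can be thought of as all residues within a distance cutoff of the drug, with the slider setting the cutoff. The Python sketch below assumes an any-atom criterion in Å: a residue is included if at least one of its atoms lies within `radius` of at least one drug atom. TBvar3D's exact selection rule is not described here, and the function name, residue identifiers and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def residues_near_ligand(
    residues: dict[str, list[Coord]], ligand: list[Coord], radius: float
) -> list[str]:
    """Return ids of residues with any atom within `radius` of any ligand atom."""
    selected = []
    for residue_id, atoms in residues.items():
        # Brute force: compare every residue atom with every ligand atom.
        if any(math.dist(atom, lig) <= radius for atom in atoms for lig in ligand):
            selected.append(residue_id)
    return selected


# Toy coordinates (in angstroms): one drug atom at the origin, two residues.
residues = {"A:10": [(3.0, 0.0, 0.0)], "A:20": [(8.0, 0.0, 0.0), (9.0, 1.0, 0.0)]}
drug = [(0.0, 0.0, 0.0)]
print(residues_near_ligand(residues, drug, radius=5.0))  # ['A:10']
```

Checking every atom pair is adequate for one drug in one structure; for repeated queries on large structures, a spatial index such as a k-d tree avoids the quadratic cost.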
References
- UniProtKB: UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49(D1):D480-D489 (2021). PMID: 33237286. DOI: 10.1093/nar/gkaa1100
- SWISS-MODEL Repository: Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, Schwede T. The SWISS-MODEL Repository - new features and functionality. Nucleic Acids Res. 45:D313-D319 (2017). PMID: 27899672. DOI: 10.1093/nar/gkw1132
- AlphaFold: Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 596(7873):583-589 (2021). PMID: 34265844. DOI: 10.1038/s41586-021-03819-2
- AlphaFold Protein Structure Database: Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50:D439-D444 (2022). PMID: 34791371. DOI: 10.1093/nar/gkab1061
- Shannon Entropy: Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 27(3):379-423 (1948). DOI: 10.1002/j.1538-7305.1948.tb01338.x
- JackHMMER: Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics. 11:431 (2010). PMID: 20718988. DOI: 10.1186/1471-2105-11-431
- UniRef: Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH; UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 31(6):926-32 (2014). PMID: 25398609. DOI: 10.1093/bioinformatics/btu739
- ConSurf: Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44(W1):W344-50 (2016). PMID: 27166375. DOI: 10.1093/nar/gkw408
- ConSurf-DB: Ben Chorin A, Masrati G, Kessel A, Narunsky A, Sprinzak J, Lahav S, Ashkenazy H, Ben-Tal N. ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins. Protein Sci. 29(1):258-267 (2020). PMID: 31702846. DOI: 10.1002/pro.3779
- InterPro: Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49(D1):D344-D354 (2021). PMID: 33156333. DOI: 10.1093/nar/gkaa977
- Solvent Accessibility: Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J Mol Biol. 55(3):379-400 (1971). PMID: 5551392. DOI: 10.1016/0022-2836(71)90324-x
- OpenStructure (OST): Biasini M, Schmidt T, Bienert S, Mariani V, Studer G, Haas J, Johner N, Schenk AD, Philippsen A, Schwede T. OpenStructure: an integrated software framework for computational structural biology. Acta Cryst. (2013). PMID: 23633579. DOI: 10.1107/S0907444913007051
- Membrane Prediction: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI. Positioning of proteins in membranes: A computational approach. Protein Sci. (2006). PMID: 16731967. DOI: 10.1110/ps.062126106
- PROVEAN: Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS One. 7(10):e46688 (2012). PMID: 23056405. DOI: 10.1371/journal.pone.0046688
- AAindex: Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 36(Database issue):D202-5 (2008). PMID: 17998252. DOI: 10.1093/nar/gkm998