Introduction

The TBvar3D web service enables the user to analyze protein variants in their structural context. It automatically maps the variants to suitable protein structures and calculates conservation scores, mutational impact, the chemical differences introduced by each mutation and the surface accessibility of the mutation site. This information is displayed on the results page and is further integrated with functional annotations of the protein sequence and, if applicable, with the variants listed in the WHO catalog of antibiotic resistance mutations in Mycobacterium tuberculosis.

The Var3D process comprises four steps: (i) processing and validation of the user input, (ii) aggregation of structural and variant data, (iii) annotation of protein sequences, variants and protein structures, and (iv) display of the aggregated data on the web interface.

Input

Two inputs are required:

The UniProt Knowledgebase (UniProtKB) Accession Code (AC) of the protein carrying your variants:
TBvar3D requires the user to refer to a protein entry specified by the UniProtKB (UniProt Consortium) and to map their variants of interest to the corresponding UniProtKB protein sequence.
A list of variants
The variants have to be entered one by one using the variant format described below. If variants from multiple proteins are to be analyzed, we ask the user to submit multiple projects, one for each protein.

The format of a variant is:

| Reference amino acid(s) | Position of the first reference amino acid (1-based) | Alternative amino acid(s) |
|---|---|---|
| K | 123 | D |

Positions use 1-based indexing, i.e. the first residue of the sequence has position 1.

Input validation checks whether the reference amino acid(s) match the UniProtKB sequence at the specified position and determines the type of each variant.

Note: The position of an insertion is given by the amino acid preceding it. For example, an insertion of an alanine after the phenylalanine at position 543 is written as F543FA.

Var3D input: *Example of an input for Var3D containing every variant type that Var3D recognizes.*
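The input rules above can be sketched as a small parser. This is a minimal illustration, not TBvar3D's actual validation code: it assumes the compact notation used in the insertion example (reference residue(s), 1-based position, alternative residue(s), as in F543FA), and the function name `parse_variant` as well as the variant type labels are hypothetical. In particular, the deletion rule simply mirrors the insertion notation and is an assumption.

```python
import re

# Hypothetical pattern: reference residue(s), 1-based position, alternative residue(s),
# e.g. "K123D" or the insertion "F543FA".
VARIANT_PATTERN = re.compile(r"([A-Z]+)(\d+)([A-Z]+)")


def parse_variant(variant: str, sequence: str) -> dict:
    """Parse a variant string and check it against the UniProtKB sequence."""
    match = VARIANT_PATTERN.fullmatch(variant.strip().upper())
    if match is None:
        raise ValueError(f"malformed variant: {variant!r}")
    ref, pos_text, alt = match.groups()
    position = int(pos_text)

    # Positions are 1-based, Python slices are 0-based.
    start = position - 1
    if start < 0 or sequence[start:start + len(ref)] != ref:
        raise ValueError(f"reference {ref!r} does not match the sequence at position {position}")

    if len(ref) == 1 and len(alt) == 1:
        kind = "substitution"
    elif len(alt) > len(ref) and alt.startswith(ref):
        kind = "insertion"  # e.g. F543FA: A is inserted after residue 543
    elif len(ref) > len(alt) and ref.startswith(alt):
        kind = "deletion"   # assumption: mirror image of the insertion notation
    else:
        kind = "delins"     # any other multi-residue replacement
    return {"ref": ref, "position": position, "alt": alt, "type": kind}
```

For example, `parse_variant("K2D", "MKTAYIAKQR")` returns `{'ref': 'K', 'position': 2, 'alt': 'D', 'type': 'substitution'}`, while a reference residue that does not match the sequence raises a `ValueError`.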

Data Import

Using the UniProtKB AC as a reference, the pipeline fetches the corresponding structures. If the UniProtKB AC is represented in the WHO mutation catalog, the protein structure is taken from a database of custom, manually modelled structures. These models ensure that antibiotic drugs are present in the structures wherever plausible and that the likely oligomeric state of the protein is properly represented. An index of the structure database can be found here.

For every other target, TBvar3D uses the SWISS-MODEL Repository (Bienert et al.) as a source of experimental structures and homology models. The Repository provides up-to-date homology models for every protein in the Mycobacterium tuberculosis proteome. TBvar3D additionally retrieves AlphaFold2 models (Jumper et al.), which were calculated for the complete Mycobacterium tuberculosis proteome and are stored in the AlphaFold Protein Structure Database (Varadi et al.).
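The source selection described above amounts to a simple branch on the UniProtKB AC. The sketch below only builds the locations it would query; the `custom-db:` identifier and the function name are hypothetical, and the two URLs are assumed public endpoints of the SWISS-MODEL Repository and the AlphaFold Protein Structure Database that should be checked against their current documentation.

```python
# Assumed public endpoints; verify against the SWISS-MODEL and AlphaFold DB documentation.
SMR_URL = "https://swissmodel.expasy.org/repository/uniprot/{ac}.json"
AFDB_URL = "https://alphafold.ebi.ac.uk/api/prediction/{ac}"


def structure_sources(uniprot_ac: str, who_catalog_acs: set[str]) -> list[str]:
    """Return the places structures for a UniProtKB AC would be fetched from."""
    ac = uniprot_ac.strip().upper()
    if ac in {entry.upper() for entry in who_catalog_acs}:
        # Hypothetical identifier for the curated, manually modelled structure database.
        return [f"custom-db:{ac}"]
    # Every other target: SWISS-MODEL Repository plus the AlphaFold DB model.
    return [SMR_URL.format(ac=ac), AFDB_URL.format(ac=ac)]
```

Keeping the catalog check first mirrors the priority in the text: curated models with drugs and oligomeric states take precedence over the generic repositories.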

Data Annotations

After the collection and mapping of variants to their corresponding structures, various annotations are calculated.

Sequence Annotations

Shannon Entropy
Shannon entropy (Shannon) is used as a measure of evolutionary conservation. A multiple sequence alignment is generated by a single-iteration JackHMMER search (Johnson et al.) against UniRef90 (Suzek et al.) with the input UniProtKB sequence as the query. The resulting entropy values are scaled to [0, 1], with low values indicating evolutionarily conserved residues.
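To make the entropy score concrete, here is a minimal per-column computation on an already built alignment. It assumes that gaps and non-standard symbols are ignored and that the scaling to [0, 1] divides by log2(20), the entropy of 20 equally frequent residues; TBvar3D's exact gap handling and normalization are not specified here, and the function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column, scaled to [0, 1]."""
    # Assumption: gaps and non-standard symbols are ignored.
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return math.nan  # the column contains no residues at all
    n = len(residues)
    # H = sum(p * log2(1/p)) over the residue types observed in the column.
    entropy = sum((c / n) * math.log2(n / c) for c in Counter(residues).values())
    # Assumption: scale by the maximum entropy of 20 equally frequent residues.
    return entropy / math.log2(len(AMINO_ACIDS))


def alignment_entropies(alignment: list[str]) -> list[float]:
    """Per-column scaled entropies of an MSA given as equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must contain rows of one common length")
    return [column_entropy("".join(column)) for column in zip(*alignment)]
```

For instance, `alignment_entropies(["MKV", "MRV", "M-V"])` gives 0.0 for the fully conserved first and last columns and about 0.23 for the middle column, where K and R occur equally often.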
ConSurf
In contrast to Shannon entropy, ConSurf (Ashkenazy et al.) explicitly considers the evolutionary relationships between the homologues found. Its estimates of evolutionary rates, i.e. conservation, can therefore be expected to be more accurate and complement the simpler information-theoretic entropy analysis. Conservation is expressed as an integer grade in [1, 9], with 9 indicating high evolutionary conservation. TBvar3D uses the ConSurf-DB pipeline (Ben Chorin et al.), which the authors kindly provided for local execution.
UniProtKB Annotations
Protein site annotations from the UniProtKB. The following annotations are displayed in TBvar3D:
- Active site
- Binding site
- Disulfide bond
- DNA binding
- Intramembrane
- Modified residue
- Site
- Transmembrane
- Zinc finger
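The filtering step can be sketched on a UniProtKB entry in the JSON format returned by the UniProt REST API (for example `https://rest.uniprot.org/uniprotkb/{AC}.json`), where each feature has a `type`, a `location` with `start` and `end` values and an optional `description`. The helper name and return format are hypothetical. Disulfide bonds are matched only at their two cysteines, because their start and end positions denote the bonded residues rather than a range.

```python
# Feature types listed above, as they appear in the "type" field of UniProtKB JSON entries.
DISPLAYED_TYPES = {
    "Active site", "Binding site", "Disulfide bond", "DNA binding",
    "Intramembrane", "Modified residue", "Site", "Transmembrane", "Zinc finger",
}


def site_annotations(entry: dict, position: int) -> list[tuple[str, int, int, str]]:
    """Return the displayed features of a UniProtKB JSON entry covering a 1-based position."""
    hits = []
    for feature in entry.get("features", []):
        if feature.get("type") not in DISPLAYED_TYPES:
            continue
        start = feature["location"]["start"].get("value")
        end = feature["location"]["end"].get("value")
        if start is None or end is None:
            continue  # unknown boundaries cannot be mapped to a residue
        if feature["type"] == "Disulfide bond":
            covered = position in (start, end)  # start/end are the two bonded cysteines
        else:
            covered = start <= position <= end
        if covered:
            hits.append((feature["type"], start, end, feature.get("description", "")))
    return hits
```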

sequence annotation page

InterPro Annotations
Functional and protein domain annotations from InterPro (Blum et al.).

Structure Annotations

Accessibility
Per-residue solvent accessibilities are calculated following Lee & Richards. TBvar3D uses the implementation in OpenStructure (Biasini et al.). The accessibility of each residue is divided by the theoretical maximum accessibility of that residue type and expressed as a percentage, giving an expected range of [0, 100].
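The scaling step can be written out as follows. The absolute per-residue accessibility would come from the Lee & Richards calculation in OpenStructure; the maximum values used here are the theoretical maxima reported by Tien et al. (2013) and serve only as an assumed reference table, since the values used by OpenStructure and TBvar3D may differ. The function name is illustrative.

```python
# Theoretical maximum accessible surface areas in square Angstrom (Tien et al., 2013).
# Assumption: OpenStructure / TBvar3D may use a different reference table.
MAX_ASA = {
    "ALA": 129.0, "ARG": 274.0, "ASN": 195.0, "ASP": 193.0, "CYS": 167.0,
    "GLN": 225.0, "GLU": 223.0, "GLY": 104.0, "HIS": 224.0, "ILE": 197.0,
    "LEU": 201.0, "LYS": 236.0, "MET": 224.0, "PHE": 240.0, "PRO": 159.0,
    "SER": 155.0, "THR": 172.0, "TRP": 285.0, "TYR": 263.0, "VAL": 174.0,
}


def relative_accessibility(residue_name: str, absolute_asa: float) -> float:
    """Scale an absolute per-residue accessibility to a percentage of its maximum."""
    try:
        max_asa = MAX_ASA[residue_name.upper()]
    except KeyError:
        raise ValueError(f"no reference maximum for residue {residue_name!r}") from None
    return 100.0 * absolute_asa / max_asa
```

With this table, a glycine exposing 52 square Angstrom of surface gets a relative accessibility of 50.0.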

Transmembrane prediction
Residues predicted to be located in a cell membrane. An implicit solvation model implemented in OpenStructure (mol.alg.FindMembrane) estimates the optimal membrane position for each structure and identifies transmembrane structures based on energetic and geometric criteria. The original algorithm and the energy function used are described in Lomize et al.
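Once a membrane position is known, deciding which residues lie in the membrane is a geometric test. The sketch below assumes the membrane is represented as a planar slab given by a center point, a normal vector and a thickness (all hypothetical inputs, not the result object returned by `mol.alg.FindMembrane`), and classifies residues by the distance of their CA atom from the mid-plane. It does not reproduce the energy-based search itself.

```python
import math

Vec3 = tuple[float, float, float]


def residues_in_membrane(ca_coords: dict[int, Vec3], center: Vec3,
                         normal: Vec3, thickness: float) -> list[int]:
    """Return residue numbers whose CA atom lies inside a planar membrane slab."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("membrane normal must be non-zero")
    unit = tuple(c / length for c in normal)
    half = thickness / 2.0
    inside = []
    for resnum, coord in sorted(ca_coords.items()):
        # Signed distance of the CA atom from the membrane mid-plane.
        depth = sum((x - c) * n for x, c, n in zip(coord, center, unit))
        if abs(depth) <= half:
            inside.append(resnum)
    return inside
```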

Variant Annotations

PROVEAN
The PROVEAN score (Choi et al.) is a mutation impact score based on an alignment of the input protein sequence against the non-redundant protein sequence database from August 2011. PROVEAN computes a delta alignment score, which measures how the mutation changes the similarity of the input sequence to homologous, functional proteins. If the introduced mutation reduces the similarity between the input sequence and many functional homologous protein sequences, the mutation is assumed to be damaging. The PROVEAN score can take any real value; the authors of the original study consider a score below -2.282 to be damaging.
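As described by Choi et al., each supporting homologue contributes a delta score, the change in its alignment score to the query when the variant is introduced; the homologues are clustered, the delta scores are averaged within each cluster, and the cluster averages are averaged again to give the PROVEAN score. The sketch below covers only this averaging and the threshold quoted above. It assumes the per-homologue delta scores are already available, and the function names are hypothetical.

```python
from statistics import fmean

# Damaging threshold quoted in the text above.
DAMAGING_THRESHOLD = -2.282


def provean_score(delta_scores_by_cluster: list[list[float]]) -> float:
    """Average the delta scores within each cluster, then across clusters."""
    cluster_means = [fmean(deltas) for deltas in delta_scores_by_cluster if deltas]
    if not cluster_means:
        raise ValueError("at least one non-empty cluster of delta scores is required")
    return fmean(cluster_means)


def is_damaging(score: float, threshold: float = DAMAGING_THRESHOLD) -> bool:
    """Scores below the threshold are considered damaging."""
    return score < threshold
```

Averaging per cluster first keeps a large family of near-identical homologues from dominating the final score.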
Chemical Distances
Chemical distances describe the change in chemical properties caused by a single amino acid substitution. We report four properties extracted from AAindex (Kawashima et al.):
- Hydrophobicity: Hydrophobic parameter pi (Fauchere-Pliska, 1983) (AAindex ID FAUJ830101)
- Weight: Molecular weight (Fasman, 1976) (AAindex ID FASG760101)
- Isoelectric Point: Isoelectric point (Zimmerman et al., 1968) (AAindex ID ZIMJ680104)
- Size: STERIMOL length of the side chain (Fauchere et al., 1988) (AAindex ID FAUJ880104)
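The four values above are taken from AAindex1 records, in which the 20 index values follow an `I` line in two rows of ten (A, R, N, D, C, Q, E, G, H, I, then L, K, M, F, P, S, T, W, Y, V). A minimal sketch for reading one record and computing the property change of a substitution follows. It assumes the distance is reported as the signed difference (alternative minus reference), which may not match TBvar3D's exact definition, and the function names are illustrative.

```python
from typing import Optional

# Residue order of the two value rows in an AAindex1 "I" section.
AA_ORDER = "ARNDCQEGHI" + "LKMFPSTWYV"


def parse_aaindex1_record(record: str) -> dict[str, Optional[float]]:
    """Extract the 20 per-residue values from one AAindex1 record."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("I "):
            tokens = " ".join(lines[i + 1:i + 3]).split()
            break
    else:
        raise ValueError("record has no 'I' (index values) section")
    if len(tokens) != 20:
        raise ValueError(f"expected 20 values, found {len(tokens)}")
    # "NA" marks a missing value in AAindex.
    return {aa: (None if tok == "NA" else float(tok)) for aa, tok in zip(AA_ORDER, tokens)}


def chemical_distance(ref: str, alt: str, index: dict[str, Optional[float]]) -> float:
    """Signed property change of a substitution (assumed: alt minus ref)."""
    ref_value, alt_value = index[ref], index[alt]
    if ref_value is None or alt_value is None:
        raise ValueError(f"index value missing for {ref} or {alt}")
    return alt_value - ref_value
```

Applied to the record FASG760101, for example, `chemical_distance("K", "D", index)` would give the change in molecular weight for the substitution K123D.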

Output

Variant Overview

Variants mapped to the current protein are displayed here. Var3D distinguishes the following categories:

User Variants: User submitted variants for the current protein.
Resistance Variants: Variants annotated as resistant by the WHO mutation catalog. These variants are thought to have an impact on drug resistance.
Neutral Variants: Variants annotated as neutral by the WHO mutation catalog. These variants are thought to NOT have an impact on drug resistance.
Uncertain Variants: Variants annotated as uncertain by the WHO mutation catalog. The role of these variants has not yet been determined; more data are needed.

Holding CTRL while scrolling up and down zooms in and out of the sequence view. Every ellipsoid in the Variant Overview corresponds to a variant. Clicking on it shows the corresponding Variant Annotations and zooms in on the corresponding spot in the Structure View. Clicking on the name of a group shows all variants of that group on the structure. While holding CTRL, one can select a region of the sequence.

Sequence Annotations

All sequence-related annotations are shown here, including:

- UniProtKB Annotations
- InterPro Annotations
- Shannon Entropy
- ConSurf

Structure Switch

The bars in this region represent all structures available for the current protein. Each bar indicates which part of the sequence is covered by a structure. Hovering over a bar shows more information on the origin of the structure, and clicking on it loads that structure into the Structure View.

Variant Annotations

All annotations which are specific for a variant are displayed here. This includes chemical distances and the PROVEAN score.

Structure View

The Structure View allows the user to explore the relationship between variants and structures. An important but easily missed feature is the cogwheel button in the upper left corner, which allows the user to color the current structure according to different features.

Drug View

For variants that are part of the WHO mutation catalog, a drug window appears at the end of the feature display. It contains the WHO assessment of the variant as well as the mechanism of action and a description of the drug associated with the variant. If the outline of a drug box is solid, a structure containing that drug exists, and clicking on the box switches to that structure. This opens a special Drug View, which only shows the environment around the currently selected drug. The size of the displayed environment can be adjusted with the slider in the Structure View.

References

UniProtKB
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021.
Nucleic Acids Res. 49(D1):D480-D489. (2021) 33237286 10.1093/nar/gkaa1100
SWISS-MODEL Repository
Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, Schwede T
The SWISS-MODEL Repository - new features and functionality.
Nucleic Acids Res. 45(D1):D313-D319. (2017) 27899672 10.1093/nar/gkw1132
AlphaFold
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D
Highly accurate protein structure prediction with AlphaFold.
Nature. 596(7873):583-589. (2021) 34265844 10.1038/s41586-021-03819-2
AlphaFold Protein Structure Database
Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
Nucleic Acids Res. 50(D1):D439-D444. (2022) 34791371 10.1093/nar/gkab1061
Shannon Entropy
Shannon CE
A mathematical theory of communication.
Bell Syst Tech J. 27(3):379-423. (1948) 10.1002/j.1538-7305.1948.tb01338.x
JackHMMER
Johnson LS, Eddy SR, Portugaly E
Hidden Markov model speed heuristic and iterative HMM search procedure.
BMC Bioinformatics. 11:431. (2010) 20718988 10.1186/1471-2105-11-431
UniRef
Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH; UniProt Consortium
UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.
Bioinformatics. 31(6):926-32. (2014) 25398609 10.1093/bioinformatics/btu739
ConSurf
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N
ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.
Nucleic Acids Res. 44(W1):W344-50. (2016) 27166375 10.1093/nar/gkw408
ConSurf-DB
Ben Chorin A, Masrati G, Kessel A, Narunsky A, Sprinzak J, Lahav S, Ashkenazy H, Ben-Tal N
ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins.
Protein Sci. 29(1):258-267. (2020) 31702846 10.1002/pro.3779
InterPro
Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD
The InterPro protein families and domains database: 20 years on.
Nucleic Acids Res. 49(D1):D344-D354. (2021) 33156333 10.1093/nar/gkaa977
Solvent Accessibility
Lee B, Richards FM
The interpretation of protein structures: estimation of static accessibility.
J Mol Biol. 55(3):379-400. (1971) 5551392 10.1016/0022-2836(71)90324-x
OpenStructure (OST)
Biasini M, Schmidt T, Bienert S, Mariani V, Studer G, Haas J, Johner N, Schenk AD, Philippsen A, Schwede T
OpenStructure: an integrated software framework for computational structural biology.
Acta Crystallogr D Biol Crystallogr. 69(Pt 5):701-709. (2013) 23633579 10.1107/S0907444913007051
Membrane Prediction
Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI
Positioning of proteins in membranes: a computational approach.
Protein Sci. 15(6):1318-1333. (2006) 16731967 10.1110/ps.062126106
PROVEAN
Choi Y, Sims GE, Murphy S, Miller JR, Chan AP
Predicting the functional effect of amino acid substitutions and indels.
PLoS One. 7(10):e46688. (2012) 23056405 10.1371/journal.pone.0046688
AAindex
Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M
AAindex: amino acid index database, progress report 2008.
Nucleic Acids Res. 36(Database issue):D202-5. (2008) 17998252 10.1093/nar/gkm998

TBvar3D M. tuberculosis resistance variants mapped on protein structures
