4 Tertiary Protein Structure and Folds
4.1 Introduction
Chapters 1 and 2 introduced alpha-helices and beta-sheets (Secondary Structure), and some common "motifs" composed of 2 or 3 of these elements (Super-secondary Structure). Tertiary structure describes the folding of the polypeptide chain to assemble the different secondary structure elements in a particular arrangement. As helices and sheets are units of secondary structure, so the domain is the unit of tertiary structure. In multi-domain proteins, tertiary structure includes the arrangement of domains relative to each other as well as that of the chain within each domain.
There is a blurred distinction between "super-secondary structure" and "tertiary structure". The introduction of the term "super-secondary structure" was necessary when it became clear that certain arrangements of two or three secondary structures are present in many different protein structures, even with completely different sequences.
Note that some proteins do not consist of an assembly of these super-secondary motifs. For example, proteins of the globin family consist of eight alpha-helices in contact, but the helices do not pack against other helices which are adjacent in the sequence, with the exception of the final two, which form an antiparallel helix-turn-helix motif.
Although the term "motif" is often used to describe super-secondary structures (e.g. Branden and Tooze, 1991), it may also be used to describe a consensus sequence of amino acids identified in a number of different proteins, rather than a repeated 3D conformation. Such a consensus in primary structure generally implies a similarity in tertiary structure. But bear in mind that there are very many protein sequences of which the 3D structures are not known for certain, so that the term "motif" strictly applies to primary rather than super-secondary or tertiary structure in these cases.
Below is a non-exhaustive list of topologies.
4.2 All-alpha topologies
4.2.1 The lone helix
There are a number of examples of small proteins (or peptides) which consist of little more than a single helix. A striking example is alamethicin, a transmembrane voltage-gated ion channel that acts as a peptide antibiotic.
4.2.2 The helix-turn-helix motif
The simplest packing arrangement of a domain of two helices is for them to lie antiparallel, connected by a short loop. This constitutes the structure of the small (63-residue) RNA-binding protein Rom, which is found in certain plasmids (small circular molecules of double-stranded DNA occurring in bacteria and yeast) and involved in their replication. There is a slight twist in the arrangement as shown.
4.2.3 The four-helix bundle
The four-helix bundle is found in a number of different proteins. In many cases the helices are part of a single polypeptide chain, connected to each other by three loops. However, the Rom molecule is in fact a dimer of two of the two-helix units shown above.
In four-helix-bundle proteins the interfaces between the helices consist mostly of hydrophobic residues while polar side chains on the exposed surfaces interact with the aqueous environment, as indicated below:
The central helices of the photosynthetic reaction centre are in fact arranged similarly to the four-helix bundle.
Other examples exhibit a much more open packing arrangement, as in the steroid-binding proteins uteroglobin and Clara cell 17kDa protein.
4.2.3.1 Myohemerythrin
The four helices may be arranged in a simple up-and-down topology, as indicated. A good example is myohemerythrin.
4.2.3.2 Ferritin
A more complex arrangement is also possible, as in ferritin:
4.2.3.3 Cytokines
A number of cytokines, such as interleukin-2 and human growth hormone, consist of four alpha-helices.
4.2.4 alpha domains which bind DNA
Transcription factors are proteins which bind to control regions of DNA. The RNA-binding two-helix protein Rom has already been mentioned. A three-helix bundle forms the basis of a DNA-binding domain which occurs in a number of proteins, for example the homeodomain proteins. Examine the crystal structure of the engrailed homeodomain bound to DNA.
4.2.5 Globins
The globin fold usually consists of eight alpha-helices. The two helices at the end of the chain are antiparallel, forming a helix-turn-helix motif, but the remainder of the fold does not include any characterised super-secondary structures. These helices pack against each other at larger angles, around 50°, than those found between antiparallel helices (approximately 20°). See section 4.2.6 below on helix-helix packing. Jane Richardson (1981) describes the globin fold as a "Greek key helix bundle", due to the topological similarity with the Greek key arrangement of antiparallel beta-sheets (see section 4.3 on all-beta topologies).
4.2.6 Helix-helix packing
When alpha-helices pack against each other, the side-chains in their interface are buried, so the two interface areas should have complementary surfaces. The surface of an alpha-helix can be thought of as consisting of grooves and ridges, like a screw thread: for instance, the side chains of every 4th residue form a ridge (because there are 3.6 residues per turn). The direction of this ridge is 26° from the direction of the helix axis. Therefore, if two helices pack so that a ridge of this kind on each fits into a groove on the other, the expected angle between the two helix axes is 2 × 26° = 52°. In fact, the distribution of this angle between packed alpha-helices shows a sharp peak at 50°. Besides the type of ridge described, ridges can be formed by other stacking patterns of residues, such as every 3rd residue, or indeed every residue. Which ridges are used for packing depends on the size and conformations of the side chains at these relative positions. The "i+4" ridge is believed to be the most common because residues at every 4th position have side-chains which are more closely aligned than in "i+3" or "i+1" ridges, as indicated below.
Two other types of packing do occur, however: between an "i+4" ridge and an "i+3" ridge (there is an angle of 23° between the 2 helix axes) and between an "i+4" and an "i+1" ridge (the helices are 105° apart).
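The arithmetic behind the ridge description can be checked with a short sketch. It assumes an ideal helix with exactly 3.6 residues per turn, and it takes the 26° ridge inclination from the text rather than deriving it. The function and variable names are illustrative only.

```python
# Minimal sketch: angular positions of side chains on an ideal alpha-helix.
# Assumption: exactly 3.6 residues per turn, i.e. 100 degrees per residue.
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN


def angular_offset(step):
    """Rotation about the helix axis between residues i and i+step,
    wrapped into the range (-180, 180] degrees."""
    raw = (step * DEGREES_PER_RESIDUE) % 360.0
    return raw - 360.0 if raw > 180.0 else raw


# Offsets for the three ridge types discussed in the text.
offsets = {step: angular_offset(step) for step in (1, 3, 4)}

# Inclination of the "i+4" ridge to the helix axis, as quoted in the text.
I4_RIDGE_INCLINATION = 26.0
# Two helices whose "i+4" ridges each sit in a groove of the other.
expected_packing_angle = 2 * I4_RIDGE_INCLINATION

if __name__ == "__main__":
    for step, offset in sorted(offsets.items()):
        print(f"i+{step}: {offset:+.0f} degrees around the axis")
    print(f"expected i+4 / i+4 packing angle: {expected_packing_angle:.0f} degrees")
```

Under this idealisation the "i+4" offset has the smallest magnitude of the three (40° against 60° for "i+3" and 100° for "i+1"), which is consistent with the closer alignment of side chains in "i+4" ridges.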
The "ridges and grooves" model does not describe all the helix-helix packings, as there are examples with unusual inter-axial angles. For instance in the globin fold a pair of helices (B and E) pack such that their ridges cross each other, by means of a notch formed at a pair of glycine residues.
The inter-axial distance between packed helices varies from 6.8 to 12.0 Å, the mean being 9.4 Å; the mean inter-penetration of atoms at the interface is 2.3 Å. Therefore it is mainly side chains which make the contacts between the helices.
4.2.7 Other distinctive all-alpha proteins include:
- Delta-Crystallin
- Annexin V
- Glutathione S-transferase
- Calmodulin- and Parvalbumin-like calcium-binding proteins
4.3 All-beta topologies
Protein folds which consist almost entirely of beta sheets exhibit a completely or mostly antiparallel arrangement. Many of these antiparallel domains consist of two sheets packed against each other, with hydrophobic side chains forming the interface. Since the side chains of a beta-strand point alternately to opposite sides of the sheet, these structures tend to have alternating hydrophobic and polar residues.
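The alternation described above can be illustrated with a rough sketch that scores how strictly a sequence segment alternates between hydrophobic and polar residues. The two-class split of amino acids, the function name and both example sequences are assumptions made for illustration. A high score is only a hint and not evidence that a segment is a beta-strand.

```python
# Assumption: a simple two-class split of the amino acids; other
# hydrophobicity scales would give somewhat different results.
HYDROPHOBIC = set("AVLIMFWC")


def alternation_score(segment):
    """Fraction of adjacent residue pairs in which exactly one of the
    two residues is hydrophobic (1.0 means strict alternation)."""
    if len(segment) < 2:
        return 0.0
    flags = [residue in HYDROPHOBIC for residue in segment.upper()]
    switches = sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    return switches / (len(flags) - 1)


# Hypothetical sequences, invented for illustration only.
strand_like = "VKVTVEVRV"      # hydrophobic at every second position
helix_like = "LLKKLLEELLKK"    # hydrophobic residues in pairs

if __name__ == "__main__":
    print(alternation_score(strand_like))
    print(alternation_score(helix_like))
```

The strictly alternating example scores 1.0, while the paired pattern scores 5/11, since only five of its eleven adjacent pairs switch class.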
4.3.1 Beta sandwiches and beta barrels
In the immunoglobulin fold the strands form two sheets packed against each other, forming a "beta sandwich".
4.3.1.1 Aligned and orthogonal beta sandwiches
In the immunoglobulin and fibronectin type-3 folds, the two sheets are approximately aligned. In fact the mean angle between the 2 sheets is approximately 30° (designated -30° because the uppermost sheet is rotated clockwise with respect to the lower). The two sheets are usually independent in that the linking residues between them are not in beta sheet conformation. The angle between the sheets is determined by their right-handed twist. The observed angle varies between -20° and -50°; this is due to variation in the twist. Also side-chains are not always ideally aligned at the interface.
Orthogonal beta sheet packings consist of beta sheets folded on themselves; the two sheets make an angle of -90°. The strands at one corner or 2 diagonally opposite corners go uninterrupted from one layer to the other. Local coiling at the corner or a beta bulge facilitates the right-angled bend. These bends are right-handed, due to permitted phi and psi angles. The figure below illustrates this model.
The two sheets make contact only along one diagonal. Large side-chains in loops usually fill the spaces between the splayed corners. This fold is seen in the lipocalin family, whose members bind ligands within the sandwich.
4.3.1.2 beta barrels
Some antiparallel beta-sheet domains are better described as beta-barrels rather than beta-sandwiches, for example streptavidin and porin. Note that some structures are intermediate between the extreme barrel and sandwich arrangements.
4.3.2 Up-and-down antiparallel beta sheets
The simplest topology for an antiparallel beta-sheet involves loops connecting adjacent strands.
4.3.2.1 The Greek Key Topology
The Greek Key topology is named after a pattern that was common on Greek pottery. It is formed by three consecutive antiparallel beta-strands connected by hairpins followed by a longer connection to the fourth strand, which lies adjacent to the first.
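The difference between an up-and-down sheet and a Greek key can be made concrete with a small sketch. A sheet is represented by the order in which its strands lie side by side, with strands numbered by their position in the sequence. This representation and the function names are assumptions for illustration, and the input is assumed to contain each of the strands 1 to n exactly once.

```python
def classify_connections(spatial_order):
    """Label each connection between sequence-consecutive strands.

    spatial_order lists strand numbers (1 = first in the sequence) in the
    order in which they lie side by side in the sheet. A connection is
    labelled "hairpin" when the two strands are neighbours in the sheet
    and "long" otherwise.
    """
    position = {strand: index for index, strand in enumerate(spatial_order)}
    labels = {}
    for strand in range(1, len(spatial_order)):
        gap = abs(position[strand] - position[strand + 1])
        labels[(strand, strand + 1)] = "hairpin" if gap == 1 else "long"
    return labels


def are_neighbours(spatial_order, a, b):
    """True if strands a and b lie next to each other in the sheet."""
    return abs(spatial_order.index(a) - spatial_order.index(b)) == 1


up_and_down = [1, 2, 3, 4]  # every connection is a hairpin
greek_key = [4, 1, 2, 3]    # strand 4 returns to lie next to strand 1

if __name__ == "__main__":
    print(classify_connections(up_and_down))
    print(classify_connections(greek_key))
    print(are_neighbours(greek_key, 4, 1))
```

For the Greek key order, the connections 1-2 and 2-3 are hairpins, the connection 3-4 is long, and strand 4 ends up adjacent to strand 1, matching the description above.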
Folds including the Greek key topology have been found to have 5-13 strands. An example is plastocyanin, in which strands 3 to 6 form a Greek key. Note that plastocyanin has a mixed sheet: there are two parallel pairs of strands (strands 1 and 3, and strands 2 and 8).
Gamma-crystallin has two domains, each of which is an eight-stranded beta-barrel-type structure composed of two Greek keys. In fact, the structure is more accurately described as consisting of two beta-sheets, one consisting of strands 2, 1, 4, 7 / 11, 14, 13, 16 (blue, closer to the viewer in the image) and the other of strands 6, 5, 8, 3 / 10, 9, 12, 15 (yellow), as indicated in the diagram. Sequence similarity has been found between the two Greek key motifs within each domain, and also between the two domains themselves. The latter similarity is higher than the former; this implies that the structure evolved from a single Greek key fold by means of a gene duplication to produce a domain of two Greek keys, followed by a second duplication resulting in two similar domains. This is supported by the fact that in some crystallins each Greek key motif is coded by a different exon, with introns between them.
4.3.2.2 The Jellyroll Topology
Richardson (1981) describes the jellyroll fold as being formed by the addition of an extra "swirl" to a Greek key:
4.3.3 Beta-propellers
A beta-propeller typically consists of four to eight beta-sheets, each typically with four strands. The beta-sheets are arranged in a circle, like the blades of a propeller.
4.3.4 Beta-trefoils
This fold has an approximately 3-fold axis of symmetry.
4.3.5 Beta-Helix
This is a very unusual fold. The beta-strands wind around the structure, describing a helical topology.
4.4 Alpha/beta topologies
The most regular and common domain structures consist of repeating beta-alpha-beta supersecondary units, such that the outer layer of the structure is composed of alpha helices packing against a central core of parallel beta-sheets. These folds are called alpha/beta, or wound alpha beta.
Many enzymes, including all those involved in glycolysis, are alpha/beta structures. Most alpha/beta proteins are cytosolic.
The beta-alpha-beta unit is almost always right-handed. In alpha/beta structures this arrangement is repeated, giving a beta-alpha-beta-alpha... sequence. The beta strands are parallel and hydrogen bonded to each other, while the alpha helices are all parallel to each other, and are antiparallel to the strands. Thus the helices form a layer packing against the sheet.
The beta-alpha-beta-alpha-beta subunit, often present in nucleotide-binding proteins, is named the Rossmann fold, after Michael Rossmann (Rao and Rossmann, 1973).
Richardson (1981) names the alpha/beta structures "parallel alpha/beta domains", to denote the fact that each of the two secondary structures forms a parallel arrangement. Note that there is no obvious reason why one would not expect to find "parallel all alpha" (alpha-alpha-alpha subunit) folds, or "parallel all beta" (beta-beta-beta) folds in equally large numbers, but these do not occur. However, the marked tendency for helices to pack aligned with sheets has been explained by the "complementary twist" model (Chothia et al., 1977). The right-handed twist of beta-sheets and the right-handed twist of the row of every 4th residue of the helices (the "i+4" ridges; see section 4.2.6 on helix-helix packing) mean that the two have complementary surfaces when aligned. This model is supported by the observation that approximately 90% of the helix residues which interface with a sheet are indeed a multiple of 4 residues apart. Helices packing side by side on a sheet would be rotated with respect to each other, due to the sheet twist; the observed interhelical angle is in agreement with this model in 80% of cases. In the other cases the helices are splayed from the sheet, with only one end in contact.
4.4.1 alpha/beta horseshoe
The structure of the remarkable placental ribonuclease inhibitor takes the concept of the repeating alpha/beta unit to extremes. It is a cytosolic protein that binds extremely strongly to any ribonuclease that may leak into the cytosol. The structure shown here is a dimer of the human ribonuclease inhibitor in complex with ribonuclease (the very top and bottom chains).
One would expect this fold to form a complete barrel, but this is not the case. The strands are only very slightly slanted, being nearly parallel to the central "axis".
4.4.2 TIM barrels (alpha/beta barrels)
TIM barrels are named after triosephosphate isomerase, in which the fold was first observed. The TIM barrel is a ubiquitous fold comprising eight alpha helices and eight beta strands that alternate along the amino acid chain, that is, a sequence of eight beta-alpha motifs. The beta strands form a parallel beta-barrel, while the alpha helices lie outside the barrel. The first and last strands form hydrogen bonds with each other and close the barrel.
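As a rough illustration of this description, the sketch below checks whether a chain's secondary structure elements form eight consecutive beta-alpha units, and lists the neighbours of each strand in a closed barrel. It assumes one letter per element ('E' for a strand and 'H' for a helix, borrowing the DSSP letters) and that strands consecutive in sequence are neighbours in the barrel. Real TIM barrel proteins often carry additional elements, so an exact string match like this is deliberately simplistic.

```python
def is_beta_alpha_repeat(elements, repeats=8):
    """True if `elements` is exactly `repeats` consecutive beta-alpha units.

    elements: one letter per secondary structure element in chain order,
    'E' for a beta strand and 'H' for an alpha helix.
    """
    return elements == "EH" * repeats


def barrel_neighbours(strand, n_strands=8):
    """Strands hydrogen bonded to `strand` in a closed barrel (1-based).

    Assumption: strands that are consecutive in the sequence are
    neighbours, and the last strand pairs with the first.
    """
    previous = (strand - 2) % n_strands + 1
    following = strand % n_strands + 1
    return previous, following


if __name__ == "__main__":
    print(is_beta_alpha_repeat("EHEHEHEHEHEHEHEH"))
    print(barrel_neighbours(1))
    print(barrel_neighbours(8))
```

With eight strands, strand 1 has strands 8 and 2 as neighbours, which is the closure between the first and last strands described above.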
In a structure which is open rather than closed like the barrel, helices would be situated on only one side of the beta sheet if the sheet direction did not reverse. Therefore open alpha/beta structures must be doubly wound to cover both sides of the sheet.
4.4.3 Alpha+Beta Topologies
These are folds which include significant alpha and beta secondary structural elements, but in which those elements are "mixed", in the sense that they do NOT exhibit the wound alpha-beta topology. This class of folds is therefore referred to as alpha+beta. Some better-known examples are:
- Bacterial and mammalian pancreatic ribonucleases
- Lysozyme
- Ubiquitin
- Histidine-Carrier protein
- Cysteine proteases such as papain and actinidin
- Zinc Metallo-proteases
- Sh1 domains
- Protein G (prokaryotic Ig-binding)
- Carbonic anhydrases
- Thymidylate synthase
4.5 Small disulphide-rich folds
The following are a few examples of the main families of small disulphide-rich domains of known structure. The members of these families contain a large number of disulphide bonds which stabilise the fold.
- Serine proteinase inhibitor
- Sea anemone toxin (NMR structure)
- EGF-like domain
- Complement C-module domain
- Wheat Plant Toxin; Naja (Cobra) neurotoxin; green Mamba anticholinesterase
- Kringle domain
4.6 Structure Classification Schemes
The previous chapters gave a broad overview of protein structures. There are two notable endeavors to classify all protein structures: SCOP and CATH. One might intuitively ask whether only a limited number of principal folds exists. Interestingly, no new folds have been identified since 2008 or 2012, depending on the algorithm used.
4.6.1 SCOP: Structural Classification of Proteins
Introduction:
Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. A knowledge of these relationships is crucial to our understanding of the evolution of proteins and of development.
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems. The data can be directly accessed on the SCOP webpage.
Structural annotation in SCOP is done both manually and automatically.
Classification:
Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold, described below. The exact position of the boundaries between these levels is to some degree subjective. The evolutionary classification is generally conservative: where any doubt about relatedness exists, new divisions at the family and superfamily levels are made. Thus, some researchers may prefer to focus on the higher levels of the classification tree, where proteins with structural similarity are clustered.
The different major levels in the hierarchy are (from top to bottom):
- Classes: Folds with similar structure
- Folds: Similar structural elements
- Superfamilies: Groups of protein families with common functional and structural features, implying a probable common ancestor
- Families: Proteins with related sequences but typically with distinct functions
- Proteins: Sequences with essentially the same function (different species, different isoforms)
Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJP, Chothia C, Murzin AG (2007) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research, 2008, Vol. 36
4.6.2 CATH: Classification of protein structures
Introduction:
CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: class (C), architecture (A), topology (T) and homologous superfamily (H). Annotation of domains is both manual and automatic.
- Class (similar to class in SCOP): Defined by the secondary structure content (all alpha, all beta, alpha/beta etc.).
- Architecture: Clustering of structurally similar arrangements of secondary structure elements, independent of their connectivity.
- Topology or fold family: Structural grouping depending on both overall 3D shape and connectivity.
- Homologous superfamilies: Grouping of protein domains that have, or are predicted to have, a common ancestor.
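The four levels map naturally onto a four-part identifier. The sketch below assumes the dotted four-number notation in which CATH superfamilies are commonly written (for example 3.40.50.720) and shows how two such identifiers can be compared level by level. The type and function names are hypothetical, and the example codes are used only to illustrate the format.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One number per level: class, architecture, topology, superfamily."""
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code):
    """Split a dotted identifier such as '3.40.50.720' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(part) for part in parts))


def shared_levels(a, b):
    """Number of leading levels (0 to 4) at which two codes agree."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


if __name__ == "__main__":
    first = parse_cath_code("3.40.50.720")
    second = parse_cath_code("3.40.50.300")
    print(first)
    print(shared_levels(first, second))
```

Two domains whose codes agree at three levels share class, architecture and topology but are placed in different homologous superfamilies.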
Sillitoe I, Lewis TE, Cuff AL, Das S, Ashford P, Dawson NL, Furnham N, Laskowski RA, Lee D, Lees J, Lehtinen S, Studer R, Thornton JM, Orengo CA. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 2015 Jan. doi: 10.1093/nar/gku947