4 Tertiary Protein Structure and Folds

4.1 Introduction

Chapters 1 and 2 introduced alpha-helices and beta-sheets (Secondary Structure), and some common "motifs" composed of 2 or 3 of these elements (Super-secondary Structure). Tertiary structure describes the folding of the polypeptide chain to assemble the different secondary structure elements in a particular arrangement. As helices and sheets are units of secondary structure, so the domain is the unit of tertiary structure. In multi-domain proteins, tertiary structure includes the arrangement of domains relative to each other as well as that of the chain within each domain.

There is a blurred distinction between "super-secondary structure" and "tertiary structure". The introduction of the term "super-secondary structure" was necessary when it became clear that certain arrangements of two or three secondary structures are present in many different protein structures, even with completely different sequences.

Note that some proteins do not consist of an assembly of these super-secondary motifs. For example, proteins of the globin family consist of eight alpha-helices in contact, but the helices do not pack against other helices which are adjacent in the sequence, with the exception of the final two, which form an antiparallel helix-turn-helix motif.

Although the term "motif" is often used to describe super-secondary structures (e.g. Branden and Tooze, 1991), it may also be used to describe a consensus sequence of amino acids identified in a number of different proteins, rather than a repeated 3D conformation. Such a consensus in primary structure generally implies a similarity in tertiary structure. But bear in mind that there are very many protein sequences of which the 3D structures are not known for certain, so that the term "motif" strictly applies to primary rather than super-secondary or tertiary structure in these cases.

Below is a non-exhaustive list of topologies.

4.2 All-alpha topologies

4.2.1 The lone helix

There are a number of examples of small proteins (or peptides) which consist of little more than a single helix. A striking example is alamethicin (view on PDB), a transmembrane voltage-gated ion channel which acts as a peptide antibiotic.

4.2.2 The helix-turn-helix motif

The simplest packing arrangement of a domain of two helices is for them to lie antiparallel, connected by a short loop. This constitutes the structure of the small (63 residue) RNA-binding protein Rom, which is found in certain plasmids (small circular molecules of double-stranded DNA occurring in bacteria and yeast) and is involved in their replication. There is a slight twist in the arrangement as shown.

**Cartoon representation of the biological assembly of the ColE1 Rom protein**. The quaternary structure (see Chapter 5) is homo-dimeric. Each chain (green and yellow) consists of a single helix-turn-helix motif. (View on PDB)

4.2.3 The four-helix bundle

The four-helix bundle is found in a number of different proteins. In many cases the helices are part of a single polypeptide chain, connected to each other by three loops. However, the Rom molecule is in fact a dimer of two of the two-helix units shown above.

In four-helix-bundle proteins the interfaces between the helices consist mostly of hydrophobic residues while polar side chains on the exposed surfaces interact with the aqueous environment, as indicated below:

**Four-helix bundle of the ColE1 Rom protein coloured by hydrophobicity** (hydrophobic red to hydrophilic blue). From left to right: side view in cartoon style, view from the top in cartoon style, view from the top showing all atoms, and an abstract view into the helices.

The central helices of the photosynthetic reaction centre are in fact arranged similarly to a four-helix bundle.

Other examples exhibit a much more open packing arrangement, as in the steroid-binding proteins uteroglobin and Clara cell 17 kDa protein.

4.2.3.1 Myohemerythrin

The four helices may be arranged in a simple up-and-down topology, as indicated. A good example is myohemerythrin.

Schematic and structural view of myohemerythrin — Left: Schematic view of the secondary structure arrangement. Right: Cartoon view of myohemerythrin

4.2.3.2 Ferritin

A more complex arrangement is also possible, as in ferritin:

Schematic and structural view of ferritin — Left: Schematic view of the secondary structure arrangement. Note the rearrangement of the helices. Right: Cartoon view of ferritin.

4.2.3.3 Cytokines

A number of cytokines, such as interleukin-2 and human growth hormone, consist of four alpha-helices.

Cartoon view of interleukin-2 (left, PDB id 4ZF7) and human growth hormone (right, PDB id 1HUW)

4.2.4 Alpha domains which bind DNA

Transcription factors are proteins which bind to control regions of DNA. The RNA-binding two-helix protein Rom has already been mentioned. A three-helix bundle forms the basis of a DNA-binding domain which occurs in a number of proteins, for example homeodomain proteins. Examine the crystal structure of the engrailed homeodomain binding to DNA.

**Two engrailed homeodomains from D. melanogaster bound to DNA**.

4.2.5 Globins

The globin fold usually consists of eight alpha-helices. The two helices at the end of the chain are antiparallel, forming a helix-turn-helix motif, but the remainder of the fold does not include any characterised super-secondary structures. These helices pack against each other with larger angles between them (around 50°) than occur between antiparallel helices (approximately 20°). See the section below on helix-helix packing. Jane Richardson (1981) describes the globin fold as a "Greek key helix bundle", due to the topological similarity with the Greek key arrangement of antiparallel beta-sheets (see section 4.3 on all-beta topologies).

4.2.6 Helix-helix packing

When alpha-helices pack against each other, the side-chains in their interface are buried. The two interface areas should have complementary surfaces. The surface of an alpha-helix can be thought of as consisting of grooves and ridges, like a screw thread: for instance, the side chains of every 4th residue form a ridge (because there are 3.6 residues per turn). The direction of this ridge is 26° from the direction of the helix axis. Therefore, if two helices pack so that such a ridge from each fits into the other's groove, the expected angle between them is 2 × 26° = 52°. In fact, in the distribution of this angle between packed alpha-helices, there is a sharp peak at 50°. Besides the type of ridge described, ridges can be formed by other stacking patterns of residues, such as every 3rd residue, or indeed every residue. Which ridges are used for packing depends on the size and conformations of the side chains at these relative positions. The "i+4" ridge is believed to be the most common because residues at every 4th position have side-chains which are more closely aligned than in "i+3" or "i+1" ridges, as indicated below.

i+4 and i+3 stacking example — **The two different stacking modes i+4 and i+3**. The different colours show which amino acids would interact with the other helix.

Two other types of packing do occur, however: between an "i+4" ridge and an "i+3" ridge (there is an angle of 23° between the 2 helix axes) and between an "i+4" and an "i+1" ridge (the helices are 105° apart).
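The geometry behind these ridges can be checked with a little arithmetic. The sketch below assumes an ideal alpha-helix with exactly 3.6 residues per turn (100° of rotation per residue); the function names are hypothetical. It computes how far residue i+n is rotated around the helix axis relative to residue i, and the inter-axial angle expected when two "i+4" ridges (each 26° from its helix axis, as quoted above) interlock.

```python
RESIDUES_PER_TURN = 3.6  # ideal alpha-helix (assumption)
ROTATION_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # 100 degrees per residue


def azimuthal_offset(n: int) -> float:
    """Rotation about the helix axis between residue i and residue i+n,
    wrapped into the range [-180, 180) degrees."""
    angle = n * ROTATION_PER_RESIDUE
    return (angle + 180.0) % 360.0 - 180.0


def packing_angle_same_ridges(ridge_angle: float) -> float:
    """Inter-axial angle when identical ridges of two helices interlock:
    each helix axis is tilted by ridge_angle from the shared ridge direction."""
    return 2.0 * ridge_angle


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(f"i+{n}: {azimuthal_offset(n):+.0f} degrees around the axis")
    print(f"i+4 / i+4 packing: {packing_angle_same_ridges(26.0):.0f} degrees")
```

Under these assumptions the i+4 residues lie only 40° apart around the axis, compared with 60° for i+3 and 100° for i+1, which is consistent with the "i+4" ridge being the most closely aligned of the three.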

The "ridges and grooves" model does not describe all the helix-helix packings, as there are examples with unusual inter-axial angles. For instance in the globin fold a pair of helices (B and E) pack such that their ridges cross each other, by means of a notch formed at a pair of glycine residues.

Notch helix-helix packing in globin fold. Protein view and detailed view of the Notch — Left: **Cartoon view of myoglobin**. Helix B and E are highlighted. Middle: View into the Notch (residues 20 to 30 and 60 to 70) showing only the backbone. Glycine 25 and 65 are highlighted. Right: The same as middle, but showing all heavy atoms.

The inter-axial distance between packed helices varies from 6.8 to 12.0 Å, the mean being 9.4 Å; the mean inter-penetration of atoms at the interface is 2.3 Å. Therefore it is mainly side chains which make the contacts between the helices.

4.2.7 Other distinctive all-alpha proteins

Delta-Crystallin
Annexin V
Glutathione S-transferase
Calmodulin- and Parvalbumin-like calcium-binding proteins

4.3 All-beta topologies

Protein folds which consist almost entirely of beta sheets exhibit a completely or mostly antiparallel arrangement. Many of these antiparallel domains consist of two sheets packed against each other, with hydrophobic side chains forming the interface. Since the side chains of beta-strands point alternately to opposite sides of a sheet, these structures tend to have alternating hydrophobic and polar residues.

Antibody light chain coloured according to its hydrophobicity — **Cartoon view of an antibody light chain**. Red indicates hydrophobic residues, blue hydrophilic ones. Note the alternating pattern in the beta-sheets
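This alternation can be looked for directly in a sequence. The following is a minimal sketch rather than a validated predictor: the set of residues treated as hydrophobic and the function name are assumptions made for illustration, and both example peptides are invented.

```python
# One-letter codes treated as hydrophobic here. This is an illustrative
# assumption; published hydrophobicity scales differ in detail.
HYDROPHOBIC = set("AVLIMFWC")


def alternation_fraction(sequence: str) -> float:
    """Fraction of neighbouring residue pairs that switch between
    hydrophobic and polar. 1.0 means perfect alternation."""
    flags = [residue in HYDROPHOBIC for residue in sequence.upper()]
    if len(flags) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(flags, flags[1:]))
    return switches / (len(flags) - 1)


if __name__ == "__main__":
    # Invented strand-like peptide: hydrophobic and polar residues alternate.
    print(alternation_fraction("VKVTVEVSV"))
    # Invented peptide with a hydrophobic block followed by a polar block.
    print(alternation_fraction("LLAVIEEKKS"))
```

A high fraction over a short window is only a hint that a segment could form a strand with one buried and one exposed face; it says nothing about whether the segment actually adopts a beta conformation.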

4.3.1 Beta sandwiches and beta barrels

In the immunoglobulin fold the strands form two sheets packed against each other, forming a "beta sandwich".

4.3.1.1 Aligned and orthogonal beta sandwiches

In the immunoglobulin and fibronectin type-3 folds, the two sheets are approximately aligned. In fact the mean angle between the 2 sheets is approximately 30° (designated -30° because the uppermost sheet is rotated clockwise with respect to the lower). The two sheets are usually independent in that the linking residues between them are not in beta sheet conformation. The angle between the sheets is determined by their right-handed twist. The observed angle varies between -20° and -50°; this is due to variation in the twist. Also side-chains are not always ideally aligned at the interface.

Orthogonal beta sheet packings consist of beta sheets folded on themselves; the two sheets make an angle of -90°. The strands at one corner or 2 diagonally opposite corners go uninterrupted from one layer to the other. Local coiling at the corner or a beta bulge facilitates the right-angled bend. These bends are right-handed, due to permitted phi and psi angles. The figure below illustrates this model.

Schematic and cartoon views of a beta-sandwich. Left: Schematic representation of the beta-sandwich, adapted from Chothia (1984). Right: Example of a beta-sandwich.

Only along one diagonal do the two sheets make contact. Large side-chains in loops usually fill the spaces between the splayed corners. This fold is seen in the Lipocalin family which binds ligands in the sandwich.

4.3.1.2 Beta barrels

Some antiparallel beta-sheet domains are better described as beta-barrels rather than beta-sandwiches, for example streptavidin and porin. Note that some structures are intermediate between the extreme barrel and sandwich arrangements.

4.3.2 Up-and-down antiparallel beta sheets

The simplest topology for an antiparallel beta-sheet involves loops connecting adjacent strands.

Structure of streptavidin and arrangement of its beta strands. Left: Schematic arrangement of the beta strands of the beta-barrel-forming protein streptavidin. Right: Cartoon view of streptavidin. Rainbow colouring from the N-terminal end in blue to the C-terminal end in red.

4.3.2.1 The Greek Key Topology

The Greek Key topology is named after a pattern that was common on Greek pottery. It is formed by three consecutive antiparallel beta-strands connected by hairpins followed by a longer connection to the fourth strand, which lies adjacent to the first.

Schematic view of the Greek key topology. The strands are Rainbow coloured from blue to red.
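The difference between an up-and-down sheet and a Greek key can be expressed by comparing the order of the strands along the chain with their order across the sheet. The sketch below assumes a four-stranded sheet described only by its strand order across the sheet; the function name and the labels "hairpin" and "long connection" are hypothetical.

```python
def classify_connections(spatial_order):
    """For each pair of strands that are consecutive in sequence, report
    whether they are neighbours in the sheet ("hairpin") or not
    ("long connection"). Strands are numbered from 1 along the chain."""
    position = {strand: index for index, strand in enumerate(spatial_order)}
    result = []
    for strand in range(1, len(spatial_order)):
        separation = abs(position[strand] - position[strand + 1])
        kind = "hairpin" if separation == 1 else "long connection"
        result.append((strand, strand + 1, kind))
    return result


# Order of the strands across the sheet, read from one edge to the other.
up_and_down = [1, 2, 3, 4]  # every strand lies next to its sequence neighbours
greek_key = [4, 1, 2, 3]    # strand 4 returns to lie beside strand 1

if __name__ == "__main__":
    print(classify_connections(up_and_down))
    print(classify_connections(greek_key))
```

For the Greek key order the first two connections are hairpins and the third is a long connection, matching the description above: three strands joined by hairpins, then a longer link to a fourth strand that lies adjacent to the first.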

Folds including the Greek key topology have been found to have 5-13 strands. An example is plastocyanin, in which strands 3 to 6 form a Greek key. Note that plastocyanin has a mixed sheet: there are two parallel pairs of strands (between the 1st and 3rd strands and between the 2nd and 8th strands).

Plastocyanin as an example of a protein with a Greek key. Left: Schematic view of the protein topology. Right: Structure of plastocyanin.

Gamma-crystallin has two domains, each of which is an eight-stranded beta-barrel-type structure composed of two Greek keys. In fact, the structure is more accurately described as consisting of two beta-sheets, one consisting of strands 2, 1, 4, 7 / 11, 14, 13, 16 (blue, closer to the viewer in the image) and the other of strands 6, 5, 8, 3 / 10, 9, 12, 15 (yellow), as indicated in the diagram. Sequence similarity has been found between the two Greek key motifs within each domain, and also between the two domains themselves. The latter similarity is higher than the former; this implies that the structure evolved from a single Greek key fold by means of a gene duplication to produce a domain of two Greek keys, followed by a second duplication resulting in two similar domains. This is supported by the fact that in some crystallins each Greek key motif is coded by a different exon, with introns between them.

Cartoon view and schematic view of the secondary structure arrangement in gamma-crystallin. Top: Cartoon view of gamma-crystallin. The beta strands are labelled. Bottom: Schematic view of the secondary structure. Blue indicates the beta-sheet closer to the viewer, while yellow represents the sheet further away. Note that strands 3, 7, 11 and 15 were not detected as beta-strands in this image representation.

4.3.2.2 The Jellyroll Topology

Richardson (1981) describes the jellyroll fold as being formed by the addition of an extra "swirl" to a Greek key:

Coat protein of satellite tobacco necrosis virus and its Jellyroll topology — Right: Cartoon view of the coat protein of satellite tobacco necrosis virus (onto both beta-sheets). Left: Schematic view of the secondary structure arrangement.

4.3.3 Beta-propellers

A beta-propeller typically consists of four to eight beta-sheets (with typically four strands each). The beta-sheets are arranged in a circle around a central axis, like the blades of a propeller.

4.3.4 Beta-trefoils

This fold has an approximately 3-fold axis of symmetry.

4.3.5 Beta-Helix

This is a very unusual fold. The beta-strands wind around the structure, describing a helical topology.

4.4 Alpha/beta topologies

The most regular and common domain structures consist of repeating beta-alpha-beta supersecondary units, such that the outer layer of the structure is composed of alpha helices packing against a central core of parallel beta-sheets. These folds are called alpha/beta, or wound alpha-beta.

Many enzymes, including all those involved in glycolysis, are alpha/beta structures. Most alpha/beta proteins are cytosolic.

The beta-alpha-beta unit is almost always right-handed. In alpha/beta structures, there is a repetition of this arrangement, giving a beta-alpha-beta-alpha... sequence. The beta strands are parallel and hydrogen bonded to each other, while the alpha helices are all parallel to each other, and are antiparallel to the strands. Thus the helices form a layer packing against the sheet.

The beta-alpha-beta-alpha-beta subunit, often present in nucleotide-binding proteins, is named the Rossmann fold, after Michael Rossmann (Rao and Rossmann, 1973).

Comparison between the right-handed beta-alpha-beta fold (left) and the Rossmann fold (right). Beta-strands are green, alpha-helices purple.

Richardson (1981) names the alpha/beta structures "parallel alpha/beta domains", to denote the fact that each of the two secondary structures forms a parallel arrangement. Note that there is no obvious reason why one would not expect to find "parallel all alpha" (alpha-alpha-alpha subunit) folds, or "parallel all beta" (beta-beta-beta) folds in equally large numbers, but these do not occur. However, the marked tendency for helices to pack aligned with sheets has been explained by the "complementary twist" model (Chothia et al., 1977). The right-handed twist of beta-sheets and the right-handed twist of the row of every 4th residue of the helices (the "i+4" ridges; see section 4.2.6 on helix-helix packing) mean that the two have complementary surfaces when aligned. This model is supported by the observation that approximately 90% of the helix residues which interface with a sheet are indeed a multiple of 4 residues apart. Helices packing side by side on a sheet would be rotated with respect to each other, due to the sheet twist; the observed interhelical angle is in agreement with this model in 80% of cases. In the other cases the helices are splayed from the sheet, with only one end in contact.

4.4.1 Alpha/beta horseshoe

The structure of the remarkable placental ribonuclease inhibitor takes the concept of the repeating alpha/beta unit to extremes. It is a cytosolic protein that binds extremely strongly to any ribonuclease that may leak into the cytosol. The structure shown here is a dimer of the human ribonuclease inhibitor in complex with ribonuclease (the very top and bottom chains).

One would expect this fold to form a complete barrel. However, this is not the case. The strands are only very slightly slanted, being nearly parallel to the central "axis".

Cartoon view of placental ribonuclease inhibitor dimer in complex with ribonuclease (the very top and bottom chain)

4.4.2 TIM barrels (alpha/beta barrels)

TIM barrels are named after triosephosphate isomerase, in which the fold was first observed. The TIM barrel is a ubiquitous fold comprising 8 alpha helices and 8 beta strands that alternate along the amino acid chain, that is, a sequence of eight beta-alpha motifs. The beta strands form a parallel beta-barrel, while the alpha helices lie outside the barrel. The first and last strands form hydrogen bonds with each other and close the barrel.
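The connectivity described above can be written out explicitly. This sketch assumes an idealised barrel with exactly eight beta-alpha units and invents its own labels (b1, a1, ...); real TIM barrel proteins may carry additional secondary structure elements.

```python
N_UNITS = 8  # idealised TIM barrel: eight beta-alpha units (assumption)

# Order of secondary structure elements along the chain: b1 a1 b2 a2 ... b8 a8
chain_order = [f"{kind}{i}" for i in range(1, N_UNITS + 1) for kind in ("b", "a")]

# Hydrogen-bonded strand pairs: each strand pairs with the next one, and the
# last strand pairs with the first, which is what closes the barrel.
strand_pairs = [(f"b{i}", f"b{i % N_UNITS + 1}") for i in range(1, N_UNITS + 1)]

if __name__ == "__main__":
    print(" ".join(chain_order))
    print(strand_pairs)
```

The final pair, (b8, b1), is the closure that distinguishes the barrel from an open sheet, in which the two edge strands would each have only one neighbour.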

In a structure which is open rather than closed like the barrel, helices would be situated on only one side of the beta sheet if the sheet direction did not reverse. Therefore open alpha/beta structures must be doubly wound to cover both sides of the sheet.

Cartoon view of a TIM barrel forming protein — Cartoon view of a TIM barrel. PDB id: 8TIM

4.4.3 Alpha+Beta Topologies

These are folds which include significant alpha and beta secondary structural elements, but in which those elements are "mixed", in the sense that they do NOT exhibit the wound alpha-beta topology. This class of folds is therefore referred to as alpha+beta. Some better-known examples are:

Bacterial and mammalian pancreatic ribonucleases
Lysozyme
Ubiquitin
Histidine-Carrier protein
Cysteine proteases such as papain and actinidin
Zinc Metallo-proteases
SH2 domains
Protein G (prokaryotic Ig-binding)
Carbonic anhydrases
Thymidylate synthase

Cartoon view of Tyrosine-protein kinase ZAP-70 — Tyrosine-protein kinase ZAP-70 which contains two SH2 domains (from |-> to -|) PDB id: 1Z7X

4.5 Small disulphide-rich folds

Listed below are a few examples of the main families of small disulphide-rich domains of known structure. The members of these families contain a large number of disulphide bonds which stabilise the fold.

Serine proteinase inhibitor
Sea anemone toxin (NMR structure)
EGF-like domain
Complement C-module domain
Wheat Plant Toxin; Naja (Cobra) neurotoxin; green Mamba anticholinesterase
Kringle domain

4.6 Structure Classification Schemes

The previous sections gave a broad overview of protein structures. There are two notable endeavors to classify all proteins: SCOP and CATH. Intuitively, one might ask whether only a limited number of principal folds exist. Interestingly, no new folds have been identified since 2008 or 2012, depending on the algorithm used.

4.6.1 SCOP: Structural Classification of Proteins

Introduction:

Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. A knowledge of these relationships is crucial to our understanding of the evolution of proteins and of development.

The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems. The data can be accessed directly on the SCOP webpage.

Structural annotation in SCOP is done both manually and automatically.

Classification:

Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold, described below. The exact position of the boundaries between these levels is to some degree subjective. The evolutionary classification is generally conservative: where any doubt about relatedness exists, new divisions at the family and superfamily levels are made. Thus, some researchers may prefer to focus on the higher levels of the classification tree, where proteins with structural similarity are clustered.

The different major levels in the hierarchy are (from top to bottom):

Classes: Folds grouped by similar secondary structure content
Folds: Superfamilies with the same major secondary structure elements in the same arrangement
Superfamilies: Bring together protein families with common functional and structural features, from which a probable common ancestor is inferred
Families: Proteins with related sequences but typically with distinct functions
Proteins: Sequences with essentially the same function (different species, different isoforms)
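SCOP records a domain's position in this hierarchy as a compact dotted identifier, the sccs (SCOP concise classification string), whose four fields are class, fold, superfamily and family. The sketch below splits such a string into its levels. The function name is hypothetical, and the identifier a.1.1.2 is used purely as an example input.

```python
from typing import Dict

# Levels encoded in an sccs string, from the top of the hierarchy downwards.
LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> Dict[str, str]:
    """Map each hierarchy level to the identifier prefix that names it."""
    parts = sccs.strip().split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


if __name__ == "__main__":
    for level, identifier in parse_sccs("a.1.1.2").items():
        print(f"{level:12s} {identifier}")
```

Two domains can then be compared level by level: equal "fold" entries but different "superfamily" entries would mean the same fold without an inferred common ancestor.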

Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.

Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJP, Chothia C and Murzin AG (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research, Vol. 36.

4.6.2 CATH: Classification of protein structures

Introduction:

CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: class (C), architecture (A), topology (T) and homologous superfamily (H). Annotation of domains is both manual and automatic.

Class (similar to class in SCOP): Defined by the secondary structure content (all alpha, all beta, alpha/beta etc.).
Architecture: Clustering of structurally similar arrangements of secondary structure elements, independent of their connectivity.
Topology or fold family: Structural grouping depending on both overall 3D shape and connectivity.
Homologous superfamily: Grouping of protein domains which have (or are predicted to have) a common ancestor.
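A CATH domain is addressed by a four-part numeric code with one number per level, in the order C.A.T.H. The sketch below, with hypothetical class and method names, unpacks such a code; 3.40.50.720 is used only as an example input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CathCode:
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        """Build a CathCode from a dotted string such as '3.40.50.720'."""
        fields = code.strip().split(".")
        if len(fields) != 4:
            raise ValueError(f"expected C.A.T.H, got {code!r}")
        return cls(*(int(field) for field in fields))

    def prefix(self, depth: int) -> str:
        """Identifier truncated to the first `depth` levels
        (1 = class, 2 = architecture, 3 = topology, 4 = superfamily)."""
        numbers = (self.class_, self.architecture,
                   self.topology, self.homologous_superfamily)
        return ".".join(str(n) for n in numbers[:depth])


if __name__ == "__main__":
    code = CathCode.parse("3.40.50.720")
    for depth, level in enumerate(("C", "A", "T", "H"), start=1):
        print(level, code.prefix(depth))
```

Comparing `prefix(3)` for two domains asks whether they share a topology (fold family), while comparing `prefix(2)` asks only whether their secondary structure elements are arranged similarly, regardless of connectivity.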

Sillitoe I, Lewis TE, Cuff AL, Das S, Ashford P, Dawson NL, Furnham N, Laskowski RA, Lee D, Lees J, Lehtinen S, Studer R, Thornton JM, Orengo CA (2015) CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Research, 2015 Jan. doi: 10.1093/nar/gku947