12 February 2007
Biol 6312
Motifs in Protein Structure
Protein Domains: Small proteins usually consist of a single domain of 100-200 amino acids. A domain is a compact, often self-folding polypeptide chain.
Half of all domains are between 50 and 150 amino acids. The largest known domain is 907 amino acids.
Large proteins are not generally made of larger domains, but rather of several normal-sized domains. The largest number of domains known to be in a single protein is 13. Each domain might have a unique function, but some domains join other domains to make functional units. In some proteins the polypeptide chain wanders from one domain back to another. The lac repressor has domains for DNA binding and for oligomerization. (Fig. 1-42) (Jmol)
In a 2-domain protein sometimes one domain is from the N-terminal part of the chain and the other domain is from the C-terminal part. In other proteins, the chain starts in one domain, then goes to make a second domain, and finally returns to the first domain. (Fig. 1-42) (Jmol)
Domains always have a hydrophobic core. This tends to put a lower limit on the size of a domain. It must be large enough to have an interior. This provides stability via the hydrophobic effect.
Some proteins have two or more domains of almost identical structure. This is likely to be the result of duplication of a progenitor gene, followed by gene fusion. If this has occurred, one might be able to detect sequence similarity, provided the duplication did not occur too long ago. Once duplicated, the 2 genes will evolve somewhat independently, apart from possible structural and functional constraints. An example is thioesterase from E. coli, which has 2 nearly identical domains (Fig. 1-43). A related protein, thioester dehydrase, composed of 2 identical subunits, is also shown.
The eye lens protein gamma-crystallin provides an example of a protein with a gene duplication within a single domain, in addition to a duplicated domain. (Fig. 1-44) (Jmol)
The bovine gamma-B-crystallin sequence is 2BB2. Notice that starting at residue 8 the sequence is FEQENFQG; skip over to the right and find YEQANCKG.
This is the weak sequence similarity within the first domain.
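The similarity between the two segments quoted above can be counted directly. The following is a minimal sketch, assuming a simple ungapped, position-by-position comparison; the function name `percent_identity` is only illustrative and is not taken from any alignment program.

```python
def percent_identity(a, b):
    """Percent of positions with the same residue in an ungapped comparison."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length for an ungapped comparison")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

# The two segments quoted in the notes for the first crystallin domain.
seg1 = "FEQENFQG"
seg2 = "YEQANCKG"

# Mark identical positions with a bar, as in a printed alignment.
marks = "".join("|" if x == y else " " for x, y in zip(seg1, seg2))
print(seg1)
print(marks)
print(seg2)
print(percent_identity(seg1, seg2))  # 4 of 8 positions match: 50.0
```

Identity over such a short stretch says little by itself; it becomes suggestive here only because the two segments sit at equivalent places in two repeats of the same structure.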
Each domain has a protein fold. How many different protein folds are there?
Perhaps just over 1000.
The Protein Data Bank has over 40,000 structures, mostly of proteins. Many of these are of a single protein reported many times, e.g. mutants and with different bound ligands. In addition, many different proteins have similar or virtually identical folds. So the total number of folds is likely to be very much smaller than the total number of proteins. Of course, a protein with many domains may also have multiple folds.
Even though two proteins have similar folds, it does not follow that they must have similar amino acid sequences, or carry out similar functions. See tryptophan synthase and galactonate dehydratase. (Fig. 1-45) Each has 2 domains, one of which is an alpha/beta barrel. These enzymes appear to be unrelated in terms of amino acid sequences or function.
Many proteins in signal transduction pathways or cell-cycle control are composed of multiple domains, which are often built up in a modular way. (Fig. 1-46) Modular means that each can carry out its function, typically binding, or an enzyme reaction, as a single unit. In practice, several binding partners might act cooperatively.
Examples of two proteins with the same fold, but different functions: aldose reductase and phosphotriesterase both have the alpha/beta barrel fold. (Fig. 1-47) (Jmol)
What about enzymes that carry out similar functions? Do they have the same folds? Not necessarily. Aspartate aminotransferase and D-amino acid aminotransferase do not, for example. (Fig. 1-48) (Jmol)
Multi-domain proteins reflect how proteins evolve. Fusion of genes that each code for one domain can lead to new proteins. This is simple if the genes are fused end to end. But if one gene is inserted into another, then the details are important. That is why one often sees domains that are connected at the loops of the domains, which can allow both domains to fold properly. An insertion into an alpha-helix, for example, or into any interior part of the protein would likely be disastrous.
Protein Fold Databases: There are two well-known databases of proteins that attempt to classify all domains according to a hierarchy of structures. Take a look at the links to see how they are organized.
SCOP (Structural Classification of Proteins) starts out with 11 categories, but the first 4 are the most important: all alpha, all beta, alpha/beta (alternating), and alpha + beta (distinct)
CATH starts out with 4 categories (Class): mainly alpha, mainly beta, mixed alpha and beta, and irregular. From that classification there is a hierarchy of 3 more levels: Architecture (which considers the orientation of secondary structures independent of connectivity), Topology (which considers topological connections), Homologous superfamily (See CATH)
Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG.
SCOP database in 2004: refinements integrate structure and sequence family data.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D226-9.
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA.
The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution.
Nucleic Acids Res. 2007 Jan;35(Database issue):D291-7. Epub 2006 Nov 29.
Orengo CA, Thornton JM.
Protein families and their evolution-a structural perspective.
Annu Rev Biochem. 2005;74:867-900. Review.
In order to recognize and classify domains according to their polypeptide folds, it is helpful to see a level of organization that falls between secondary and tertiary structure. This is called supersecondary structure, and the elements are referred to as structural motifs. Structural motifs, such as beta-hairpins, need not be associated with any particular function. Some are, such as helix-turn-helix, which can be a DNA-binding element. There are also sequence motifs, which are often associated with some biochemical function, and are recognized by conserved amino acids. For example, zinc fingers can often be recognized by the sequence:
CXX(XX)CXXXXXXXXXXXXHXXXH
due to the geometry of using Cys and His as ligands for the Zn (Fig. 1-49).
This sequence motif is relatively easy to identify because it is contiguous.
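Because the motif is contiguous, it can be searched for with a regular expression. The sketch below assumes that X stands for any residue and that the parenthesized (XX) is optional, which is how the pattern above is read here. The helper names `motif_to_regex` and `find_motif` and the demo sequence are hypothetical, for illustration only.

```python
import re

# Pattern as written in the notes: X = any residue, (XX) = two optional residues.
ZINC_FINGER_MOTIF = "CXX(XX)CXXXXXXXXXXXXHXXXH"

def motif_to_regex(motif):
    """Translate the X/(XX) shorthand into a compiled regular expression."""
    regex = motif.replace("(", "(?:").replace(")", ")?").replace("X", ".")
    return re.compile(regex)

def find_motif(sequence, motif=ZINC_FINGER_MOTIF):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]

# Hypothetical demo sequence built from the motif itself (every X replaced by
# Ala, short spacing between the two Cys), flanked by a few arbitrary residues.
# It is not a real protein.
demo = "MKT" + ZINC_FINGER_MOTIF.replace("(XX)", "").replace("X", "A") + "GSE"
print(find_motif(demo))
```

A pattern this loose will also match sequences that are not zinc fingers, so a hit is a candidate to be checked against the structure, not a proof.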
A binding site for a substrate is likely to consist of discontiguous sequence elements, such as the catalytic site of chymotrypsin which includes: His 57, Asp 102, and Ser 195.
A different enzyme that carries out a similar reaction, subtilisin, has the same 3 residues, known as the catalytic triad, but they appear in a different order: Asp 32, His 64, and Ser 221. This indicates that the genes are not related. Or rather, it provides no evidence that the genes are related. If they are, some shuffling must have occurred. But since the secondary structures of the two proteins are completely different, it seems very unlikely that the genes are related.
This is known as a case of convergent evolution: 2 unrelated genes have eventually placed the same 3 residues into an almost identical catalytic site. (Fig. 1-52) (Jmol)
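The difference in order along the chain can be made explicit by sorting the triad residues by position. This is a small sketch using only the residue numbers quoted above; the dictionary layout and the function name `order_along_chain` are illustrative choices.

```python
# Catalytic triad residue numbers as given in the notes.
triads = {
    "chymotrypsin": {"His": 57, "Asp": 102, "Ser": 195},
    "subtilisin": {"Asp": 32, "His": 64, "Ser": 221},
}

def order_along_chain(residues):
    """Residue names sorted by their position in the polypeptide chain."""
    return [name for name, pos in sorted(residues.items(), key=lambda item: item[1])]

for enzyme, residues in triads.items():
    print(enzyme, "-".join(order_along_chain(residues)))
# chymotrypsin His-Asp-Ser
# subtilisin Asp-His-Ser
```

The same three residues appear in both enzymes, but no shared sequence order exists that an alignment could pick up, which is why the match is only visible in the 3-dimensional structures.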
Structural motifs include:
1. helix-loop-helix (DNA binding, Ca2+ binding)
These are 2 alpha-helices separated by a turning region. (Jmol)
2. beta hairpin
These are 2 anti-parallel beta-strands separated by a short turn. (Jmol)
3. beta-alpha-beta
This is an alternating pattern of beta-strands and alpha-helices.
4. Greek key beta
This is a 4-stranded beta-sheet of anti-parallel strands in the order 4-1-2-3.
Classification of Folds according to elements of secondary structure:
All alpha, all beta, alpha/beta alternating, alpha + beta, and irregular or cross-linked domains.
All Alpha-Helix Proteins
All alpha-helical proteins can be built up from alpha-alpha elements. The most common folds include the following 2 groups:
1. 4-helix bundle
2. Globin fold
The helices of a 4-helix bundle are typically 15-24 aa in length (longer than an average helix).
The 4 helices are often antiparallel with their nearest neighbors. For example, 1 and 3 go up and 2 and 4 go down. This permits direct connectivity (myohemerythrin), but some proteins will have cross-over connections (human growth hormone). Some proteins use multiple polypeptides (Rop and lac repressor) to form a 4-helix bundle.
Many unrelated proteins have this fold. There are many variations on it, e.g. crossover connections and extra helices.
Proteins with this fold have many different aa sequences and many different functions.
Active sites are often found between the helices. Binding sites for larger molecules might be found on the surface.
Surface helices, such as many in 4-helix bundles, are amphipathic. The outer surfaces, facing the solvent water, are polar, while the inner surfaces, bearing hydrophobic residues, are nonpolar (e.g., Ala, Phe, Leu, Val, Ile).
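One way to put a number on amphipathicity is a helical hydrophobic moment: each residue is placed around the helix axis at 100˚ per residue (3.6 residues per turn), and the hydrophobicities are summed as vectors. The sketch below assumes the Kyte-Doolittle hydropathy scale and this vector-sum definition, neither of which comes from these notes, and the two test sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Mean hydrophobic moment per residue for an ideal alpha-helix."""
    x = y = 0.0
    for i, aa in enumerate(sequence):
        angle = math.radians(i * degrees_per_residue)
        h = KYTE_DOOLITTLE[aa]
        x += h * math.cos(angle)  # component along the reference direction
        y += h * math.sin(angle)  # component perpendicular to it
    return math.hypot(x, y) / len(sequence)

# Hypothetical 18-residue sequences (5 full turns of an ideal helix).
amphipathic = "LKKLLKKLLKKLLKKLLK"  # Leu clustered on one face, Lys on the other
uniform = "L" * 18                  # hydrophobic all the way around

print(round(hydrophobic_moment(amphipathic), 2))
print(round(hydrophobic_moment(uniform), 2))
```

A helix that is hydrophobic all the way around gives a moment near zero, while one with its hydrophobic residues on a single face gives a large moment, which is the pattern expected for the surface helices of a 4-helix bundle.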
Rop is a dimer of identical subunits of 63 aa that forms a 4-helix bundle. It binds RNA and is involved in the control of copy number of certain plasmids, including pBR322. (Jmol)
Other examples:
Globin fold:
This is a much more complex topology, usually involving about 8 helices, with a variety of crossing angles. There are 2 main groups:
a) oxygen carriers: myoglobin (Jmol), hemoglobin, neuroglobin, and cytoglobin (Jmol)
b) phycocyanins: light gathering proteins (Jmol)
The 8 helices are labeled A-H and, in myoglobin, range from 7 to 28 residues in length. Generally the helices interact with non-neighbors, except for G and H, which form an anti-parallel helical hairpin.
The alpha-helix was proposed by Linus Pauling in 1950, and rules for helix-helix interactions were developed by Francis Crick in 1953. Chothia et al. took the problem up again in 1981.
Chothia C, Levitt M, Richardson D.
Helix to helix packing in proteins.
J Mol Biol. 1981 Jan 5;145(1):215-50.
For 2 helices to interact closely, or to pack well against each other, the "knobs" of one helix must fit into the "holes" of the other helix. This tends to work best at particular crossing angles of the helices: 20˚ and 50˚. This is also referred to as "ridges into grooves".
The evolutionary relationship of globins and phycocyanins is very distant, as analyzed by Lesk et al. in 1980 and 1990. The sequence identity is about 15%, which by itself is considered too low to establish a relationship convincingly (the twilight zone). Also, even key residues, such as those that bind the heme, are not conserved. Residues that mediate helix-helix contact are not conserved. What is conserved that allows the same fold? Not even size, but rather hydrophobic character, which includes compensation: e.g. Ala-Leu pairs can be switched to Ile-Ala pairs. This will cause changes in helix-helix orientations.
Evolutionary relationship between the 2 types of globins:
Pastore A, Lesk AM
Comparison of the structures of globins and phycocyanins: evidence for evolutionary relationship.
Proteins 1990;8(2):133-55
available in Room 242
Coiled-Coils
Alpha-helices have the ability to form long bundles of 2 helices that wrap around each other. These are called coiled-coils. They were first identified in the fibrous proteins called keratins, but were later "re-discovered" in transcription factors in the 1980s.
An alpha-helix has 3.6 residues per turn and is right-handed.
A coiled-coil of 2 helices is usually left-handed, making the repeating surface 3.5 residues per turn. This allows a sequence repeat of exactly 7 residues (2 turns). Residues numbered 1 and 4 in the heptad (7) repeat are found at the interface of the helices.
By convention position 1 is a, position 2 is b, 3-c, 4-d, 5-e, 6-f, 7-g (Interface a, d)
Position a is most commonly: Leu, Ile, Ala, and Val
Position d is most commonly: Leu, Ala, (Ile, Val)
Other residues commonly found include charged ones.
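The a-g convention can be applied mechanically to a sequence once the register of the first residue is known. The following is a minimal sketch, assuming an uninterrupted heptad repeat; the function names and the 14-residue example sequence are hypothetical.

```python
HEPTAD = "abcdefg"

def heptad_register(sequence, first_position="a"):
    """Assign a heptad letter (a-g) to every residue, assuming no breaks in the repeat."""
    start = HEPTAD.index(first_position)
    return "".join(HEPTAD[(start + i) % 7] for i in range(len(sequence)))

def interface_residues(sequence, first_position="a"):
    """List (1-based residue number, residue, heptad position) for the a and d positions."""
    register = heptad_register(sequence, first_position)
    return [
        (i + 1, aa, pos)
        for i, (aa, pos) in enumerate(zip(sequence, register))
        if pos in "ad"
    ]

# Hypothetical two-heptad peptide, with the first residue placed at position a.
peptide = "LEKLKEEIKELESK"
print(peptide)
print(heptad_register(peptide))
print(interface_residues(peptide))
```

In a real protein the register has to be found first, which is what coiled-coil prediction programs do; stutters or skips in the repeat would break this simple modulo-7 assignment.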
If the coiled-coil is right-handed then the repeat is 3.67 residues per turn, resulting in a helix repeat of 11 residues (3 turns).
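The numbers 3.5 and 3.67 can be checked against the 3.6 residues per turn of a straight alpha-helix. The sketch below computes how far the start of each repeat drifts around the helix axis if the helix is not supercoiled. The function name `drift_per_repeat` and the label "hendecad" for the 11-residue repeat are introduced here for illustration.

```python
def drift_per_repeat(repeat_length, turns, residues_per_turn=3.6):
    """Angular offset (degrees) of the next repeat's first residue,
    relative to a whole number of turns of a straight alpha-helix."""
    degrees_per_residue = 360.0 / residues_per_turn  # 100 degrees for 3.6
    return repeat_length * degrees_per_residue - turns * 360.0

for name, length, turns in [("heptad", 7, 2), ("hendecad", 11, 3)]:
    per_turn = round(length / turns, 2)
    drift = round(drift_per_repeat(length, turns), 1)
    print(name, per_turn, drift)
# heptad 3.5 -20.0
# hendecad 3.67 20.0
```

In a straight helix the 7-residue repeat falls 20˚ short of 2 full turns and the 11-residue repeat overshoots 3 full turns by 20˚. The drifts are in opposite directions, which is consistent with the two repeats being associated with supercoils of opposite handedness.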
When the heptad repeat was "re-discovered" it was called the Leucine zipper. This is not quite right, since the interacting Leu residues interact between chains in a side by side manner, rather than above and below one another as the zipper metaphor implies.
Both parallel and anti-parallel varieties are found, and they appear quite similar. One difference is in which interface residues interact:
parallel form: residues a-a and d-d interact
anti-parallel: residues a-d and d-a interact.
Charged residues outside the interface can form ion pairs between chains. They can influence the stability of the coiled-coil, and their positions can influence whether parallel or anti-parallel coiled-coils form. In studies of model polypeptides, depending upon the residues at the a and d positions (e.g. more Val and Ile), trimers and tetramers can also form, as is sometimes seen in nature.
Crystal structure of a trimeric coiled-coil:
Nautiyal S, Alber T
Crystal structure of a designed, thermostable, heterotrimeric coiled coil.
Protein Sci 1999 Jan;8(1):84-90
Tropomyosin (Fig. 1-67)
Hetero-trimeric Coiled-coil (Jmol)
Reviews:
Lupas A
Coiled coils: new structures and new functions.
Trends Biochem Sci 1996 Oct;21(10):375-82
Lupas A
Predicting coiled-coil regions in proteins.
Curr Opin Struct Biol 1997 Jun;7(3):388-93
Websites for prediction of coiled coils: at ch.embnet.org
"Coils": Try this sequence (AMEAKRKAEEHISSSHGDVDYAQASAELAKAIAQLRVIELTKKAM)
It returns this result: (Figure: Coils output for the sequence above)
by Lupas "Coiled Coil" at NPSA
Prediction of Coiled-Coil Partners
John Walshaw, Derek N. Woolfson
SOCKET: A Program for Identifying and Analysing Coiled-coil Motifs Within Protein Structures. Journal of Molecular Biology, Vol. 307, No. 5, April 2001, 1427-1450
Website for SOCKET
Copyright 2007, Steven B. Vik, Southern Methodist University