14 February 2005

Biol 6312

Motifs in Protein Structure

Protein Domains: Small proteins usually consist of a single domain of 100-200 amino acids. A domain is a compact, often self-folding polypeptide chain.

Half of all domains are between 50 and 150 amino acids. The largest known domain is 907 amino acids.

Large proteins are not generally made of larger domains, but rather from several normal sized domains. The largest number of domains known to be in a single protein is 13. Sometimes the polypeptide chain wanders from one domain back to another. The lac repressor has domains for DNA binding and for oligomerization. (Fig. 1-42) (Jmol)

In a 2-domain protein, sometimes one domain is from the N-terminal part of the chain and the other domain is from the C-terminal part. In other proteins, the chain starts in one domain, then goes to make a second domain, and finally returns to the first domain. (Fig. 1-42) (Jmol)

Domains always have a hydrophobic core. This tends to put a lower limit on the size of a domain. It must be large enough to have an interior. This provides stability via the hydrophobic effect.

Some proteins have two or more domains of almost identical structure. This is likely to be the result of duplication of a progenitor gene, followed by gene fusion. If this has occurred, one might be able to detect sequence similarity, provided the duplication did not occur too long ago. Once duplicated, the 2 genes will evolve somewhat independently, apart from possible structural and functional constraints. An example is thioesterase from E. coli, which has 2 nearly identical domains (Fig. 1-43). A related protein, thioester dehydrase, composed of 2 identical subunits, is also shown.

The eye lens protein gamma-crystallin provides an example of a protein with a gene duplication within a single domain, in addition to a duplicated domain. (Fig. 1-44) (Jmol)

The bovine gamma-B-crystallin sequence is 2BB2

Notice that starting at residue 8 the sequence is FEQENFQG. Skip over to the right and find YEQANCKG.

This is the weak sequence similarity within the first domain.
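The similarity between the two segments can be counted directly. The following is a minimal sketch, assuming an ungapped, position-by-position comparison of the two octapeptides quoted above; the function names are illustrative and not taken from any alignment package.

```python
def percent_identity(a, b):
    """Percent of positions holding the same residue (ungapped comparison)."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length for an ungapped comparison")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def match_line(a, b):
    """Return a marker line with '|' under identical positions."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


first = "FEQENFQG"   # segment starting at residue 8 (from the text)
second = "YEQANCKG"  # the repeated segment further along the first domain

print(first)
print(match_line(first, second))
print(second)
print(percent_identity(first, second), "% identical")
```

Four of the eight positions are identical (E, Q, N, G), which is easy to see by eye in a short segment but would be much harder to judge over a whole domain.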

Each domain has a protein fold. How many different protein folds are there?

The Protein Data Bank has almost 30,000 structures, mostly of proteins. Many of these are of a single protein reported many times, e.g. as mutants or with different bound ligands. In addition, many different proteins have similar or virtually identical folds. So the total number of folds is likely to be very much smaller than the total number of proteins. Of course, a protein with many domains may also have multiple folds.

Even though two proteins have similar folds, it does not follow that they must have similar amino acid sequences, or carry out similar functions. See tryptophan synthase and galactonate dehydratase. (Fig. 1-45) Each has 2 domains, one of which is an alpha/beta barrel. These enzymes appear to be unrelated in terms of amino acid sequences or function.

Many proteins in signal transduction pathways or cell-cycle control are composed of multiple domains, which are often built up in a modular way. (Fig. 1-46) Modular means that each can carry out its function, typically binding, or an enzyme reaction, as a single unit. In practice, several binding partners might act cooperatively.

An example of two proteins with the same fold but different functions: aldose reductase and phosphotriesterase both have the alpha/beta barrel fold. (Fig. 1-47) (Jmol)

What about enzymes that carry out similar functions? Do they have the same folds? Not necessarily. Aspartate aminotransferase and D-amino acid aminotransferase do not, for example. (Fig. 1-48) (Jmol)

Multi-domain proteins reflect how new proteins arise during evolution. Fusion of genes that each code for one domain can lead to new proteins. This is simple if the genes are fused end to end. But if one gene is inserted into another, then the details are important. That is why one often sees domains that are connected at the loops of the domains. That can allow both domains to fold properly. An insertion into an alpha-helix, for example, or into any interior part of the protein would likely be disastrous.

Protein Fold Databases: There are two well-known databases of proteins that attempt to classify all domains according to a hierarchy of structures. Take a look at the links to see how they are organized.

See CATH and SCOP

SCOP (I have linked the Berkeley mirror site) starts out with 11 categories, but the first 4 are the most important: all alpha, all beta, alpha/beta (alternating), and alpha + beta (distinct)

CATH starts out with 4 categories (Classes): mainly alpha, mainly beta, mixed alpha and beta, and irregular. Each is then broken down according to 5 levels: Class, Architecture, Topology, Homologous superfamily, and Sequence (See CATH)
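The hierarchy can be pictured as a dotted identifier with one number per level. The sketch below assumes that a CATH-style code has the form C.A.T.H (Class, Architecture, Topology, Homologous superfamily) and that the class numbers 1 to 4 follow the order listed above; the identifiers used are only illustrative inputs, and the helper names are hypothetical.

```python
from typing import NamedTuple

# Class names as listed in the notes; the numbering 1-4 is an assumption here.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "irregular",
}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H identifier into its four numeric levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_topology(a: CathCode, b: CathCode) -> bool:
    """Two domains share a fold (topology) if the first three levels agree."""
    return a[:3] == b[:3]


domain = parse_cath_code("3.40.50.720")
print(CLASS_NAMES[domain.class_], domain)
```

Comparing codes level by level, from the left, shows how far down the hierarchy two domains stay together before they diverge.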


Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG.
SCOP database in 2004: refinements integrate structure and sequence family data.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D226-9.

Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C.
The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.
Nucleic Acids Res. 2005 Jan 1;33 Database Issue:D247-51.

Thornton JM, Orengo CA, Todd AE, Pearl FMG:
Protein folds, functions and evolution.
J Mol Biol 1999, 293: 333–342.

In order to recognize and classify domains according to their polypeptide folds, it is helpful to see a level of organization that falls between secondary and tertiary structure. This is called supersecondary structure, and the elements are referred to as structural motifs. Structural motifs, such as beta-hairpins, need not be associated with any particular function. Some are, such as helix-turn-helix, which can be a DNA-binding element. There are also sequence motifs, which are often associated with some biochemical function, and are recognized by conserved amino acids. For example, zinc fingers can often be recognized by the sequence:

CXX(XX)CXXXXXXXXXXXXHXXXH

due to the geometry of using Cys and His as ligands for the Zn (Fig. 1-49).
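A contiguous motif like this one translates naturally into a regular expression. The sketch below converts the pattern exactly as written above (X = any residue, parentheses = optional residues) into a regex and scans a sequence for it. The function names are illustrative, and the test sequence is synthetic, built from the motif itself rather than taken from a real protein.

```python
import re

MOTIF = "CXX(XX)CXXXXXXXXXXXXHXXXH"  # pattern as written in the notes


def motif_to_regex(motif):
    """Translate the X/parenthesis notation into a compiled regular expression."""
    parts = []
    for ch in motif:
        if ch == "X":
            parts.append("[A-Z]")   # any amino acid (one-letter code)
        elif ch == "(":
            parts.append("(?:")     # start of an optional stretch
        elif ch == ")":
            parts.append(")?")      # end of the optional stretch
        else:
            parts.append(ch)        # a conserved residue (C or H)
    return re.compile("".join(parts))


ZINC_FINGER = motif_to_regex(MOTIF)


def find_zinc_fingers(seq):
    """Return (1-based start position, matched segment) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in ZINC_FINGER.finditer(seq)]


# Synthetic example: the short form of the motif with Ala at every X position.
example = MOTIF.replace("(XX)", "").replace("X", "A")
print(find_zinc_fingers("MK" + example + "GS"))
```

A match only says that the Cys and His spacing is compatible with a zinc finger; it is a candidate to be checked against structure or other evidence, not proof of one.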

This sequence motif is relatively easy to identify because it is contiguous.

A binding site for a substrate is likely to consist of discontiguous sequence elements, such as the catalytic site of chymotrypsin which includes: His 57, Asp 102, and Ser 195.

A different enzyme that carries out a similar reaction, subtilisin, has the same 3 residues, known as the catalytic triad, but they appear in a different order: Asp 32, His 64, and Ser 221. This indicates that the genes are not related. Or rather, it provides no evidence that the genes are related. If they are, some shuffling must have occurred. But since the secondary structures of the two proteins are completely different, it seems very unlikely that the genes are related.

This is known as a case of convergent evolution: 2 unrelated genes have eventually placed the same 3 residues into an almost identical catalytic site. (Fig. 1-52) (Jmol)
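The difference in order is easy to make explicit. This small sketch, using only the residue numbers quoted above, sorts each triad by position along the chain; the dictionary layout and function name are illustrative.

```python
# Catalytic triad residue numbers as given in the notes.
TRIADS = {
    "chymotrypsin": {"His": 57, "Asp": 102, "Ser": 195},
    "subtilisin": {"Asp": 32, "His": 64, "Ser": 221},
}


def chain_order(triad):
    """Residue names sorted by their position along the polypeptide chain."""
    return [name for name, _ in sorted(triad.items(), key=lambda item: item[1])]


for enzyme, triad in TRIADS.items():
    print(enzyme, "-".join(chain_order(triad)))
```

Both enzymes use the same set of residues, but the chain visits them in a different order (His-Asp-Ser versus Asp-His-Ser), so no simple alignment of the two sequences can superimpose all three catalytic residues.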

Structural motifs include:

1. helix-loop-helix (DNA binding, Ca2+ binding)
     These are 2 alpha-helices separated by a turning region. In the limit it can be a helical hairpin.

2. beta hairpin
      These are 2 anti-parallel beta-strands separated by a short turn.

3. beta-alpha-beta
     This is an alternating pattern of beta-strands and alpha-helices.

4. Greek key beta
     This is a 4-stranded beta-sheet of anti-parallel strands in the order 4-1-2-3.


Classification of Folds according to elements of secondary structure:

All alpha, all beta, alpha/beta alternating, alpha + beta, and irregular or cross-linked domains.

All Alpha-Helix Proteins

All alpha-helical proteins can be built up from alpha-alpha elements. The results are commonly divided into the following 2 groups:

1. 4-helix bundle

2. Globin fold

The 4-helix bundle helices are typically 15-24 aa in length. (longer than an average helix)

The 4 helices are often antiparallel with their nearest neighbors. For example, 1 and 3 go up and 2 and 4 go down. This permits direct connectivity (myohemerythrin), but some proteins will have cross-over connections (human growth factor). Some proteins use multiple polypeptides (Rop and lac repressor) to form a 4-helix bundle.

Many unrelated proteins have this fold. There are many variations on it, e.g. crossover connections and extra helices.

Proteins with this fold have many different aa sequences and many different functions.

Active sites are often found between the helices. Binding sites for larger molecules might be found on the surface.

Surface helices, such as many in 4-helix bundles, are amphipathic. The outer surfaces, facing the solvent water, are polar, while the inner surfaces, bearing hydrophobic residues, are nonpolar (e.g., Ala, Phe, Leu, Val, Ile).
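One way to put a number on amphipathicity is to ask whether the hydrophobic residues cluster on one face when the sequence is wound into a helix. The sketch below is a simplified hydrophobic moment. It assumes an ideal helix with 100° between successive residues (360°/3.6) and a crude binary scale in which only the residues named above (Ala, Phe, Leu, Val, Ile) count as hydrophobic; it is not a published hydrophobicity scale, and the test peptide is hypothetical.

```python
import math

HYDROPHOBIC = set("AFLVI")    # nonpolar residues named in the notes
DEGREES_PER_RESIDUE = 100.0   # 360 degrees / 3.6 residues per turn


def hydrophobic_moment(seq, step_deg=DEGREES_PER_RESIDUE):
    """Length of the summed unit vectors of hydrophobic residues, per residue.

    Values near 0 mean hydrophobic residues are spread evenly around the
    helix; larger values mean they are concentrated on one face.
    """
    x = y = 0.0
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            angle = math.radians(i * step_deg)
            x += math.cos(angle)
            y += math.sin(angle)
    return math.hypot(x, y) / len(seq)


amphipathic = "LKKLLKKLLKLLKKLLKK"  # hypothetical peptide, Leu on one face
uniform = "L" * 18                   # hydrophobic all the way around

print(round(hydrophobic_moment(amphipathic), 3))
print(round(hydrophobic_moment(uniform), 3))
```

The all-Leu helix scores essentially zero because 18 residues at 100° each cover the circle evenly, while the patterned peptide keeps its Leu residues within one half of the helical wheel.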

Rop is a dimer of identical subunits of 63 aa that forms a 4-helix bundle. It binds RNA and is involved in the control of copy number of certain plasmids, including pBR322. (Jmol)

Other examples:

  1. Myohemerythrin (Jmol)
  2. Cytochrome b562 (Jmol)
  3. Hormones: leptin and human growth factor (Jmol)

Globin fold:

This is a much more complex topology, usually involving about 8 helices, with a variety of crossing angles. There are 2 main groups:

a) oxygen carriers: myoglobin (Jmol), hemoglobin, neuroglobin, and cytoglobin (Jmol)

b) phycocyanins: light gathering proteins (Jmol)

The 8 helices are labeled A-H and, in myoglobin, range from 7 to 28 residues. Generally the helices interact with non-neighbors, except for G-H, which form an anti-parallel helical hairpin.

The alpha-helix was proposed by Linus Pauling in 1950, and rules for helix-helix interactions were developed by Francis Crick in 1955. This work was later extended by Chothia et al. in 1981.

For 2 helices to interact closely, or to pack well against each other, the "knobs" of one helix must fit into the "holes" of the other helix. This will tend to work best at particular crossing angles of the helices: 20° and 50°. This is also referred to as "ridges into grooves".

The evolutionary relationship of globins and phycocyanins is very distant, as analyzed by Lesk et al. 1980 and 1990. The sequence identity is about 15%, which is considered too low to be convincing, by itself, to establish a relationship. Also, even key residues, such as those that bind the heme, are not conserved. Residues that mediate helix-helix contact are not conserved. What is conserved, that allows the same fold? Not even size, but rather hydrophobic character, which includes compensation: e.g. Ala-Leu pairs can be switched to Ile-Ala pairs. This also causes changes in helix-helix orientations.

Evolutionary relationship between the 2 types of globins:

Pastore A, Lesk AM
Comparison of the structures of globins and phycocyanins: evidence for evolutionary relationship.
Proteins 1990;8(2):133-55

Coiled-Coils

Alpha-helices have the ability to form long bundles of 2 helices that wrap around each other. These are called coiled-coils. They were first identified in the fibrous proteins called keratins, but were later "re-discovered" in transcription factors in the 1980's.

An alpha-helix has a repeat of 3.6 residues per turn, and is right-handed.
A coiled-coil of 2 helices is usually left-handed, making the repeat along the interacting surface 3.5 residues per turn. This allows a sequence repeat of exactly 7 residues (2 turns). Residues numbered 1 and 4 in the heptad (7) repeat are found at the interface of the helices.

By convention position 1 is a, position 2 is b, 3-c, 4-d, 5-e, 6-f, 7-g (Interface a, d)
Position a is most commonly : Leu, Ile, Ala, and Val
Position d is most commonly: Leu, Ala, (Ile, Val)

The other positions commonly include charged residues.
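The heptad convention can be applied mechanically to a sequence. The following is a rough sketch, not the algorithm used by any prediction server: it labels residues a to g for a chosen starting position, then scores each of the 7 possible registers by the fraction of a and d positions occupied by Leu, Ile, Ala, or Val. The peptide used is hypothetical.

```python
HEPTAD = "abcdefg"
CORE_RESIDUES = set("LIAV")  # residues the notes list for positions a and d


def assign_heptad(seq, first_position="a"):
    """Pair each residue with its heptad letter, given the first residue's letter."""
    offset = HEPTAD.index(first_position)
    return [(aa, HEPTAD[(offset + i) % 7]) for i, aa in enumerate(seq)]


def core_score(seq, first_position="a"):
    """Fraction of a/d positions occupied by L, I, A or V in this register."""
    core = [aa for aa, pos in assign_heptad(seq, first_position) if pos in "ad"]
    if not core:
        return 0.0
    return sum(aa in CORE_RESIDUES for aa in core) / len(core)


def best_register(seq):
    """Heptad letter for the first residue that gives the most apolar a/d core."""
    return max(HEPTAD, key=lambda pos: core_score(seq, pos))


peptide = "LKELEDK" * 3  # hypothetical repeat with Leu at a and d
print(best_register(peptide), core_score(peptide, "a"))
```

Real coiled-coil predictors weigh every position with residue-specific statistics; counting apolar residues at a and d is only meant to show how the register is read off a sequence.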

If the coiled-coil is right-handed then the repeat is 3.67, resulting in a helix repeat of 11 residues (3 turns).
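The numbers 3.5 and 3.67 follow from simple arithmetic, worked through in the sketch below. It assumes an ideal straight alpha-helix with exactly 3.6 residues per turn and asks how far the helix is from a whole number of turns after one sequence repeat; the function names are illustrative.

```python
from fractions import Fraction

STRAIGHT_HELIX = Fraction("3.6")  # residues per turn in an undistorted alpha-helix


def residues_per_turn(residues, turns):
    """Effective residues per turn when `residues` span exactly `turns` turns."""
    return Fraction(residues, turns)


def drift_per_repeat(residues, turns):
    """Degrees by which a straight helix over- or under-shoots `turns` full turns."""
    return residues * Fraction(360) / STRAIGHT_HELIX - turns * 360


heptad = residues_per_turn(7, 2)       # 7 residues in 2 turns
hendecad = residues_per_turn(11, 3)    # 11 residues in 3 turns

print(float(heptad), drift_per_repeat(7, 2))
print(round(float(hendecad), 2), drift_per_repeat(11, 3))
```

In a straight helix, 7 residues fall 20° short of 2 full turns, while 11 residues overshoot 3 full turns by 20°. The mismatch has opposite sign in the two cases, which is consistent with the opposite handedness of the supercoil for the 7-residue and 11-residue repeats.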

When the heptad repeat was "re-discovered" it was called the Leucine zipper. This is not quite right, since the interacting Leu residues interact between chains in a side by side manner, rather than above and below one another as the zipper metaphor implies.

Both parallel and anti-parallel varieties are found, and they appear quite similar. One difference is in which interface residues pair:

parallel form: residues a-a and d-d interact
anti-parallel form: residues a-d and d-a interact

Charged residues outside the interface can form ion pairs between chains. They can influence the stability of the coiled-coil, and their positions can influence whether parallel or anti-parallel coiled-coils form. In studies of model polypeptides, depending upon the residues at the a and d positions (e.g. more Val and Ile), trimers and tetramers can also form, as is sometimes seen in nature.

Crystal structure of a trimeric coiled-coil:

Nautiyal S, Alber T
Crystal structure of a designed, thermostable, heterotrimeric coiled coil.
Protein Sci 1999 Jan;8(1):84-90

Tropomyosin (Fig. 1-67)

GCN4 (Fig. 1-68) (Jmol)

Hetero-trimeric Coiled-coil (Jmol)

Reviews:

Lupas A
Coiled coils: new structures and new functions.
Trends Biochem Sci 1996 Oct;21(10):375-82

Lupas A
Predicting coiled-coil regions in proteins.
Curr Opin Struct Biol 1997 Jun;7(3):388-93

Websites for prediction of coiled coils:

"Coils" Try this sequence (AMEAKRKAEEHISSSHGDVDYAQASAELAKAIAQLRVIELTKKAM)

It returns this result:

by Lupas "Coiled Coil"

Prediction of Coiled-Coil Partners

John Walshaw, Derek N. Woolfson
SOCKET: A Program for Identifying and Analysing Coiled-coil Motifs Within Protein Structures. Journal of Molecular Biology, Vol. 307, No. 5, April 2001, 1427-1450

Website for SOCKET



Copyright 2005, Steven B. Vik, Southern Methodist University

Last modified 2/15/05