11 April 2005

Biol 6312

Mechanisms of Regulation

  1. Proteins can be targeted to specific compartments of cells. (signal sequences or lipid modifications)
  2. Proteins can be regulated by effector binding or covalent modification. (allosteric inhibitors or phosphorylation)
  3. Protein activity can be regulated by expression level and rate of degradation. (at the level of RNA or protein)

Protein Interaction Domains

The interactions of many proteins are mediated by modular interaction domains. These domains are typically 35-150 amino acid residues in length. The N- and C-termini are usually close in space, allowing the domains to be accommodated as insertions into surface regions of almost any other fold. They can be categorized according to sequence, structure or ligand-binding properties.

(Fig. 3-2)

Effector Ligands

Ligands that bring about effects on proteins through their binding can range in size from protons to proteins. Proton binding is usually considered a special class, and is discussed under the topic of pH effects. Protein binding is another special class, discussed above. In between these extremes of size are the so-called small molecules, which are often metabolites.

Competitive binding

This is a common situation, in which a product serves as an inhibitor of a reaction by binding at the catalytic site, i.e., it competes with the substrate. Almost all products can compete with the substrates of the enzyme that produced them. Some also inhibit earlier enzymes in a metabolic pathway. That is called feedback inhibition. (Fig. 3-7)
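The effect of a competing ligand on reaction rate can be sketched with the standard Michaelis-Menten expression for competitive inhibition. These notes do not give that equation, so treat it as an assumed model. The function name and all numerical values are illustrative only.

```python
def competitive_rate(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    The inhibitor raises the apparent Km by the factor (1 + [I]/Ki)
    and leaves Vmax unchanged.
    """
    km_apparent = km * (1.0 + i / ki)
    return vmax * s / (km_apparent + s)


# Illustrative values only (arbitrary concentration units).
for s in (1.0, 10.0, 100.0):
    without = competitive_rate(s, i=0.0)
    with_inhibitor = competitive_rate(s, i=5.0)
    print(f"[S]={s:6.1f}  no inhibitor={without:.3f}  with inhibitor={with_inhibitor:.3f}")
```

Because substrate and inhibitor compete for the same site, a large excess of substrate restores the uninhibited rate in this model.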

Cooperativity

Cooperativity in binding of a ligand can occur if there are multiple subunits of the protein. The binding of one ligand by the protein influences the binding of subsequent ligands. In positive cooperativity, the subsequent ligands bind with enhanced affinity. In negative cooperativity, the subsequent ligands bind with diminished affinity. An extreme form of negative cooperativity is called "half-of-the-sites". In that case, binding of one ligand prevents binding of the second. In general, cooperativity is a consequence of the inherent flexibility of proteins, resulting in altered binding sites. (Fig. 3-8)
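Positive and negative cooperativity can be illustrated with the empirical Hill equation. It is not named in these notes and describes the shape of a binding curve rather than a mechanism. The function name and parameter values are assumptions for illustration.

```python
def hill_saturation(ligand, k_half=1.0, n=1.0):
    """Fractional saturation from the Hill equation.

    n > 1 mimics positive cooperativity, n < 1 mimics negative
    cooperativity, and n = 1 corresponds to independent sites.
    """
    return ligand ** n / (k_half ** n + ligand ** n)


# Illustrative Hill coefficients; concentrations are in units of k_half.
for n in (0.5, 1.0, 2.0):
    curve = [round(hill_saturation(x, n=n), 3) for x in (0.25, 0.5, 1.0, 2.0, 4.0)]
    print(f"n={n}: {curve}")
```

All curves pass through half saturation at the same concentration. The curve with n > 1 rises more steeply around that point and the curve with n < 1 more gradually.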

Allostery

Allostery is the situation when the effector ligand is different from the functional ligand. In this case the binding of the effector ligand at one site influences the affinity of the functional ligand at a different site. Allostery occurs because the binding of an effector ligand changes the conformation of the protein. This can occur through a sequence of conformational states (Fig. 3-9a) or through an equilibrium between two symmetrical states (Fig. 3-9b). An effector molecule that increases affinity for a functional ligand is called an allosteric activator. One that decreases affinity for a functional ligand is called an allosteric inhibitor.

Aspartate transcarbamoylase is an example of an allosteric enzyme. It has 6 catalytic subunits arranged in 2 trimers, and 6 regulatory subunits arranged in 3 dimers. There are 2 conformational states: a low-activity T state, in which regulatory subunits interact with the catalytic sites to decrease activity, and a high-activity R state in which the subunits have moved apart (Fig. 3-10). CTP is an allosteric inhibitor. It is also a feedback inhibitor, since this enzyme catalyzes an early, but committed, step in pyrimidine synthesis. Its binding stabilizes the T state. ATP is an allosteric activator. This allows pyrimidine synthesis to catch up with purine synthesis. ATP stabilizes the R state of the enzyme. Both allosteric effectors and substrates bind cooperatively.
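An equilibrium between a T state and an R state is commonly described by the Monod-Wyman-Changeux (MWC) two-state model. The sketch below assumes that model. It represents a T-stabilizing effector such as CTP by a larger T/R ratio L, and an R-stabilizing effector such as ATP by a smaller one. The parameter values are invented for illustration and are not fitted to aspartate transcarbamoylase; n = 6 simply echoes the 6 catalytic subunits.

```python
def mwc_saturation(alpha, L=1000.0, c=0.01, n=6):
    """Fractional saturation in the MWC two-state model.

    alpha : substrate concentration divided by its dissociation
            constant for the R state
    L     : [T]/[R] equilibrium ratio in the absence of substrate
    c     : K_R / K_T (values below 1 mean substrate prefers R)
    n     : number of substrate binding sites
    """
    r_term = alpha * (1.0 + alpha) ** (n - 1)
    t_term = L * c * alpha * (1.0 + c * alpha) ** (n - 1)
    total = (1.0 + alpha) ** n + L * (1.0 + c * alpha) ** n
    return (r_term + t_term) / total


# Hypothetical L values: an activator lowers L, an inhibitor raises it.
conditions = {"activator (R favored)": 100.0,
              "no effector": 1000.0,
              "inhibitor (T favored)": 10000.0}
for label, L in conditions.items():
    print(f"{label:22s} saturation at alpha=2: {mwc_saturation(2.0, L=L):.3f}")
```

At a fixed substrate level, saturation falls as L rises, which is the behavior expected for an inhibitor that stabilizes the T state.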

This enzyme illustrates the important point that an enzyme can be inhibited by the binding of a molecule at a site distant from the active site. Such effects can often be mimicked by mutations. Therefore, mutations that eliminate function should not be assumed to occur only at active sites.

Small molecules or ions that bind to activators or repressors of gene expression are known as co-activators or co-repressors. They can be thought of as allosteric effectors also. For example, Fe2+ binding is necessary for the diphtheria toxin repressor to dimerize properly to bind to the major groove of DNA. (Fig. 3-11)

Protein switches based on nucleotide hydrolysis

Many processes in cells are turned on or off by proteins that act as molecular switches. The most common element in such proteins is that they undergo a conformational change based on the binding of nucleoside diphosphates or nucleoside triphosphates. Often the nucleotide used is guanosine triphosphate (GTP), with GDP as its hydrolysis product; these proteins have been termed G proteins. A second major class includes motor proteins that use ATP. These 2 classes have different folds, but bind the nucleotides in a similar way (Fig. 3-12). The shared elements include the "P-loop", which wraps around the phosphate region of the nucleotide, and the 2 "switch" regions, which undergo the significant changes in conformation. A schematic view of a G protein is shown in Fig. 3-13.

The GTP-bound form of the protein is in the "on" state. It remains activated until the GTP is hydrolyzed, and the phosphate leaves. This triggers the conformational change to the GDP-bound form, which is the "off" state.

Signaling by small GTPases

An example of a monomeric, small GTPase protein is ras, part of a signal transduction pathway that is commonly found to be mutated to the "on" state in cancer.

The transition from the "off" state to the "on" state is facilitated by additional proteins called guanine-nucleotide exchange factors (GEFs). They bind to the GDP-bound form of the G-protein and cause release of the GDP. GTP will rapidly bind, causing a net exchange of GTP for GDP. (Fig. 3-14)

There might be several different GEFs that activate a single G-protein.

The transition from the "on" state to the "off" state depends upon the rate of GTP hydrolysis. In G-proteins, this rate is typically slow. That causes the protein to stay in the activated state for a long time. The rate can be increased by GTPase-activating proteins (GAPs). Again, several GAPs might act on a single G-protein. Some GAPs activate the GTPase by contributing residues to the active site that help to stabilize the transition state. One example in the case of ras is the "arginine finger", which helps to stabilize the negative charge that develops in the transition state.
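The opposing roles of GEFs and GAPs can be sketched with a two-state kinetic model: an "off" to "on" step limited by nucleotide exchange and an "on" to "off" step limited by GTP hydrolysis. This is a simplification assumed for illustration, since it lumps GDP release and GTP binding into one step. The rate constants are hypothetical.

```python
def fraction_on(k_exchange, k_hydrolysis):
    """Steady-state fraction of the G-protein in the GTP-bound "on" state."""
    return k_exchange / (k_exchange + k_hydrolysis)


def mean_on_time(k_hydrolysis):
    """Average time a molecule stays GTP-bound before hydrolysis."""
    return 1.0 / k_hydrolysis


# Hypothetical rate constants (per second), chosen only to show the trends.
basal = fraction_on(k_exchange=0.001, k_hydrolysis=0.01)
with_gef = fraction_on(k_exchange=1.0, k_hydrolysis=0.01)         # GEF speeds exchange
with_gef_and_gap = fraction_on(k_exchange=1.0, k_hydrolysis=10.0)  # GAP speeds hydrolysis

print(f"basal:          {basal:.3f}")
print(f"with GEF:       {with_gef:.3f}")
print(f"with GEF + GAP: {with_gef_and_gap:.3f}")
print(f"mean on-time without GAP: {mean_on_time(0.01):.0f} s")
```

In this picture a mutation that blocks hydrolysis corresponds to a very small k_hydrolysis, which keeps the switch "on" almost all of the time.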

Signaling by heterotrimeric GTPases

The other common class of G-proteins is heterotrimeric: an α subunit that resembles the ras protein, and 2 tightly interacting proteins, β and γ. They function with G-protein coupled receptors (GPCRs) at the cytoplasmic surface of the plasma membrane of eukaryotes. GPCRs are membrane proteins with 7 transmembrane spans, such as rhodopsin. (Fig. 3-15)

Reaction cycle of heterotrimeric G-proteins:

The heterotrimer is bound to the surface of the plasma membrane in the GDP-bound or "off" state. This involves covalent lipid modification of the subunits (acylation of the α subunit and prenylation of the γ subunit). When activated, for example by absorbing a photon in the case of rhodopsin, the GPCR serves as a GEF to cause release of the GDP and binding of GTP to the α subunit. This step requires that the β and γ subunits be present.

After nucleotide exchange, the heterotrimer dissociates from the GPCR, and β and γ dissociate from α. They go on to take part in other signaling pathways. After GTP hydrolysis, the α subunit can rebind β and γ and then rebind to the GPCR.

The rate of GTP hydrolysis by heterotrimeric GTPases is stimulated by proteins called regulators of G-protein signaling (RGS proteins). Rather than contributing an arginine to the α subunit (which already has one), these proteins generally bind to the switch regions.

GTPases in protein synthesis

Elongation factors that deliver charged-tRNAs to the ribosome for protein synthesis are GTPases. In bacteria they are called EF-Tu and in eukaryotes EF-1. They contain 3 domains, one a GTPase, and 2 that bind the tRNAs. The ribosome functions as a GAP. If the tRNA has delivered the correct amino acid, then the fit between the tRNA and the mRNA will be proper, and this will lead to GTP hydrolysis by the EF-Tu, stimulated by the ribosome. The GDP-bound EF-Tu then needs a guanine nucleotide exchange factor, which is called EF-Ts.

If the anti-codons do not match the codons, then the ribosome will not stimulate GTP hydrolysis, and the EF-Tu-aa-tRNA complex will quickly dissociate as a unit. (Fig. 3-16)

Motor protein switches

Motor proteins are used for intracellular transport of macromolecules, organelles, vesicles, and chromosomes. They are also used for chemotaxis and muscle movement. These proteins use ATP hydrolysis to generate conformational states. In general, there are motor proteins, such as myosin, kinesin, and dynein, which use ATP hydrolysis to move along filaments. They have multiple conformational states involving ATP binding, ADP binding and covalently linked phosphate. (Fig. 3-17) They have switch regions, much like the G-proteins. (Fig. 3-18)

Recent progress on kinesin is discussed in the review below. The likely mode of kinesin movement is depicted in the movie below.

Yildiz A, Selvin PR.
Kinesin: walking, crawling or sliding along?
Trends Cell Biol. 2005 Feb;15(2):112-20.

Vale kinesin movie

Prediction of Protein Structure

Overview

Key References over the past few years:

Ginalski K, Grishin NV, Godzik A, Rychlewski L.
Practical lessons from protein structure prediction.
Nucleic Acids Res. 2005 Apr 1;33(6):1874-1891

Schonbrun J, Wedemeyer WJ, Baker D.
Protein structure prediction in 2002.
Curr Opin Struct Biol. 2002 Jun;12(3):348-54.

Protein structure prediction in the postgenomic era
David T Jones
Current Opinion in Structural Biology 2000, 10:371-379.

There continues to be a large gap between the number of proteins of known amino acid sequence and the number of proteins of known 3-D structure. This gap may never be eliminated.

But protein structure is essential for understanding the function of the protein:

  1. mechanism of protein folding
  2. mechanism of enzyme catalysis
  3. analysis of stability
  4. interactions with other molecules
    1. proteins
    2. ligands
    3. substrates
    4. inhibitors

Can the prediction of protein structure from sequence be improved enough to eliminate the need to crystallize each protein?

Probably not in the near future, but predictions can generate useful and generally reliable information.

There are 3 levels of analysis in the overall prediction scheme:

  1. Motif recognition in the primary sequence
  2. Secondary structure prediction
  3. Tertiary structure/fold prediction

A starting point is often to search for proteins with sequences that are similar to the protein under study. This usually involves a BLAST search:

BLAST (Basic Local Alignment Search Tool)

Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ:
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res 1997, 25: 3389–3402.

Altschul SF, Koonin EV.
Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases.
Trends Biochem Sci. 1998 Nov;23(11):444-7. Review.

From this analysis one might learn about the function of similar proteins, and whether a 3-D structure exists for any of them. Depending on the degree of amino acid identity of the closest "neighbors", one might surmise the function of the protein.
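The "degree of amino acid identity" between two proteins can be computed from a pairwise alignment. The sketch below counts identical residues over aligned columns that contain no gap. Conventions for handling gaps differ between programs, so this is one assumed definition, and the two aligned sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (one gap, one substitution).
query = "MKT-AYIAK"
neighbor = "MKTLAYLAK"
print(f"{percent_identity(query, neighbor):.1f}% identity")
```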

Example BLAST input page

Protein Motif Recognition

Prosite is a database of sequence motifs for functions such as:

post-translational modifications of N- or C-termini, signal sequences for localization, sites of lipid attachment, sites of phosphorylation, or markers of particular types of enzymes

Example: phosphorylase kinase   KRKQISVR   Reference: Kemp & Pearson (1990) TiBS 15:342-346
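Scanning a sequence for such motifs amounts to pattern matching. The sketch below translates a subset of the PROSITE pattern syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, and (n) or (n,m) repeats) into a Python regular expression. The helper names and the test sequence are hypothetical. N-{P}-[ST]-{P} is the widely used N-glycosylation pattern, and the second pattern is the phosphorylase kinase example above written residue by residue.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        repeat = ""
        if element.endswith(")"):
            # Split "x(2,4)" into the element "x" and the count "2,4".
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # excluded residues
        else:
            core = element                      # single residue or [..] set
        parts.append(core + repeat)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


sequence = "MANGSAKRKQISVRGLNPTQ"  # made-up test sequence
print(scan(sequence, "N-{P}-[ST]-{P}"))
print(scan(sequence, "K-R-K-Q-I-S-V-R"))
```

Short patterns match by chance fairly often, so a hit of this kind is a candidate site rather than evidence of modification.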

PROSITE Website

To scan a protein sequence against the PROSITE database go here

There are databases of sequence motifs. These can generate aligned sequences from their websites.

PRINTS  a Protein Fingerprint database

BLOCKS

ProDom

Prediction of Secondary Structure

In some cases this is preliminary to prediction of tertiary structure.

a) Statistical methods, first developed by Peter Chou and Gerald Fasman

These are based on the tendencies of particular amino acids to be found in the different types of secondary structure. They are considered to be about 60% accurate.

Server for Chou-Fasman Prediction

Original references are too early for on-line abstracts etc.

Chou PY, Fasman GD.
Empirical predictions of protein conformation.
Annu Rev Biochem. 1978;47:251-76. Review.

Chou PY, Fasman GD.
Prediction of protein conformation.
Biochemistry. 1974 Jan 15;13(2):222-45.

Chou PY, Fasman GD.
Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins.
Biochemistry. 1974 Jan 15;13(2):211-22.

Garnier (GOR1) Predictor of secondary structure

Garnier J, Osguthorpe DJ, Robson B.
Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins.
J Mol Biol. 1978 Mar 25;120(1):97-120.

Garnier J, Gibrat JF, Robson B.
GOR method for predicting protein secondary structure from amino acid sequence.
Methods Enzymol. 1996;266:540-53.

Combination approach:

Kloczkowski A, Ting KL, Jernigan RL, Garnier J.
Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence.
Proteins. 2002 Nov 1;49(2):154-66.

b) Neural net predictions of Secondary Structure:

They use a training set of known proteins, and usually utilize multiple sequence alignments during the prediction. They are considered to be 70-80% accurate.
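Accuracy figures such as 60% or 70-80% are usually per-residue scores: the percentage of residues assigned to the correct state out of three (helix, strand, other). The notes do not say which measure is meant, so this three-state definition is an assumption. The two strings below are invented for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state.

    States are single letters, here H (helix), E (strand) and C (other).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


observed = "CCHHHHHHCCEEEECC"   # hypothetical assignment from a known structure
predicted = "CHHHHHHCCCEEECCC"  # hypothetical prediction
print(f"{q3_accuracy(predicted, observed):.2f}% of residues correct")
```

Most of the errors in this toy case fall at the ends of the segments, which is typical: the core of a helix or strand is easier to call than its boundaries.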

PredictProtein (Now at Columbia U.)

Rost B, Schneider R, Sander C.
Progress in protein structure prediction?
Trends Biochem Sci. 1993 Apr;18(4):120-3.

PsiPred David T. Jones

Jones DT:
Protein secondary structure prediction based on position-specific scoring matrices.
J Mol Biol 1999, 292: 195–202.

Membrane-spanning segments of integral membrane proteins can be predicted in similar ways. We will discuss this later.

Prediction of Secondary Structure

A. Statistical methods for the prediction of Secondary Structure

Chou-Fasman Method

The tendencies of each type of amino acid to be found in each of the 3 types of secondary structure (helix, sheet, loop) are calculated from a database of high-resolution structures (2 Å):

Original database, 1974 contained 15 proteins (2473 amino acids)

Revised, 1978, containing 29 proteins (4741 amino acids)

Simply increasing the database did not increase the accuracy of the predictions.

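$P_{\mathrm{Ala},\alpha} = \dfrac{n_{\mathrm{Ala},\alpha} / n_{\mathrm{Ala}}}{N_{\alpha} / N}$ is the propensity for Ala to be found in an α-helix

      In general, $P_{ij} = \dfrac{n_{ij} / n_{i}}{N_{j} / N}$, where the i's correspond to the amino acid type (20) and j's are the secondary structures (3)

e.g.:

  $n_{\mathrm{Ala},\alpha}$ = the number of Ala residues found in α-helix (in the database)

  $n_{\mathrm{Ala}}$ = the number of Ala residues (in the database)

  $N_{\alpha}$ = the number of amino acids found in α-helix (in the database)

  $N$ = the number of amino acid residues in the database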

So, the top of the propensity expression represents the fraction of all Ala residues that are α-helical, while the bottom represents the fraction of ALL residues that are α-helical. Therefore, this ratio represents the tendency of each amino acid relative to the average amino acid.

A propensity of >1 indicates more likely than chance, while <1 indicates less likely than chance. A value of 1.0 indicates no preference, and so contributes nothing to a prediction.
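The propensity is just a ratio of two fractions, so it can be computed directly from four counts. In the sketch below the argument names are assumptions, and the counts are invented round numbers rather than values from the 1974 or 1978 databases.

```python
def propensity(n_aa_in_state, n_aa, n_state, n_total):
    """Propensity of one amino acid type for one secondary structure.

    Numerator:   fraction of that amino acid found in the structure.
    Denominator: fraction of all residues found in the structure.
    """
    return (n_aa_in_state / n_aa) / (n_state / n_total)


# Invented counts: 1000 residues in total, 350 of them helical,
# 80 Ala residues, 40 of which are helical.
p_ala_helix = propensity(n_aa_in_state=40, n_aa=80, n_state=350, n_total=1000)
print(f"P(Ala, helix) = {p_ala_helix:.2f}")
```

With these counts half of the Ala residues are helical, against 35% of all residues, so the propensity comes out above 1.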

Since secondary structure is usually formed by several consecutive residues, it is more meaningful to take running averages of 5 or more amino acids at a time. This is called the window. A prediction will tend to be most accurate when the window matches the size of the actual segment of secondary structure.

The entire length of the protein is analyzed for each set of secondary structure propensities (helix, sheet, turn). The final prediction is made by comparing the 3 sets of values.
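The windowing and comparison steps can be sketched as follows. This is a simplified reading of the method: it only averages and compares, and leaves out the nucleation and extension rules of the published procedure. The propensity tables cover a few residue types and contain rounded placeholder values, not the published parameters.

```python
def window_averages(values, window=5):
    """Centered running average; near the ends only available residues are used."""
    half = window // 2
    averages = []
    for k in range(len(values)):
        segment = values[max(0, k - half): k + half + 1]
        averages.append(sum(segment) / len(segment))
    return averages


def predict_states(sequence, tables, window=5):
    """Assign each residue the state with the highest windowed propensity."""
    smoothed = {
        state: window_averages([table[aa] for aa in sequence], window)
        for state, table in tables.items()
    }
    return "".join(
        max(smoothed, key=lambda state: smoothed[state][k])
        for k in range(len(sequence))
    )


# Placeholder propensities for illustration only (H helix, E sheet, T turn).
TABLES = {
    "H": {"A": 1.4, "E": 1.5, "V": 1.0, "I": 1.0, "G": 0.6, "P": 0.6, "N": 0.7},
    "E": {"A": 0.8, "E": 0.4, "V": 1.7, "I": 1.6, "G": 0.8, "P": 0.6, "N": 0.9},
    "T": {"A": 0.7, "E": 0.7, "V": 0.5, "I": 0.5, "G": 1.6, "P": 1.5, "N": 1.6},
}

sequence = "AEAAEAAGNPGVIVIVI"  # made-up sequence
print(sequence)
print(predict_states(sequence, TABLES))
```

Changing the window argument shows the trade-off described above: a short window follows local fluctuations, while a long one blurs the boundaries between segments.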

These calculations can be done in a spreadsheet using a lookup function (called "LOOKUP" in Excel). The same approach can be used for predicting transmembrane spans using hydrophobicity values.
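The same lookup-and-average calculation applied to hydrophobicity might look like the sketch below, in which a dictionary plays the role of the spreadsheet lookup table. The notes do not name a scale; the Kyte-Doolittle hydropathy values, the 19-residue window and the 1.6 cutoff are assumptions taken from common practice. The test sequence is made up.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the notes do not name one).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[k:k + window]) / window
        for k in range(len(scores) - window + 1)
    ]


def candidate_spans(sequence, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [k + 1 for k, value in enumerate(profile) if value > threshold]


# Made-up sequence: a hydrophobic stretch flanked by lysines.
sequence = "KKKKK" + "LIVLIVLIVLIVLIVLIVL" + "KKKKK"
print(candidate_spans(sequence))
```

Several neighboring windows usually pass the cutoff together, so in practice runs of adjacent start positions are merged into a single predicted span.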

Special applications:

1) Regions that score highly for both α-helix and β-sheet are candidates for conformational changes.

2) The effects of mutations can be predicted by changing the sequence.

3) Turns can be predicted (with fair accuracy) on a per-residue basis, because they are not extended structures.

Limitations: Accuracy of this method seems to be limited because of the limited range of propensities. Most types of amino acids can be found often in any secondary structure. Few are really excluded from any of the secondary structures.

Extension of this approach: position-specific propensities, e.g. for the first position in a helix or turn. This works well with turns or with the termini of helices/sheets.

Protein Sci 1994 Dec;3(12):2207-16
A revised set of potentials for β-turn formation in proteins.
Hutchinson EG, Thornton JM

B. Neural net predictions

Turns can be predicted this way:

Protein Sci 1999 May;8(5):1045-55
Prediction of the location and type of β-turns in proteins using neural networks.
Shepherd AJ, Gorse D, Thornton JM

In general these predictions are obtained from servers through the world wide web.

PredictProtein from EMBL or Columbia University

Rost B, Schneider R, Sander C.
Progress in protein structure prediction?
Trends Biochem Sci. 1993 Apr;18(4):120-3.


Comments/questions: svik@mail.smu.edu

Copyright 2005, Steven B. Vik, Southern Methodist University

Last modified 4/11/05