18 April 2005

Biol 6312

Mechanisms of Regulation and Control

In metabolism there are two, sometimes distinct, situations that are often described by the terms regulation and control. Regulation is generally considered to refer to the process of responding to various environmental changes in order to maintain a constant level of a metabolite (homeostasis). In contrast, control is generally considered to refer to the process in which the level of a metabolite is increased or decreased in response to some signal.

The use of the terms regulation and control with respect to proteins may not be consistent with the usage described above.

Regulation of proteins by degradation.

Proteins are degraded in cells by a variety of proteases, but primarily through the proteasome. This is a large structure that superficially resembles the HSP60 chaperonin. (Fig. 3-19)

In general, the level of any protein can be maintained by a balance between its synthesis and its degradation. The rate of degradation might be related to the intrinsic stability of the protein.
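
The balance between synthesis and degradation can be sketched numerically. The following is a minimal illustration, assuming zero-order synthesis (rate k_syn) and first-order degradation (rate constant k_deg), so that dP/dt = k_syn - k_deg * P. The rate law, the function names and the numbers are assumptions made for this sketch, not values from the lecture.

```python
def steady_state_level(k_syn, k_deg):
    """Steady-state level where synthesis equals degradation: P = k_syn / k_deg."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return k_syn / k_deg


def simulate_level(k_syn, k_deg, p0=0.0, dt=0.01, t_end=50.0):
    """Integrate dP/dt = k_syn - k_deg * P with simple Euler steps."""
    p = p0
    for _ in range(round(t_end / dt)):
        p += (k_syn - k_deg * p) * dt
    return p


if __name__ == "__main__":
    # Hypothetical rates: 10 molecules per minute made, 0.5 per minute lost per molecule
    print(steady_state_level(10.0, 0.5))
    print(simulate_level(10.0, 0.5))
```

Under these assumptions, a less stable protein (larger k_deg) settles at a lower level for the same rate of synthesis.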

In addition, proteins might be targeted for degradation in response to their own damage or to some outside signal. The primary signal that sends a protein to the proteasome for degradation is covalent tagging with many molecules of ubiquitin. Ubiquitin is covalently linked to a lysine residue near the N-terminus of a protein. This is catalyzed by a ubiquitin ligase, a part of a larger pathway. Proteolysis by the proteasome is driven by ATP hydrolysis. (Fig. 3-20) Ubiquitin is not degraded, but is recycled.

Phosphorylation of specific residues can also be a signal to send a protein for degradation. For example, NFκB, a transcription factor, is retained in the cytoplasm by binding to another protein, IκB. The phosphorylation of 2 serine residues in IκB signals it to be ubiquitinated, and eventually degraded. This releases NFκB, allowing it to travel to the nucleus and be active.

Hypoxia-inducible factor (HIF) is another transcription factor. Under normal levels of oxygen, its prolines are hydroxylated, leading to ubiquitination and degradation. When oxygen levels fall sufficiently, it is no longer hydroxylated, and so the factor survives to function.

Control of protein function by post-translational modification

More than half of all human proteins are thought to be post-translationally modified. More than 40 different types of modifications have been observed. Common ones are phosphorylation, glycosylation, lipidation, and limited proteolysis; less common ones include methylation, N-acetylation, nitrosylation, and attachment of SUMO.

Common effects of covalent modification include changing the location of a protein, its activity, or its interactions with other proteins. Limited proteolysis can trigger a cascade of activity by activating enzymes, such as in the blood clotting system.

Phosphorylation is the most common post-translational modification. Unlike limited proteolysis, it is reversible, making it suitable for rapidly turning enzyme activity on and off.

Phosphorylation is a switch mechanism

Phosphorylation of proteins occurs in all types of living organisms. Target proteins are phosphorylated by protein kinases and are dephosphorylated by protein phosphatases. This allows for independent regulation of the reaction in both directions. Nucleoside triphosphates are the source of the phosphoryl group, usually ATP.
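
Because the kinase and the phosphatase are separate enzymes, the phosphorylated fraction of a target depends on the ratio of the two activities. The sketch below assumes a simple two-state model with first-order rate constants for each direction. The model, the function name and the numbers are assumptions for illustration only.

```python
def phospho_fraction(k_kinase, k_phosphatase):
    """Steady-state fraction of target that is phosphorylated.

    Two-state model: unmodified -> phosphorylated at rate k_kinase,
    phosphorylated -> unmodified at rate k_phosphatase.
    """
    if k_kinase < 0 or k_phosphatase < 0:
        raise ValueError("rates must be non-negative")
    total = k_kinase + k_phosphatase
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return k_kinase / total


if __name__ == "__main__":
    # Hypothetical rates: raising the kinase or lowering the phosphatase
    # both shift the switch toward the phosphorylated state.
    print(phospho_fraction(9.0, 1.0))
    print(phospho_fraction(1.0, 9.0))
```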

In the human genome, serine, threonine and tyrosine kinases constitute about 2% of the genome (575 instances), making this the third most common domain. Bacteria tend to have histidine and aspartate kinases, and in E. coli they make up about 1.5% of the genome.

The covalent attachment of a phosphate group can have large effects on the conformation of the protein. It introduces negative charge and H-bonding capacity.

Phosphorylation can affect the activity of the target protein. For example, glycogen phosphorylase undergoes a conformational change after phosphorylation of a single serine by phosphorylase kinase. It then becomes more active in the production of glucose-1-phosphate from glycogen. (Fig. 3-22)

Isocitrate dehydrogenase of E. coli is inactivated by phosphorylation of a serine residue at the active site. This does not involve a significant conformational change. (Fig. 3-23)

Phosphorylation can also create a new binding surface for another protein. For example, SH2 domains bind to phosphorylated tyrosines. (Also see Fig. 3-21)

Protein kinases are generally controlled by phosphorylation. Many kinases resemble the structure shown in Fig. 3-24. There are two domains, or lobes, with a hinge-like connection between them. A catalytic cleft exists between the 2 domains. A flexible activation loop controls the activity state of the kinase. (Fig. 3-25)

Src-family kinases

Src kinases are activated by phosphorylation of a tyrosine residue near the activation loop. The phosphotyrosine stabilizes the active conformation. (Fig. 3-26) Src kinases can autophosphorylate, leading to rapid conversion to the active state.

Src kinases contain 2 domains that help to keep the kinase inactive. The SH2 domain binds to a phosphotyrosine in the kinase, and the SH3 domain binds to a polyproline helix segment.

Cyclin-dependent kinases

These are enzymes that control the timing of the cell cycle. Cyclin-dependent kinase 2 (Cdk2) is activated by the binding of cyclin A and the phosphorylation of Threonine 160. Neither alone is sufficient. (Fig. 3-27) The isolated Cdk2 is inhibited by the location of the red helix, PSTAIRE, in which the catalytic Glutamate is not in position for hydrolysis of ATP. The binding of cyclin A helps form an active catalytic site by moving the red helix, and causing the short green helix to change into a β-strand. The phosphorylation of Thr 160 helps the kinase to better interact with its substrates.

Signaling systems in bacteria

Bacteria use a two-component signaling system that is somewhat different from the kinase cascade system in eukaryotes. The first component is a membrane-bound, ATP-dependent histidine kinase receptor protein (HK). The second component is a cytoplasmic response regulator protein (RR). Signals from outside the cell can activate the kinase domain of the receptor. First, a histidine residue in the HK domain is autophosphorylated. Then the phosphoryl group is transferred to an aspartate residue in an RR protein. (Fig. 3-29)

RR regulatory domains have 3 distinct functions:

  1. They catalyze their own phosphorylation
  2. They catalyze their own dephosphorylation by a phosphatase activity
  3. They regulate the activity of their effector domains

The effects of phosphorylation of the RR protein tend to be spread over a surface. Numerous residues undergo small conformational changes (magenta-phosphorylated) (Fig. 3-30)

Although many stimuli lead to changes in rates of transcription, a well-characterized two-component system signals bacterial chemotaxis in response to nutrients.

Control by proteolysis

Limited proteolysis is a mechanism used to activate some proteins. It is distinct from proteolysis for degradation.

After proteolysis, the fragments can remain together if they are linked by disulfide bonds. This is generally the case during the activation of proteolytic enzymes such as chymotrypsin. The key proteolytic step is the hydrolysis of the bond between Arg15 and Ile16. When this is cleaved, the enzyme can rearrange its conformation to make an active enzyme. (Fig. 3-31) The same kind of activation is illustrated for plasmin (blue, active) and plasminogen (red, inactive). (Fig. 3-32)

The processing of hormones leads to a variety of different products with distinct functions. The precursor to pituitary hormones is called Prepro-opiomelanocortin. Some steps are tissue-dependent. (Fig. 3-33)

The blood coagulation cascade is an example of a proteolytic cascade, in which a single activated protein can rapidly lead to millions of activated molecules downstream, such as would be needed to clot the flow of blood from a wound. (Fig. 3-34)
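
The amplification in such a cascade is multiplicative, which a few lines of arithmetic make concrete. The sketch below assumes an idealized cascade in which every activated molecule activates the same number of molecules at the next stage. The stage count and turnover are hypothetical numbers, not measured values for the coagulation system.

```python
def cascade_output(stages, activations_per_molecule, initial=1):
    """Activated molecules at the final stage of an idealized cascade."""
    if stages < 0 or activations_per_molecule < 0 or initial < 0:
        raise ValueError("arguments must be non-negative")
    count = initial
    for _ in range(stages):
        # Each activated molecule at this stage activates several at the next
        count *= activations_per_molecule
    return count


if __name__ == "__main__":
    # Hypothetical: 3 stages, each activated protease cleaves 100 zymogens
    print(cascade_output(3, 100))
```

With these assumed numbers, a single activated molecule at the top gives 100 x 100 x 100 activated molecules after three stages.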

Protein splicing

There are about 100 examples of protein splicing known today. Typically, a single polypeptide chain is processed into two new chains, similar to the way in which introns are removed from mRNA. Each of the new chains can be a functional protein. (Fig. 3-35) Many inteins are nucleases. (Fig. 3-36) The mechanism of protein splicing requires no other factors. There are 3 key residues or regions, designated A at the N-terminal end of the intein, G at the C-terminal end of the intein, and B internal to the intein. Residue A usually contains an oxygen (Ser, Thr) or sulfur (Cys) for nucleophilic attack. B is usually TXXH, and is important for intein structure and function. G is usually Asn, which is important in the mechanism, shown in Fig. 3-38.
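
The three positions described above can be checked on a candidate intein sequence with a short script. This is only a sketch: it assumes the intein boundaries are already known, uses one-letter amino acid codes, and looks for TXXH anywhere in the interior. The function name and the test sequences are hypothetical and are not real inteins.

```python
import re


def check_intein_motifs(intein):
    """Report whether a candidate intein shows the A, B and G features."""
    seq = intein.upper()
    interior = seq[1:-1]
    return {
        # A: first residue carries an oxygen (Ser, Thr) or sulfur (Cys) nucleophile
        "A_nucleophile": seq[:1] in ("S", "T", "C"),
        # B: internal TXXH motif, where X is any residue
        "B_TXXH": re.search(r"T..H", interior) is not None,
        # G: last residue of the intein is Asn
        "G_Asn": seq[-1:] == "N",
    }


if __name__ == "__main__":
    # Made-up sequence for illustration only
    print(check_intein_motifs("CAAATGGHAAAN"))
```

A real search would also have to locate the intein within the precursor, which this sketch does not attempt.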

In a related process, the eukaryotic Hedgehog protein autocatalyzes the cleavage of its C-terminal domain, and links a cholesterol to the new N-terminus. This binds the protein to a membrane.

Prediction of Tertiary Structure

In general there are three approaches

a) Try to model an amino acid sequence by homology (homology modeling)

b) Try to find compatibility to known structures (threading)

c) Try to fold an amino acid sequence based on physical principles (ab initio or de novo)

Identification of topological fold is often the goal.

Sometimes these approaches are combined.

A) Modeling Approach

Look for sequences that have >30% sequence identity with a protein of known structure

(Sequences of 15-30% identity can be attempted)
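
The identity thresholds above can be applied to a pairwise alignment as follows. This is a minimal sketch: it assumes the two sequences are already aligned, and it counts identity over columns where neither sequence has a gap, which is one of several conventions in use. The function names and the wording of the returned labels are assumptions for this example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def modeling_outlook(identity):
    """Map percent identity onto the thresholds given in the notes."""
    if identity > 30:
        return "homology modeling"
    if identity >= 15:
        return "can be attempted"
    return "consider threading or ab initio"


if __name__ == "__main__":
    # Made-up aligned fragments for illustration
    pid = percent_identity("ACDEFG-HIK", "ACDQFGWHLK")
    print(round(pid, 1), modeling_outlook(pid))
```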

Basic principles:

1) Buried amino acid residues are hydrophobic

2) Surface amino acid residues are polar

3) Within a family of homologous proteins, buried and active site residues are conserved.

4) Within a family of homologous proteins, surface residues are variable.

5) Elements of secondary structure will be more highly conserved than amino acid sequence.

3 steps in the procedure

1) Sequence alignment

2) Build sequence into secondary structure

3) Energy minimize to improve tertiary structure

Swiss-Model: The ExPASy server will build a model of a protein that is at least 30% identical to a protein in the Protein Data Bank. Click the "First Approach Mode" at the left.

B) Threading

If no homologous protein can be identified by sequence comparisons, the compatibility of the amino acid sequence of the target can be evaluated against representations of all known folds (templates).

This is called threading. (Fig. 4-25)

Example:

J Mol Biol 1997 Apr 11;267(4):1026-38 
A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.
Rice DW, Eisenberg D

They have derived a scoring matrix from a database of 119 pairs of proteins of known structure with the same fold, but with <30% sequence identity.

882 elements = 7 x 3 x 2 x 7 x 3

7 classes of amino acids (Cys; Trp; Arg,Lys; Tyr,Phe; Ile,Leu,Val,Met; Ala,Gly,Ser,Thr,Pro; Asp,Glu,Asn,Gln,His)

3 types of secondary structure (helix, sheet, turn)

2 locations (buried, exposed)

and in the target sequence 7 classes of a.a. and 3 types of secondary structure (from PredictProtein)
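
The size of the matrix follows directly from the categories listed above, which can be confirmed by enumerating them. The sketch below uses one-letter amino acid codes for the 7 classes. The variable and function names, and the ordering of the five indices, are assumptions for this example rather than the layout used in the paper.

```python
from itertools import product

# The 7 amino acid classes from the notes, in one-letter code
AA_CLASSES = ["C", "W", "RK", "YF", "ILVM", "AGSTP", "DENQH"]
SEC_STRUCT = ["helix", "sheet", "turn"]
BURIAL = ["buried", "exposed"]


def residue_class(aa):
    """Return the class string that contains a one-letter amino acid code."""
    if len(aa) != 1:
        raise ValueError("expected a single one-letter code")
    for cls in AA_CLASSES:
        if aa.upper() in cls:
            return cls
    raise ValueError(f"unknown amino acid: {aa}")


def matrix_keys():
    """All (template class, template structure, template burial,
    target class, target predicted structure) combinations."""
    return list(product(AA_CLASSES, SEC_STRUCT, BURIAL, AA_CLASSES, SEC_STRUCT))


if __name__ == "__main__":
    print(len(matrix_keys()))   # 7 x 3 x 2 x 7 x 3
    print(residue_class("M"))
```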

First, obtain a secondary structure prediction from PredictProtein.

Second, calculate the score for each of the 119 folds:

Example:

The highest score is for a Trp, predicted to be in a helix, that matches a buried Trp in a helix (score = 4.5).

A basic residue predicted to be in a sheet that matches an exposed basic residue in a sheet (score = 2.3).

The same basic residue matched to an exposed basic residue in a helix would score -9 (the lowest score).
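
The way such scores combine into a score for a fold can be sketched as a sum over aligned positions. The example below is a simplification: it assumes an ungapped, position-by-position alignment, it contains only the three example scores quoted above, and it treats every other matrix entry as 0. The published method includes gaps and the full 882-element matrix. All names here are hypothetical.

```python
# Keys: (template class, template structure, template burial,
#        target class, target predicted structure)
EXAMPLE_SCORES = {
    ("W", "helix", "buried", "W", "helix"): 4.5,
    ("RK", "sheet", "exposed", "RK", "sheet"): 2.3,
    ("RK", "helix", "exposed", "RK", "sheet"): -9.0,
}


def threading_score(template, target, scores, default=0.0):
    """Sum matrix entries over an ungapped alignment.

    template: list of (class, structure, burial) per position
    target:   list of (class, predicted structure) per position
    """
    if len(template) != len(target):
        raise ValueError("template and target must have equal length")
    total = 0.0
    for (t_cls, t_ss, t_bur), (q_cls, q_ss) in zip(template, target):
        # Entries missing from this partial table are assumed to score `default`
        total += scores.get((t_cls, t_ss, t_bur, q_cls, q_ss), default)
    return total


if __name__ == "__main__":
    target = [("W", "helix"), ("RK", "sheet")]
    good_fold = [("W", "helix", "buried"), ("RK", "sheet", "exposed")]
    poor_fold = [("W", "helix", "buried"), ("RK", "helix", "exposed")]
    print(threading_score(good_fold, target, EXAMPLE_SCORES))
    print(threading_score(poor_fold, target, EXAMPLE_SCORES))
```

The fold with the highest total is taken as the most compatible template for the target sequence.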

Threading by PSIPRED using GenTHREADER

Jones DT:
GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences.
J Mol Biol 1999, 287: 797–815.

Prospector server by Jeffrey Skolnick

C) Ab initio approach (physical principles)

This approach can work even in the absence of homology to known structures, but overall the reliability is low.

LINUS is one example: Local Independently Nucleated Units of Structure

50 amino acids are folded at a time, in an overlapping fashion: 1-50, 26-75, ...

It is based on the idea that actual proteins fold by forming local secondary structure first.
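
The overlapping windows described above (1-50, 26-75, ...) can be generated as follows. This sketch assumes a step of 25 residues, as implied by the two windows given, and it truncates the last window at the end of the chain. The handling of the final window is an assumption of this example, not a detail taken from LINUS.

```python
def overlapping_windows(length, size=50, step=25):
    """Return 1-based inclusive (start, end) windows covering a chain."""
    if length <= 0 or size <= 0 or step <= 0:
        raise ValueError("length, size and step must be positive")
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, length)
        windows.append((start, end))
        if end == length:
            break
        start += step
    return windows


if __name__ == "__main__":
    # Hypothetical 120-residue protein
    for start, end in overlapping_windows(120):
        print(f"{start}-{end}")
```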

Side chains are simplified. Only 3 interactions are used:

1 repulsive: steric

2 attractive: H-bonds and hydrophobic

The possibilities are then calculated in a search for the lowest free energy.

Proteins 1995 Jun;22(2):81-99
LINUS: a hierarchic procedure to predict the fold of a protein.
Srinivasan R, Rose GD

Proc. Natl. Acad. Sci. USA Vol. 96, Issue 25, 14258-14263, December 7, 1999
A physical basis for protein secondary structure
Rajgopal Srinivasan and George D. Rose

Srinivasan R, Rose GD.
Ab initio prediction of protein structure using LINUS.
Proteins. 2002 Jun 1;47(4):489-95.

Example prediction by LINUS

Touchstone II: no server is available
Zhang Y, Kolinski A, Skolnick J.
TOUCHSTONE II: a new approach to ab initio protein structure prediction.
Biophys J. 2003 Aug;85(2):1145-64.

De novo prediction: Rosetta by David Baker

Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmstrom L, Robertson T, Baker D.
De novo prediction of three-dimensional structures for major protein families.
J Mol Biol. 2002 Sep 6;322(1):65-78.

Fully automated 3-d structure prediction: Examples Fig. 4-26, Fig. 4-27

Bystroff C, Shao Y.
Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA.
Bioinformatics. 2002 Jul;18 Suppl 1:S54-61

Server prediction:

Robetta

Kim DE, Chivian D, Baker D.
Protein structure prediction and analysis using the Robetta server.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W526-31.

Robetta = Robot Rosetta

These sites offer www access to predictions of tertiary structure:

3D-PSSM Imperial Cancer Research Fund, London

Fugue University of Cambridge

SAM-T02 UC Santa Cruz

CAFASP2: Critical Assessment of Fully Automated Structure Prediction (CAFASP3) (CAFASP4)

Meta Prediction Server

CASP4 analysis:
Samudrala R, Levitt M.
A comprehensive analysis of 40 blind protein structure predictions.
BMC Struct Biol. 2002 Aug 1;2(1):3.

Fischer D, Barret C, Bryson K, Elofsson A, Godzik A, Jones D, Karplus KJ, Kelley LA, MacCallum RM, Pawlowski K et al.: CAFASP-1: critical assessment of fully automated structure prediction methods.
Proteins 1999, S3: 209–217


Comments/questions: svik@mail.smu.edu

Copyright 2005, Steven B. Vik, Southern Methodist University

Last modified 4/18/05