www.zampbioworld.org
Chemical groups in proteins, Purification of proteins, Amino acid sequencing

Chemical Groups in Proteins

  • Simple proteins: have only amino acids in them
  • Conjugated proteins: have other types of molecules as an integral part of their structure - often covalently bonded through linkage(s) either with the main chain or side chain atoms
    • Prosthetic group: the non-protein group that is found in a conjugated protein

 

The following is a list of commonly observed prosthetic groups in conjugated proteins

  • Glycoproteins. These have carbohydrate, typically covalently bonded to Ser, Thr or Asn side chains. The carbohydrate often confers added solubility, helps regulate the in vivo half-life, and is a commonly observed modification in extracellular and membrane-bound proteins. It can also function in cell-cell communication and identity.
  • Lipoproteins. These have lipid prosthetic groups. Lipoproteins may serve a transport function for lipids, and the lipids are therefore typically bound via non-covalent interactions. Lipid groups can also serve to anchor soluble proteins to a lipid surface or membrane.
  • Nucleoproteins. Complexes (often non-covalent) of nucleic acids and proteins. The protein is often involved in the packaging or regulation of the nucleic acid (i.e. genetic material).
  • Phosphoproteins. These are proteins with a phosphate covalently bound to the side chain hydroxyl groups of either serine, threonine or tyrosine. Phosphorylation often regulates the functional activity of the protein to which it is bound (i.e. turning it on or off depending on the state of phosphorylation). This regulation often is part of a signal transduction pathway from outside to inside the cell.
  • Metalloproteins. Contain a metal ion bound via non-covalent electrostatic interactions. The metal ion is often required for function or stability of the metalloprotein (e.g. often the metalloprotein is a metal-requiring enzyme), or the protein is a transport molecule for the metal.
    • Hemoproteins. An important subclass of metalloproteins that contain a heme (iron-containing) prosthetic group. Many important redox and energy pathways involve hemoproteins.
  • Flavoproteins. Flavin is an important prosthetic group for a variety of redox reactions. As a functional group it has the ability to reversibly accept and release electrons.

 

Purification of Protein Mixtures

Selective purification of a particular protein from a mixture of different proteins takes advantage of the unique physical and chemical properties of the protein of interest.

  • A "purification scheme" is a series of fractionation steps using methodologies that separate mixtures of proteins based upon some physical or chemical property

What common properties distinguish different proteins?

  • Molecular mass. Proteins have a mass that is proportional to the length of the polypeptide chain. The average molecular mass of the amino acids, weighted by how often each occurs in proteins, is approximately 128 Da, or 128 g/mole. However, a water (18 Da) is released when a peptide bond is formed. So a useful heuristic (rule of thumb) is that the average mass of an amino acid in a polypeptide is 128 - 18 = 110 Da. Therefore, a protein with 150 amino acids would have a molecular mass of approximately 16,500 Da (16.5 kDa), or 16,500 g/mole. Various analytical methods will separate molecules based on molecular mass (for example, a method known as gel-filtration chromatography).
  • Solubility. Different proteins have different solubilities in certain salt solutions. This can be used to selectively fractionate a protein of interest.
  • pKa. Different proteins contain different sets of ionizable groups, each with its own pKa value. Thus, at a certain pH different proteins will have a different net charge, and this electrostatic property can be used to separate proteins (in a method known as ion-exchange chromatography).
  • Hydrophobicity. Different proteins will have different proportions of aliphatic and aromatic side chains on their surface. These groups contribute to the hydrophobicity of the protein. This property can be used to separate such proteins in a method known as hydrophobic interaction chromatography.
  • Affinity for specific ligands. The functionality of a particular protein may require the binding of a specific ligand or prosthetic group. This property can be exploited to separate such proteins in a method known as affinity chromatography.
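The 110 Da rule of thumb from the molecular mass item can be turned into a quick estimate in either direction. The following is a minimal sketch; the function names are illustrative only and the result is an approximation, not a measured mass.

```python
# Rule-of-thumb average mass of one amino acid residue in a polypeptide (Da).
AVG_RESIDUE_MASS_DA = 110


def estimate_mass_da(n_residues: int) -> int:
    """Approximate molecular mass (Da) of a polypeptide with n_residues amino acids."""
    return n_residues * AVG_RESIDUE_MASS_DA


def estimate_length(mass_da: float) -> int:
    """Approximate number of residues in a polypeptide of the given mass (Da)."""
    return round(mass_da / AVG_RESIDUE_MASS_DA)


print(estimate_mass_da(150))    # 16500, i.e. 16.5 kDa
print(estimate_length(66_000))  # 600
```

Real proteins deviate from this estimate because their amino acid composition differs from the average and because prosthetic groups add mass.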

An important point about fractionation steps is that you need to know which fraction contains your protein of interest.

  • An "assay" is an analytical methodology used to identify a component of interest (i.e. the protein you want to purify)

Before you can purify something, you must have an assay

  • Ideally, an assay should be:
    • Specific. It detects only your protein of interest, and does not give any "false positives" or "false negatives"
    • Sensitive. Often the process of assaying is destructive to the sample, so the assay should need only a small amount of material. You don't want to use up all your sample when you assay it.
    • Rapid. You don't want to wait weeks for the result
    • Quantitative. You don't want a simple "yes/no" answer in an assay. In order to be able to determine yield and purity you must have a quantitative assay.
  • Yield tells you how much protein of interest is recovered after each fractionation step. Combined losses can rapidly deplete your protein. Overall yield is the product of the yield for each fractionation step.
  • Purity should increase with each fractionation step. The hallmark of a pure protein is that the purity does not increase no matter what additional purification steps are taken.
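Because overall yield is the product of the per-step yields, losses compound quickly. The sketch below illustrates this with made-up step yields for a hypothetical four-step purification scheme; none of the numbers come from a real experiment.

```python
from math import prod


def overall_yield(step_yields):
    """Overall fractional yield: the product of the yields of the individual steps."""
    return prod(step_yields)


# Hypothetical fractional yields for four fractionation steps.
steps = [0.80, 0.70, 0.90, 0.60]

total = overall_yield(steps)
print(f"Overall yield: {total:.1%}")

# Starting from a hypothetical 50 mg of the protein of interest:
start_mg = 50.0
print(f"Recovered: {start_mg * total:.1f} mg")
```

Even though no single step loses more than 40% of the protein, less than a third of the starting material survives all four steps.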

Determination of the amino acid sequence of a protein

Proteins are chemically well-defined. All the molecules of a given sample of pure protein have an identical primary sequence.

The analytical chemistry for determination of the sequence of amino acids in a polypeptide has been worked out, and to a large degree, automated.

It should be noted that proteins can also be sequenced by sequencing the DNA that codes for them (and subsequently translating the DNA sequence into protein sequence). Many proteins are actually sequenced in this way. But direct protein sequencing is still important in many cases.

  • Starting at the amino terminal, the amino acids in a polypeptide can be sequentially identified using a method known as Edman degradation. The practical limit on the "amino terminal sequencing" is about 25-35 "cycles" (i.e. amino acids). The following is a description of the so-called "Edman chemistry" associated with N-terminal peptide sequencing:

    • Note that with Edman chemistry only the N-terminal residue is attacked and removed, the rest of the polypeptide remains intact after the reaction.
    • The new amino terminal group (previously the second amino acid in the polypeptide chain) is now available for another round of reactions. Thus, the method can be automated.
    • The amino acid side chain of the phenylthiohydantoin derivative can be identified using liquid chromatography. Modern amino acid sequencers can sequence on the order of two to three dozen cycles (amino acids) of a polypeptide.
    • Note that the reaction requires a free amino group on the N-terminal of the protein. If the amino-terminal residue is methylated or formylated then the reaction will not proceed (and the polypeptide is said to have a "blocked" N-terminal).
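The cycle-by-cycle logic of N-terminal sequencing can be sketched as a toy model. This is not a simulation of the chemistry: it only captures the points made above, namely that each cycle removes and identifies the current N-terminal residue, that the rest of the chain stays intact, that there is a practical cycle limit, and that a blocked N-terminal prevents the reaction. The function name, the one-letter sequence and the default limit of 30 cycles are assumptions chosen for illustration.

```python
def edman_sequence(peptide: str, max_cycles: int = 30, blocked: bool = False) -> str:
    """Toy model of N-terminal sequencing by repeated Edman cycles.

    peptide    -- one-letter amino acid sequence, N-terminal first
    max_cycles -- practical limit on the number of cycles (assumed value)
    blocked    -- True if the N-terminal amino group is modified
    """
    if blocked:
        return ""  # no free amino group, so the first cycle cannot proceed

    identified = []
    remaining = peptide
    for _ in range(max_cycles):
        if not remaining:
            break
        identified.append(remaining[0])  # N-terminal residue is released and identified
        remaining = remaining[1:]        # the shortened chain enters the next cycle
    return "".join(identified)


print(edman_sequence("AFNK"))                # AFNK
print(len(edman_sequence("G" * 50)))         # 30
print(edman_sequence("AFNK", blocked=True))  # prints an empty line
```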

 

  • Starting at the carboxyl terminal, the amino acids can be sequentially identified using enzymes called carboxypeptidases, which sequentially hydrolyze amino acids at the carboxy terminal. This is not well automated (often done by hand), and can typically identify only 5-6 amino acids at the carboxy terminal.
  • Therefore, only short peptides of ~30 amino acids can be directly sequenced in their entirety.
  • Most proteins require some type of chemical fragmentation to be able to sequence the entire polypeptide sequence.

Peptide Mapping

How can sequence information for the entire polypeptide be obtained?

  • One method is that of peptide mapping. Peptide mapping makes use of proteolytic cleavages of the polypeptide to produce smaller polypeptides. These smaller polypeptides can then be isolated from one another and subject to sequence analysis.
  • However, now we have another problem. The products of such proteolytic cleavages are peptide fragments, and although we might be able to separate and sequence the individual peptides, we have no idea what order they are supposed to be in.

How do we order the different sequences that we obtain?

  • One of the easiest ways is to repeat the experiment, but with a protease with a different specificity, and in this way obtain overlapping sequence information.

| Name | Source | Specificity |
|---|---|---|
| Chymotrypsin | Bovine pancreas | Cleavage after Tyr, Phe and Trp; some cleavage after Leu, Met and Ala |
| Bromelain | Pineapple | Cleavage after Lys, Ala and Tyr |
| Trypsin | Bovine pancreas | Cleavage after Arg, less after Lys |
| V8 protease | Staphylococcus aureus | Cleavage after Glu, less after Asp |

 

Overlapping sequence information can allow you to align the peptides in the correct order and determine the sequence of the original large polypeptide (i.e. protein).
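The overlap idea can be sketched in a few lines. The example digests a made-up 14-residue sequence with two simplified cleavage rules (cut after Lys/Arg for trypsin, cut after Glu/Asp for V8 protease), then finds the only ordering of the tryptic peptides that contains every V8 peptide. The sequence, the function names and the assumption of complete, exception-free cleavage are illustrative only.

```python
from itertools import permutations


def digest(seq: str, cut_after: str) -> list[str]:
    """Cleave seq after every residue listed in cut_after (complete digestion assumed)."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue in cut_after:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal fragment without a cleavage site
    return fragments


def order_fragments(fragments: list[str], overlapping: list[str]) -> list[str]:
    """Return every ordering of fragments whose joined sequence contains all overlapping peptides."""
    return [
        "".join(p)
        for p in permutations(fragments)
        if all(o in "".join(p) for o in overlapping)
    ]


protein = "AFNKGEMLRSDPYK"       # hypothetical sequence, one-letter code
tryptic = digest(protein, "KR")  # ['AFNK', 'GEMLR', 'SDPYK']
v8 = digest(protein, "ED")       # ['AFNKGE', 'MLRSD', 'PYK']

print(tryptic)
print(v8)
print(order_fragments(tryptic, v8))  # ['AFNKGEMLRSDPYK']
```

The V8 peptide AFNKGE spans the junction between the tryptic peptides AFNK and GEMLR, and MLRSD spans the junction between GEMLR and SDPYK, which is what fixes the order. Trying every permutation is only practical for a handful of peptides.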

 

  • Another complication to direct protein sequencing is the effect of disulfide bonds and multiple polypeptide chains in the tertiary structure of a protein:

A single polypeptide will give an unambiguous amino terminal sequence:

| Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Amino acid | Alanine | Phenylalanine | Asparagine | Lysine |

However, a disulfide-linked pair of polypeptides will give an ambiguous sequence:

| Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Amino acid(s) | Alanine, Asparagine | Proline, Phenylalanine | Aspartic acid, Asparagine | Lysine, Methionine |

 

One of the first steps in protein sequencing is to therefore reduce any disulfide bonds and to separate individual polypeptide chains.
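To see why two chains sequenced together are ambiguous, the possible readings can be enumerated. The sketch below uses the four cycles of the disulfide-linked example above; each cycle yields two residues with no indication of which chain each came from, so every combination is a candidate sequence for one chain.

```python
from itertools import product

# Residues detected at each cycle when two linked chains are sequenced together.
cycles = [
    ("Ala", "Asn"),
    ("Pro", "Phe"),
    ("Asp", "Asn"),
    ("Lys", "Met"),
]

# Any choice of one residue per cycle is consistent with the data.
candidates = list(product(*cycles))
print(len(candidates))  # 16

# The single-chain sequence from the unambiguous example is just one of the candidates.
print(("Ala", "Phe", "Asn", "Lys") in candidates)  # True
```

The number of candidates doubles with every additional cycle, which is why the chains are separated before sequencing.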


Other considerations prior to amino or carboxyl terminal sequencing

  • The amino acid composition may be determined using acid hydrolysis. However, this can use up valuable material that could be better put to use in sequencing
  • Various methodologies can be used to identify whether the protein is a conjugated protein. If prosthetic groups are covalently bound to side chains, this can interfere with identification of the side chains, or the sequencing chemistry


 

Sequence Determination by Mass Spectrometry

Mass spectrometry is a method that separates and quantitates molecules based upon their mass-to-charge ratio (m/z). It is accurate enough to assign a mass to a molecule to within 1 Da. Therefore, the composition of atoms within the molecule can often be identified from the measured mass.
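As a small illustration of the mass-to-charge ratio, the sketch below computes m/z for a molecule that carries its charge as added protons, which is an assumption about the ionization mode and is not stated in these notes. The function name and the 16,500 Da example mass are illustrative.

```python
PROTON_MASS_DA = 1.00728  # mass of one proton in Da


def mz(neutral_mass_da: float, charge: int) -> float:
    """m/z of a molecule of the given neutral mass carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# A hypothetical 16,500 Da protein observed in several charge states.
for z in (8, 10, 12):
    print(z, round(mz(16_500, z), 2))
```

The same molecule appears at a different m/z for each charge state, so the neutral mass has to be worked back from the observed peaks.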

 

The Nature of Amino Acid Sequences

When scientists first began sequencing proteins there were many unanswered questions regarding proteins and the amino acids.

  • For example, there are 20 common amino acids. Are they equally represented in protein sequences? (i.e. is each amino acid present to an equal extent, about 5%?)
  • How similar are homologous proteins from different species? Will we find that related organisms have related amino acid sequences (e.g. are rat and mouse hemoglobin more similar to each other than to human hemoglobin?)
  • Can we use this information to infer the evolutionary relationship between organisms?

 

With regard to amino acids in proteins, it was found that while each amino acid can be found in proteins, some (e.g. alanine) are present in larger amounts, and some are relatively infrequent (e.g. tryptophan):

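| Amino Acid | Frequency of Occurrence in Proteins (%) |
|---|---|
| Ala | 9.0 |
| Arg | 4.7 |
| Asn | 4.4 |
| Asp | 5.5 |
| Cys | 2.8 |
| Gln | 3.9 |
| Glu | 6.2 |
| Gly | 7.5 |
| His | 2.1 |
| Ile | 4.6 |
| Leu | 7.5 |
| Lys | 7.0 |
| Met | 1.7 |
| Phe | 3.5 |
| Pro | 4.6 |
| Ser | 7.1 |
| Thr | 6.0 |
| Trp | 1.1 |
| Tyr | 3.5 |
| Val | 6.9 |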
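The frequencies above can be compared with the 5% expected if all 20 amino acids were used equally, and combined with residue masses to check the 110 Da rule of thumb used earlier for estimating molecular mass. The sketch below is one way to do that. The residue masses are standard average values (amino acid mass minus one water), rounded to two decimals; they are not part of the original table.

```python
# Frequency of occurrence in proteins (%), from the table above.
frequency = {
    "Ala": 9.0, "Arg": 4.7, "Asn": 4.4, "Asp": 5.5, "Cys": 2.8,
    "Gln": 3.9, "Glu": 6.2, "Gly": 7.5, "His": 2.1, "Ile": 4.6,
    "Leu": 7.5, "Lys": 7.0, "Met": 1.7, "Phe": 3.5, "Pro": 4.6,
    "Ser": 7.1, "Thr": 6.0, "Trp": 1.1, "Tyr": 3.5, "Val": 6.9,
}

# Average residue masses in Da (not from the table; standard rounded values).
residue_mass = {
    "Ala": 71.08, "Arg": 156.19, "Asn": 114.10, "Asp": 115.09, "Cys": 103.14,
    "Gln": 128.13, "Glu": 129.12, "Gly": 57.05, "His": 137.14, "Ile": 113.16,
    "Leu": 113.16, "Lys": 128.17, "Met": 131.19, "Phe": 147.18, "Pro": 97.12,
    "Ser": 87.08, "Thr": 101.10, "Trp": 186.21, "Tyr": 163.18, "Val": 99.13,
}

most_common = max(frequency, key=frequency.get)
least_common = min(frequency, key=frequency.get)
print(most_common, least_common)  # Ala Trp

# Frequency-weighted average residue mass, normalized by the table total.
total_percent = sum(frequency.values())
weighted_mass = sum(frequency[aa] * residue_mass[aa] for aa in frequency) / total_percent
print(round(weighted_mass))  # close to the 110 Da rule of thumb
```

The weighted average comes out near 110 Da because the small residues (Gly, Ala, Ser) are common while the heaviest ones (Trp, Tyr, Arg) are comparatively rare.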

 

One surprise for scientists who studied homologous proteins from different species involved the comparison of humans with the other great apes, in particular the chimpanzee.

  • Human and chimp cytochrome c (a protein involved in electron transport) turned out to be identical to each other. In fact, the first few proteins compared between chimp and human proved to be identical, which led some scientists to joke that the differences between humans and chimps were merely cultural. However, further characterization identified amino acid sequence differences between the proteins of humans and chimps. Nonetheless, it was clear that many proteins were nearly identical between the two species.

Protein sequence analysis provided a way to determine the "similarity" of species on a molecular level.

  • "Tree" diagrams could be constructed to reflect molecular similarities
  • Comparisons identified groups of organisms that were very similar to each other, and significantly different from other organisms. These groups, or "nodes", could be diagrammed as branch-points in evolutionary trees.
  • These trees, based on molecular similarity, were very similar to trees constructed by phylogenetic relationships (i.e. morphological characteristics). Thus, morphological differences have as their basis, sequence differences in proteins. Mechanisms that lead to amino acid mutations in proteins can therefore result in morphological differences.
  • Shared genetic diseases. Humans, chimps, gorillas, orangutans and bonobos (the "great apes") share certain genetic defects. For example, all of the great apes have the same defect in a gene for an enzyme necessary to make vitamin C. Thus, all need to get vitamin C from plants in their diet. Monkeys carry the same defect, whereas more distantly related primates such as lemurs can still make their own vitamin C. Thus, this mutation most likely occurred once, in a common ancestor of monkeys and apes, before that lineage diverged into the different species alive today.
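A simple measure of molecular similarity is the percent identity between two aligned sequences, which is the kind of number that tree diagrams of this sort are built from. The sketch below uses short made-up sequences for three unnamed species; they are not real protein sequences, and the function assumes the sequences are already aligned and of equal length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned fragments of a homologous protein (one-letter code).
sequences = {
    "species_1": "GDVEKGKKIF",
    "species_2": "GDVEKGKKIF",
    "species_3": "GDVAKGKKTF",
}

names = list(sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        identity = percent_identity(sequences[first], sequences[second])
        print(f"{first} vs {second}: {identity:.0f}% identical")
```

In this made-up case species_1 and species_2 would be grouped together on a tree, with species_3 branching off earlier.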

Another side of sequence similarity is the following: Proteins with similar functionalities often have similar tertiary structures, and therefore, similar amino acid sequences

  • Oxygen-binding proteins that contain prosthetic iron groups all have similar overall tertiary structures, and presumably evolved from some ancient iron-binding protein

 

Yet another surprise was the economy with which nature can produce a variety of functional proteins using a relatively small "toolbox" of tertiary structures.

  • Although a given organism (e.g. a bacterium or a human) produces thousands to tens of thousands of different proteins, only about ten fundamental "superfolds" or tertiary structure categories have been identified. Most known protein structures are variations of these basic structures, or combinations of these structures.
  • Rather than "reinvent the wheel", nature appears to achieve new functionalities by mutations introduced into existing protein structures.
