DNA is a long polymer of deoxyribonucleotides.


The length of DNA is usually defined as the number of nucleotides (or pairs of nucleotides, referred to as base pairs) present in it.


For example, the bacteriophage φX174 has 5386 nucleotides, bacteriophage lambda has 48502 base pairs (bp), Escherichia coli has 4.6 × 10^6 bp, and the haploid content of human DNA is 3.3 × 10^9 bp.


History of DNA


DNA is an acidic substance in the nucleus.


It was first identified by Friedrich Miescher in 1869. He named it 'nuclein'.



In 1953 the double helix structure of DNA was proposed by James Watson and Francis Crick, based on X-ray diffraction data produced by Maurice Wilkins and Rosalind Franklin.


Salient features of DNA


It is made of two polynucleotide chains, where the backbone is constituted by sugar-phosphate, and the bases project inside.


The two chains have anti-parallel polarity: if one chain has the polarity 5'→3', the other has 3'→5'.


The bases in the two strands are paired through hydrogen bonds (H-bonds), forming base pairs (bp). Adenine forms two hydrogen bonds with thymine on the opposite strand and vice versa.


Similarly, guanine bonds with cytosine through three H-bonds. As a result, a purine always lies opposite a pyrimidine. This generates an approximately uniform distance between the two strands of the helix.


The two chains are coiled in a right-handed fashion. The pitch of the helix is 3.4 nm (a nanometre is one billionth of a metre, that is 10^-9 m) and there are roughly 10 bp in each turn. Consequently, the distance between adjacent base pairs in the helix is approximately 0.34 nm.


The plane of one base pair stacks over the other in the double helix. This, in addition to H-bonds, confers stability on the helical structure.


Structure of polynucleotide chain:


A nucleotide has three components: a nitrogenous base, a pentose sugar (ribose in RNA and deoxyribose in DNA) and phosphoric acid.


Two types of nitrogen bases:



Purines (Adenine and Guanine)


Pyrimidines (Cytosine, Uracil and Thymine)


Adenine, Guanine and Cytosine are common in RNA and DNA.


Uracil is present in RNA; in DNA, thymine is present in place of uracil.


In RNA, Pentose sugar is ribose and in DNA, it is Deoxyribose.


Based on the nature of the pentose sugar, two types of nucleosides are formed: ribonucleosides and deoxyribonucleosides.


Two nucleotides are joined by a 3'-5' phosphodiester linkage to form a dinucleotide.


More than two nucleotides join to form a polynucleotide chain.


The two strands of DNA (called a DNA duplex) are antiparallel and complementary, i.e., one runs in the 5'→3' direction and the other in the 3'→5' direction.
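The pairing and antiparallel rules above can be sketched in a few lines of Python. This is only an illustration: the function name `complementary_strand` and the example sequence are hypothetical and do not come from these notes.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written in the 5'->3' direction.

    The partner runs antiparallel, so the paired bases are reversed
    to keep the usual 5'->3' reading convention.
    """
    paired = [PAIRING[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(paired))


example = "ATGCCG"                        # hypothetical sequence, 5'->3'
partner = complementary_strand(example)   # "CGGCAT", 5'->3'
print(example, partner)
```

Applying the function twice returns the original strand, which is one way to see that the two strands carry the same information.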


Packaging of DNA Helix


The nucleosome is the basic repeating structural (and functional) unit of chromatin, into which DNA is packed in eukaryotes. It contains nine histone proteins (a core octamer plus one linker histone).


The distance between two consecutive base pairs is 0.34 nm (0.34 × 10^-9 m).


The length of the DNA in a typical mammalian cell is therefore 6.6 × 10^9 bp × 0.34 × 10^-9 m/bp, which comes to about 2.2 metres.


This length is far greater than the dimension of a typical nucleus (approximately 10^-6 m).
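The arithmetic above can be checked with a short Python sketch. The constant and function names are illustrative choices; the numbers are the ones quoted in these notes.

```python
BP_PER_MAMMALIAN_CELL = 6.6e9   # base pairs in a typical mammalian cell
RISE_PER_BP_M = 0.34e-9         # metres between consecutive base pairs
NUCLEUS_SIZE_M = 1e-6           # approximate dimension of a nucleus


def dna_length_m(base_pairs: float, rise_m: float = RISE_PER_BP_M) -> float:
    """Length of a DNA double helix in metres."""
    return base_pairs * rise_m


length = dna_length_m(BP_PER_MAMMALIAN_CELL)   # about 2.2 m
fold = length / NUCLEUS_SIZE_M                 # a little over two million
ecoli_length = dna_length_m(4.6e6)             # about 1.56e-3 m

print(f"mammalian cell DNA: {length:.2f} m")
print(f"longer than the nucleus by a factor of about {fold:.1e}")
print(f"E. coli DNA: {ecoli_length * 1000:.2f} mm")
```

The same function applied to the E. coli genome (4.6 × 10^6 bp) gives roughly 1.56 mm, which shows why packaging is needed in prokaryotes as well.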


In prokaryotes such as E. coli, though they do not have a defined nucleus, the DNA is not scattered throughout the cell. DNA (being negatively charged) is held with some proteins (that have positive charges) in a region termed the 'nucleoid'. The DNA in the nucleoid is organised in large loops held by proteins.


In eukaryotes, there is a set of positively charged, basic proteins called histones. A protein acquires charge depending upon the abundance of amino acid residues with charged side chains.


Histones are rich in the basic amino acid residues lysine and arginine. Both of these residues carry positive charges in their side chains.


Histones are organised to form a unit of eight molecules called the histone octamer. The negatively charged DNA is wrapped around the positively charged histone octamer to form a structure called a nucleosome.


A typical nucleosome contains 200 bp of DNA helix.
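As a rough worked example, the figures in these notes (6.6 × 10^9 bp per mammalian cell, 200 bp per nucleosome, eight histones per octamer) give an estimate of the number of nucleosomes in a cell. The variable names are illustrative.

```python
BP_PER_MAMMALIAN_CELL = 6_600_000_000   # 6.6 x 10^9 bp
BP_PER_NUCLEOSOME = 200                 # bp of DNA helix per nucleosome
HISTONES_PER_OCTAMER = 8                # molecules in one histone octamer

nucleosomes = BP_PER_MAMMALIAN_CELL // BP_PER_NUCLEOSOME   # 33,000,000
core_histones = nucleosomes * HISTONES_PER_OCTAMER         # 264,000,000

print(f"nucleosomes per cell: {nucleosomes:.1e}")
print(f"core histone molecules per cell: {core_histones:.2e}")
```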


Nucleosomes constitute the repeating unit of a structure in the nucleus called chromatin: thread-like stained (coloured) bodies seen in the nucleus.


The nucleosomes in chromatin are seen as a 'beads-on-string' structure when viewed under an electron microscope.


The beads-on-string structure in chromatin is packaged to form chromatin fibres that are further coiled and condensed at the metaphase stage of cell division to form chromosomes.


The packaging of chromatin at a higher level requires an additional set of proteins, collectively referred to as non-histone chromosomal (NHC) proteins.


In a typical nucleus, some regions of chromatin are loosely packed (and stain light) and are referred to as euchromatin. The chromatin that is more densely packed and stains dark is called heterochromatin. Euchromatin is said to be transcriptionally active chromatin, whereas heterochromatin is inactive.


Central Dogma :


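Proposed by Francis Crick, the central dogma states that genetic information flows from DNA to RNA to protein (DNA → RNA → protein).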










Transforming Principle:


Frederick Griffith 1928


When Streptococcus pneumoniae (pneumococcus) bacteria are grown on a culture plate, they produce one of two kinds of colonies:


a) smooth, shiny colonies (S), because these bacteria have a mucous (polysaccharide) coat


b) rough colonies (R)


Experiment observation:


  • Mice infected with the S strain (virulent) die from pneumonia infection.
  • Mice infected with the R strain do not develop pneumonia.


Griffith was able to kill bacteria by heating them. He observed that heat-killed S strain bacteria injected into mice did not kill them.


  • When he injected a mixture of heat-killed S and live R bacteria, the mice died.


Moreover, he recovered living S bacteria from the dead mice.


Conclusion:


R strain bacteria had somehow been transformed by the heat-killed S strain bacteria. Some 'transforming principle', transferred from the heat-killed S strain, had enabled the R strain to synthesise a smooth polysaccharide coat and become virulent. This must be due to the transfer of the genetic material.


Biochemical Characterisation of Transforming Principle


Prior to the work of Oswald Avery, Colin MacLeod and Maclyn McCarty (1933-44), the genetic material was thought to be a protein.


(1) They worked to determine the biochemical nature of the 'transforming principle' in Griffith's experiment.


(2) They purified biochemicals (proteins, DNA, RNA, etc.) from the heat-killed S cells to see which ones could transform live R cells into S cells.


(3) They discovered that DNA alone from S bacteria caused R bacteria to become transformed.


(4) They also discovered that protein-digesting enzymes (proteases) and RNA-digesting enzymes (RNases) did not affect transformation, so the transforming substance was not a protein or RNA.


(5) Digestion with DNase did inhibit transformation, suggesting that DNA caused the transformation.


The Genetic Material is DNA


(1) Alfred Hershey and Martha Chase (1952) proved that DNA is the genetic material.


(2) They worked with bacteriophages, viruses that infect bacteria.


(3) Experimental setup:


a- The bacteriophage attaches to the bacterium and its genetic material then enters the bacterial cell.


b- The bacterial cell treats the viral genetic material as if it were its own and subsequently manufactures more virus particles.


c- They grew some viruses on a medium that contained radioactive phosphorus and others on a medium that contained radioactive sulfur.


d- Viruses grown in the presence of radioactive phosphorus contained radioactive DNA but not radioactive protein, because DNA contains phosphorus but protein does not.


e- Similarly, viruses grown on radioactive sulfur contained radioactive protein but not radioactive DNA, because DNA does not contain sulfur.


f- Radioactive phages were allowed to attach to E. coli bacteria.


g- As the infection proceeded, the viral coats were removed from the bacteria by agitating them in a blender.


h- The virus particles were separated from the bacteria by spinning them in a centrifuge.





Observations:


a- Bacteria that were infected with viruses that had radioactive DNA were radioactive, indicating that DNA was the material that passed from the virus to the bacteria.


b- Bacteria that were infected with viruses that had radioactive proteins were not radioactive.


This indicates that proteins did not enter the bacteria from the viruses. DNA is therefore the genetic material that is passed from virus to bacteria.


Features of Genetic Material


A molecule that can act as a genetic material must fulfill the following criteria:


It should be able to generate its replica (replication).


It should be chemically and structurally stable.


It should provide the scope for slow changes (mutation) that are required for evolution.


It should be able to express itself in the form of 'Mendelian characters'.


Stability of DNA vs RNA


In DNA, the two strands are complementary; if separated by heating, they come together again when appropriate conditions are provided.


The 2'-OH group present at every nucleotide in RNA is a reactive group and makes RNA labile and easily degradable.


RNA is also known to be catalytic, hence reactive.


The presence of thymine in place of uracil also confers additional stability on DNA.


DNA has evolved from RNA with chemical modifications that make it more stable.


Hence DNA chemically is less reactive and structurally more stable when compared to RNA. Therefore, among the two nucleic acids, the DNA is a better genetic material.




RNA was the first genetic material.


RNA is a non-hereditary nucleic acid except in some viruses (such as retroviruses).


It is a polymer of ribonucleotides and is made up of a pentose sugar (ribose), phosphoric acid and a nitrogenous base (A, U, G or C).


In RNA, every nucleotide residue has an additional -OH group at the 2'-position of the ribose. Also, in RNA uracil is found in place of thymine (5-methyl uracil is another chemical name for thymine).


RNA mutates at a faster rate than DNA because it is unstable.


Consequently, viruses that have an RNA genome and a shorter life span mutate and evolve faster.


RNA can directly code for the synthesis of proteins, hence can easily express the characters. DNA, however, is dependent on RNA for synthesis of proteins.


DNA Replication


DNA molecule is capable of self duplication.


In eukaryotes, the replication of DNA takes place at S-phase of the cell-cycle.


The replication of DNA and the cell division cycle should be highly coordinated. A failure in cell division after DNA replication results in polyploidy.


Watson and Crick proposed a scheme for the replication of DNA:


a- The two strands would separate and each act as a template for the synthesis of a new complementary strand.


b- After the completion of replication, each DNA molecule would have one parental and one newly synthesised strand.


c- This scheme was termed semiconservative DNA replication.




Experimental proof of the semiconservative mode of replication (first shown in Escherichia coli)


Matthew Meselson and Franklin Stahl performed the following experiment in 1958.


Experimental setup: they grew E. coli in a medium containing 15NH4Cl (15N is the heavy isotope of nitrogen) as the only nitrogen source for many generations.


The result was that 15N was incorporated into newly synthesised DNA (as well as other nitrogen-containing compounds).


This heavy DNA molecule could be distinguished from normal DNA by centrifugation in a cesium chloride (CsCl) density gradient.


Then they transferred the cells into a medium with normal 14NH4Cl, took samples at definite time intervals as the cells multiplied, and extracted the DNA, which remained as double-stranded helices. The samples were separated independently on CsCl gradients to measure the densities of the DNA.


Thus, the DNA that was extracted from the culture one generation after the transfer from 15N to 14N medium (that is, after 20 minutes; E. coli divides in 20 minutes) had a hybrid or intermediate density. DNA extracted from the culture after another generation (that is, after 40 minutes; generation II) was composed of equal amounts of this hybrid DNA and of 'light' DNA.
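The band pattern described above follows directly from the semiconservative scheme, and a small simulation makes this visible. The sketch below is an idealised model written for these notes, not the experimenters' own analysis: each duplex is a pair of strand labels, and all function names are hypothetical. The third generation is an extrapolation beyond the two generations described in the text.

```python
from collections import Counter


def replicate(duplexes, new_isotope="14N"):
    """One semiconservative round: each parental strand templates a new strand."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, new_isotope))
        daughters.append((strand_b, new_isotope))
    return daughters


def density_class(duplex):
    """Classify a duplex by how many of its two strands carry 15N."""
    heavy_strands = duplex.count("15N")
    return {2: "heavy", 1: "hybrid", 0: "light"}[heavy_strands]


def band_fractions(duplexes):
    """Fraction of duplexes that would fall in each CsCl band."""
    counts = Counter(density_class(d) for d in duplexes)
    total = len(duplexes)
    return {band: counts[band] / total for band in ("heavy", "hybrid", "light")}


population = [("15N", "15N")]   # cells grown for many generations in 15N
for generation in range(1, 4):
    population = replicate(population)   # growth continues in 14N medium
    print(generation, band_fractions(population))
```

In this model, generation I is entirely hybrid and generation II is half hybrid and half light, matching the observations in the text.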


The Machinery and the Enzymes


In living cells, such as E. coli, the process of replication requires a set of catalysts (enzymes).


The main enzyme is referred to as DNA-dependent DNA polymerase, since it uses a DNA template to catalyse the polymerisation of deoxynucleotides.


These enzymes are highly efficient, as they have to catalyse the polymerisation of a large number of nucleotides in a very short time and with high accuracy. E. coli, which has only 4.6 × 10^6 bp, completes the process of replication within 18 minutes.
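The figures above imply a polymerisation rate that can be estimated with a short calculation. The genome size and time come from these notes; the division by two rests on an assumption not stated in the notes, namely that replication proceeds bidirectionally with two replication forks.

```python
GENOME_BP = 4.6e6               # E. coli genome size in base pairs
REPLICATION_TIME_S = 18 * 60    # 18 minutes, in seconds
ASSUMED_FORKS = 2               # assumption: two forks working in parallel

overall_rate = GENOME_BP / REPLICATION_TIME_S   # bp per second, whole genome
rate_per_fork = overall_rate / ASSUMED_FORKS    # bp per second at each fork

print(f"overall: about {overall_rate:.0f} bp per second")
print(f"per fork (assumed {ASSUMED_FORKS} forks): about {rate_per_fork:.0f} bp per second")
```

Under these assumptions the average rate is on the order of a few thousand base pairs per second.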


Any mistake during replication would result in mutations. Energetically, replication is also a very expensive process.


Deoxyribonucleoside triphosphates serve dual purposes: they act as substrates and they provide energy for the polymerisation reaction.


For long DNA molecules, the two strands of DNA cannot be separated along their entire length (due to the very high energy requirement), so replication occurs within a small opening of the DNA helix, referred to as the replication fork.


The DNA-dependent DNA polymerases catalyse polymerisation only in one direction, that is 5'→3'.


Consequently, on one strand (the template with polarity 3'→5'), replication is continuous, while on the other (the template with polarity 5'→3'), it is discontinuous. The discontinuously synthesised fragments are later joined by the enzyme DNA ligase (Figure 6.8).


The DNA polymerases on their own cannot initiate the process of replication. Also, replication does not initiate randomly at any place in DNA.


There is a definite region in E. coli DNA where replication originates. Such regions are termed the origin of replication.


It is because of the requirement for an origin of replication that a piece of DNA, if it needs to be propagated during recombinant DNA procedures, requires a vector. The vector provides the origin of replication.





The process of copying genetic information from one strand of the DNA into RNA is termed as transcription.


In transcription, adenosine forms a base pair with uracil instead of thymine.


Also, only a segment of DNA and only one of its strands is copied into RNA.


Why are both strands not copied during transcription?


First, if both strands act as a template, they would code for RNA molecule with different sequences and in turn, if they code for proteins, the sequence of amino acids in the proteins would be different. Hence, one segment of the DNA would be coding for two different proteins, and this would complicate the genetic information transfer machinery.


Second, the two RNA molecules if produced simultaneously would be complementary to each other, hence would form a double stranded RNA. This would prevent RNA from being translated into protein and the exercise of transcription would become a futile one.


Transcription Unit


A transcription unit in DNA is defined primarily by the three regions in the DNA:


  • A Promoter


  • The Structural gene


  • A Terminator


The two strands of DNA in a transcription unit have opposite polarity.


The strand that has the polarity 3'→5' acts as the template, and is referred to as the template strand.


The other strand, which has the polarity 5'→3' and the same sequence as the RNA (except thymine in place of uracil), is displaced during transcription. Strangely, this strand (which does not code for anything) is referred to as the coding strand.


All reference points in defining a transcription unit are made with respect to the coding strand.


The promoter and terminator flank the structural gene in a transcription unit.


The promoter is said to be located towards 5'-end (upstream) of the structural gene (the reference is made with respect to the polarity of coding strand).


The promoter is a DNA sequence that provides the binding site for RNA polymerase, and it is the presence of a promoter in a transcription unit that also defines the template and coding strands.



The terminator is located towards 3'-end (downstream) of the coding strand and it usually defines the end of the process of transcription.
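The relationship between the template strand, the coding strand and the RNA can be illustrated with a short sketch. The sequences and function names below are hypothetical examples chosen for these notes.

```python
# Pairing used during transcription: A in DNA pairs with U in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (5'->3') by pairing each base of the template (written 3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_rna(coding_5_to_3: str) -> str:
    """The RNA has the coding-strand sequence with uracil in place of thymine."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGCATGC"     # hypothetical coding strand, 5'->3'
template = "TACGTACG"   # its template strand, 3'->5'

rna = transcribe(template)   # "AUGCAUGC"
print(rna, rna == coding_to_rna(coding))
```

Both routes give the same RNA, which is why reference points in a transcription unit can be stated on the coding strand even though it is the template strand that is copied.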


Transcription Unit and the Gene


A gene is defined as the functional unit of inheritance.


A cistron is a segment of DNA coding for a polypeptide. The structural gene in a transcription unit can be monocistronic (mostly in eukaryotes) or polycistronic (mostly in bacteria or prokaryotes).


In eukaryotes, the monocistronic structural genes have interrupted coding sequences: the genes are split, hence the term split-gene arrangement.


The coding or expressed sequences are defined as exons. Exons are those sequences that appear in mature or processed RNA.


The exons are interrupted by introns.


Introns, or intervening sequences, do not appear in mature or processed RNA. Inheritance of a character is also affected by the promoter and regulatory sequences of a structural gene. Hence, the regulatory sequences are sometimes loosely defined as regulatory genes, even though these sequences do not code for any RNA or protein.




Types of RNA and the process of Transcription


In bacteria, there are three major types of RNA: mRNA (messenger RNA), tRNA (transfer RNA) and rRNA (ribosomal RNA).


All three RNAs are needed to synthesise a protein in a cell.


The mRNA provides the template, tRNA brings amino acids and reads the genetic code, and rRNAs play structural and catalytic roles during translation.


There is a single DNA-dependent RNA polymerase that catalyses transcription of all types of RNA in bacteria.


RNA polymerase binds to the promoter and initiates transcription (initiation). It uses nucleoside triphosphates as substrates and polymerises in a template-dependent fashion, following the rule of complementarity. It also facilitates opening of the helix and continues elongation.


Only a short stretch of RNA remains bound to the enzyme.


Once the polymerase reaches the terminator region, the nascent RNA falls off. This results in termination of transcription.


The RNA polymerase is only capable of catalysing the process of elongation. It associates transiently with an initiation factor and a termination factor to initiate and terminate transcription, respectively.


Association with these factors alters the specificity of the RNA polymerase to either initiate or terminate.


In bacteria, since the mRNA does not require any processing to become active, and also since transcription and translation take place in the same compartment (there is no separation of cytosol and nucleus in bacteria), many times the translation can begin much before the mRNA is fully transcribed. Consequently, the transcription and translation can be coupled in bacteria.


In eukaryotes, there are two additional complexities:


First, there are at least three RNA polymerases in the nucleus (in addition to the RNA polymerase found in the organelles).


RNA polymerase II transcribes the precursor of mRNA, the heterogeneous nuclear RNA (hnRNA).


The second complexity is that the primary transcripts contain both exons and introns and are non-functional.


Hence, they are subjected to a process called splicing, in which the introns are removed and the exons are joined in a defined order.


hnRNA also undergoes additional processing called capping and tailing.


In capping, an unusual nucleotide (methyl guanosine triphosphate) is added to the 5'-end of hnRNA.


In tailing, adenylate residues (200-300) are added at the 3'-end in a template-independent manner. It is the fully processed hnRNA, now called mRNA, that is transported out of the nucleus for translation.




During replication and transcription a nucleic acid was copied to form another nucleic acid.


The process of translation requires transfer of genetic information from a polymer of nucleotides to a polymer of amino acids.


There was ample evidence, though, to support the notion that changes in nucleic acids (the genetic material) were responsible for changes in the amino acids of proteins.


It was George Gamow, a physicist, who argued that since there are only 4 bases and they have to code for 20 amino acids, the code should be made up of a combination of bases.


He suggested that in order to code for all 20 amino acids, the code should be made up of three nucleotides.


This was a very bold proposition, because a permutation combination of 4^3 (4 × 4 × 4) would generate 64 codons, many more than required.
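Gamow's counting argument can be reproduced in a few lines. The sketch below simply enumerates the possible words of each length over four bases; the helper name `min_codon_length` is an illustrative choice.

```python
from itertools import product

BASES = "ACGU"      # the four RNA bases
AMINO_ACIDS = 20    # number of amino acids that must be coded


def min_codon_length(n_bases: int, n_targets: int) -> int:
    """Smallest word length whose combinations cover n_targets."""
    length = 1
    while n_bases ** length < n_targets:
        length += 1
    return length


codon_length = min_codon_length(len(BASES), AMINO_ACIDS)
codons = ["".join(p) for p in product(BASES, repeat=codon_length)]

for length in (1, 2, 3):
    print(length, len(BASES) ** length)   # 4, 16 and 64 combinations

print(codon_length, len(codons))
```

One base gives 4 combinations and two bases give 16, both fewer than 20, so three is the smallest length that suffices.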


The chemical method developed by Har Gobind Khorana was instrumental in synthesising RNA molecules with defined combinations of bases (homopolymers and copolymers).


Marshall Nirenberg’s cell-free system for protein synthesis finally helped the code to be deciphered.


The Severo Ochoa enzyme (polynucleotide phosphorylase) was also helpful in polymerising RNA with defined sequences in a template-independent manner (enzymatic synthesis of RNA).




The salient features of genetic code are as follows:



  • The codon is a triplet. 61 codons code for amino acids and 3 codons do not code for any amino acid; hence they function as stop codons.


  • One codon codes for only one amino acid; hence the code is unambiguous and specific.


  • Some amino acids are coded by more than one codon; hence the code is degenerate.


  • The codon is read in mRNA in a contiguous fashion. There are no punctuations.


  • The code is nearly universal: for example, from bacteria to humans, UUU codes for phenylalanine (Phe). Some exceptions to this rule have been found in mitochondrial codons and in some protozoans.


  • AUG has dual functions. It codes for methionine (Met), and it also acts as the initiator codon.
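The features listed above can be checked against the standard codon table. The sketch below builds the table from a compact 64-letter string of one-letter amino acid symbols ('*' marks a stop codon); the `translate` function and the example mRNA are illustrative and not part of these notes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read contiguous triplets from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")

print(sense_codons, stop_codons)   # 61 sense codons, 3 stop codons
print(leucine_codons)              # degeneracy: several codons for leucine
print(translate("AUGUUUUAA"))      # AUG = Met (M), UUU = Phe (F), UAA = stop
```

Each codon maps to exactly one entry (the code is unambiguous), while an amino acid such as leucine is reached from several codons (the code is degenerate).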

































Mutations and Genetic Code


A classical example of a point mutation is the change of a single base pair in the gene for the beta globin chain that results in the change of the amino acid residue glutamate to valine.


It results in a diseased condition called sickle cell anemia.


Insertion or deletion of one or two bases changes the reading frame from the point of insertion or deletion. Such mutations are referred to as frameshift insertion or deletion mutations.


Insertion or deletion of three bases, or a multiple of three, inserts or deletes one or more codons and hence one or more amino acids, while the reading frame remains unaltered from that point onwards.
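The effect on the reading frame can be illustrated with an analogy in which three-letter words stand in for codons. The sentence used below is only a teaching device chosen for this sketch, and the function name is hypothetical.

```python
def read_triplets(message: str) -> list:
    """Split a message into consecutive three-letter words, like codons."""
    letters = message.replace(" ", "")
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


original = "RAMHASREDCAP"

# Insert one letter after the second word: every later word is misread.
one_inserted = original[:6] + "B" + original[6:]

# Insert three letters at the same point: one extra word, frame preserved.
three_inserted = original[:6] + "BIG" + original[6:]

print(read_triplets(original))         # RAM HAS RED CAP
print(read_triplets(one_inserted))     # RAM HAS BRE DCA P
print(read_triplets(three_inserted))   # RAM HAS BIG RED CAP
```

With a single inserted letter the words after the insertion point become meaningless, whereas inserting three letters adds one word and leaves the rest readable.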


tRNA– the Adapter Molecule



It was clear to Francis Crick that there has to be a mechanism to read the code and also to link it to the amino acids, because amino acids have no structural specialities to read the code uniquely.


He postulated the presence of an adapter molecule that would on one hand read the code and on the other hand bind to specific amino acids.


The tRNA, then called sRNA (soluble RNA), was known before the genetic code was postulated.


However, its role as an adapter molecule was assigned much later. tRNA has an anticodon loop that has bases complementary to the code, and it also has an amino acid acceptor end through which it binds to amino acids. tRNAs are specific for each amino acid.


For initiation, there is another specific tRNA that is referred to as the initiator tRNA. There are no tRNAs for stop codons.


The tRNA is a compact molecule that looks like an inverted L.




Translation refers to the process of polymerisation of amino acids to form a polypeptide.


The order and sequence of amino acids are defined by the sequence of bases in the mRNA.


The amino acids are joined by a bond known as a peptide bond. Formation of a peptide bond requires energy. Therefore, in the first phase itself, amino acids are activated in the presence of ATP and linked to their cognate tRNA, a process commonly called charging of tRNA or, to be more specific, aminoacylation of tRNA.


The cellular factory responsible for synthesising proteins is the ribosome. The ribosome consists of structural RNAs and about 80 different proteins. In its inactive state, it exists as two subunits: a large subunit and a small subunit. When the small subunit encounters an mRNA, the process of translation of the mRNA to protein begins.


There are two sites in the large subunit for subsequent amino acids to bind to and thus be close enough to each other for the formation of a peptide bond.


The ribosome also acts as a catalyst (23S rRNA in bacteria is the enzyme, a ribozyme) for the formation of the peptide bond.


A translational unit in mRNA is the sequence of RNA that is flanked by the start codon (AUG) and the stop codon and codes for a polypeptide.


An mRNA also has some additional sequences that are not translated and are referred to as untranslated regions (UTRs).


The UTRs are present at both the 5'-end (before the start codon) and the 3'-end (after the stop codon). They are required for an efficient translation process.


For initiation, the ribosome binds to the mRNA at the start codon (AUG), which is recognised only by the initiator tRNA.


The ribosome then proceeds to the elongation phase of protein synthesis. During this stage, complexes composed of an amino acid linked to tRNA sequentially bind to the appropriate codon in mRNA by forming complementary base pairs with the tRNA anticodon. The ribosome moves from codon to codon along the mRNA. Amino acids are added one by one, translated into polypeptide sequences dictated by DNA and represented by mRNA.


At the end, a release factor binds to the stop codon, terminating translation and releasing the complete polypeptide from the ribosome.
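The layout of an mRNA into a 5' UTR, a translational unit and a 3' UTR can be sketched as follows. This is a simplified illustration: it takes the first AUG as the start codon, uses the three standard stop codons (UAA, UAG, UGA), and counts both the start and the stop codon as part of the translational unit. The function name and example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Return (5' UTR, translational unit, 3' UTR), or None if no complete unit."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Read codon by codon from the start codon until a stop codon is met.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


example = "GGCAUGUUUGGAUAAGCC"   # hypothetical mRNA, 5'->3'
five_utr, unit, three_utr = split_mrna(example)
print(five_utr, unit, three_utr)
```

For the example sequence, the bases before AUG form the 5' UTR and the bases after UAA form the 3' UTR; neither is translated.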




Regulation of gene expression is a very broad term, and regulation may occur at various levels. Considering that gene expression results in the formation of a polypeptide, it can be regulated at several levels. In eukaryotes, regulation can be exerted at:


  • transcriptional level (formation of primary transcript),
  • processing level (regulation of splicing),
  • transport of mRNA from nucleus to the cytoplasm,
  • translational level.


The genes in a cell are expressed to perform a particular function or a set of functions. For example, if an enzyme called beta-galactosidase is synthesised by E. coli, it is used to catalyse the hydrolysis of a disaccharide, lactose, into galactose and glucose; the bacteria use them as a source of energy.


Hence, if the bacteria do not have lactose around them to use as an energy source, they no longer require the synthesis of the enzyme beta-galactosidase.


Therefore, in simple terms, it is the metabolic, physiological or environmental conditions that regulate the expression of genes.


In prokaryotes, control of the rate of transcriptional initiation is the predominant site for control of gene expression.



In a transcription unit, the activity of RNA polymerase at a given promoter is in turn regulated by interaction with accessory proteins, which affect its ability to recognise start sites.


These regulatory proteins can act both positively (activators) and negatively (repressors).


The accessibility of promoter regions of prokaryotic DNA is in many cases regulated by the interaction of proteins with sequences termed operators.


The operator region is adjacent to the promoter elements in most operons and in most cases the sequences of the operator bind a repressor protein.



Each operon has its specific operator and specific repressor. For example, lac operator is present only in the lac operon and it interacts specifically with lac repressor only




The Lac operon



The elucidation of the lac operon was the result of a close association between a geneticist, François Jacob, and a biochemist, Jacques Monod. They were the first to elucidate a transcriptionally regulated system.


In the lac operon (here lac refers to lactose), a polycistronic structural gene is regulated by a common promoter and regulatory genes. Such an arrangement is very common in bacteria and is referred to as an operon.


Examples include the lac operon, trp operon, ara operon, his operon and val operon.


The lac operon consists of one regulatory gene (the i gene; here the term i does not refer to inducer, rather it is derived from the word inhibitor) and three structural genes (z, y and a). The i gene codes for the repressor of the lac operon.


The z gene codes for beta-galactosidase (β-gal), which is primarily responsible for the hydrolysis of the disaccharide lactose into its monomeric units, galactose and glucose.


The y gene codes for permease, which increases the permeability of the cell to beta-galactosides.


The a gene encodes a transacetylase. Hence, all three gene products in the lac operon are required for the metabolism of lactose.


Lactose is the substrate for the enzyme beta-galactosidase and it regulates the switching on and off of the operon. Hence, it is termed the inducer.


In the absence of a preferred carbon source such as glucose, if lactose is provided in the growth medium of the bacteria, the lactose is transported into the cells through the action of permease (remember, a very low level of expression of the lac operon has to be present in the cell all the time, otherwise lactose cannot enter the cells).


The lactose then induces the operon in the following manner.


The repressor of the operon is synthesised all the time (constitutively) from the i gene.



The repressor protein binds to the operator region of the operon and prevents RNA polymerase from transcribing the operon.


In the presence of an inducer, such as lactose or allolactose, the repressor is inactivated by interaction with the inducer. This allows RNA polymerase access to the promoter, and transcription proceeds.
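The negative regulation described above amounts to a simple on/off rule, sketched below. This is a deliberately minimal model written for these notes: it ignores the low basal expression and the effect of glucose mentioned earlier, and the function names are hypothetical.

```python
# Structural genes of the lac operon and their products.
STRUCTURAL_GENES = {
    "z": "beta-galactosidase",
    "y": "permease",
    "a": "transacetylase",
}


def repressor_bound_to_operator(inducer_present: bool) -> bool:
    """The constitutively made repressor binds the operator unless inactivated."""
    return not inducer_present


def expressed_products(inducer_present: bool) -> list:
    """Products made from the polycistronic transcript, if transcription proceeds."""
    if repressor_bound_to_operator(inducer_present):
        return []   # RNA polymerase is blocked; operon switched off
    return list(STRUCTURAL_GENES.values())


print(expressed_products(inducer_present=False))   # operon off
print(expressed_products(inducer_present=True))    # all three products made
```

Because the three structural genes share one promoter and operator, they are switched on or off together.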





A very ambitious project of sequencing human genome was launched in the year 1990.


Human Genome Project (HGP) was called a mega project:


The human genome is said to have approximately 3 × 10^9 bp, and if the cost of sequencing is US $3 per bp (the estimated cost in the beginning), the total estimated cost of the project would be approximately 9 billion US dollars.


Further, if the obtained sequences were to be stored in typed form in books, and if each page of a book contained 1000 letters and each book contained 1000 pages, then 3300 such books would be required to store the information of the DNA sequence from a single human cell.
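Both estimates are simple arithmetic, shown below as a short sketch. The cost uses the rounded figure of 3 × 10^9 bp, while the figure of 3300 books appears to correspond to the haploid content of 3.3 × 10^9 bp quoted at the start of these notes; that reading is an inference, not something the notes state. The variable names are illustrative.

```python
GENOME_BP_ROUNDED = 3_000_000_000   # approximately 3 x 10^9 bp
COST_PER_BP_USD = 3                 # estimated cost at the start of the project

HAPLOID_BP = 3_300_000_000          # 3.3 x 10^9 bp (assumed basis for the book count)
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000

total_cost_usd = GENOME_BP_ROUNDED * COST_PER_BP_USD
letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK
books_needed = HAPLOID_BP // letters_per_book

print(f"estimated cost: US ${total_cost_usd:,}")
print(f"books needed: {books_needed}")
```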


The enormous amount of data expected to be generated also necessitated the use of high-speed computational devices for data storage, retrieval and analysis. HGP was closely associated with the rapid development of a new area in biology called bioinformatics.




Goals of HGP


Some of the important goals of HGP were as follows:


  • Identify all the approximately 20,000-25,000 genes in human DNA;


  • Determine the sequences of the 3 billion chemical base pairs that make up human DNA;


  • Store this information in databases;


  • Improve tools for data analysis;


  • Transfer related technologies to other sectors, such as industries;


  • Address the ethical, legal, and social issues (ELSI) that may arise from the project.



The Human Genome Project was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health.


During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China and others.


The project was completed in 2003. Knowledge about the effects of DNA variations among individuals can lead to revolutionary new ways to diagnose, treat and someday prevent the thousands of disorders that affect human beings. Besides providing clues to understanding human biology, learning about the DNA sequences of non-human organisms can lead to an understanding of their natural capabilities that can be applied toward solving challenges in health care, agriculture, energy production and environmental remediation.


Many non-human model organisms, such as bacteria, yeast, Caenorhabditis elegans (a free-living non-pathogenic nematode), Drosophila (the fruit fly) and plants (rice and Arabidopsis), have also been sequenced.






The methods involved two major approaches. One approach focused on identifying all the genes that are expressed as RNA (referred to as Expressed Sequence Tags, or ESTs).


The other took the blind approach of simply sequencing the whole genome, which contains all the coding and non-coding sequences, and later assigning functions to different regions in the sequence (a process referred to as Sequence Annotation).


For sequencing, the total DNA from a cell is isolated and converted into random fragments of relatively smaller sizes (recall that DNA is a very long polymer, and there are technical limitations in sequencing very long pieces of DNA) and cloned in a suitable host using specialised vectors.


The cloning resulted in amplification of each DNA fragment so that it could subsequently be sequenced with ease.


The commonly used hosts were bacteria and yeast, and the vectors were called BAC (bacterial artificial chromosomes) and YAC (yeast artificial chromosomes).


The fragments were sequenced using automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger. (Remember, Sanger is also credited with developing the method for determining amino acid sequences in proteins.)



These sequences were then arranged based on some overlapping regions present in them.
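The idea of arranging sequences by their overlapping regions can be illustrated with a toy example. The sketch below is not the method used by the HGP; it assumes error-free fragments that are already in the correct left-to-right order, and the reads and function names are invented for illustration. Real assembly software must also work out the order, tolerate sequencing errors and cope with repeats.

```python
def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_in_order(fragments: list[str]) -> str:
    """Join fragments that are already ordered, writing each overlap only once."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        shared = overlap_length(assembled, fragment)
        assembled += fragment[shared:]
    return assembled


# Hypothetical reads, invented for illustration only.
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
print(merge_in_order(reads))  # ATGCGTACCTGATTAG
```

Each read shares four bases with its neighbour (GTAC, then CTGA), so the merged sequence is shorter than the reads laid end to end.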


The sequence of chromosome 1 was completed only in May 2006 (this was the last of the 24 human chromosomes, 22 autosomes plus X and Y, to be sequenced).


Salient Features of Human Genome



Some of the salient observations drawn from human genome project are as follows:


  • The human genome contains 3164.7 million nucleotide bases.


  • The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.


  • The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000 to 1,40,000 genes.


  • Almost all (99.9 per cent) nucleotide bases are exactly the same in all people.


  • The functions are unknown for over 50 per cent of the discovered genes.



  • Less than 2 per cent of the genome codes for proteins.


  • Repeated sequences make up very large portion of the human genome.


  • Repetitive sequences are stretches of DNA sequences that are repeated many times, sometimes hundreds to thousands of times. They are thought to have no direct coding functions, but they shed light on chromosome structure, dynamics and evolution.


  • Chromosome 1 has the most genes (2968), and the Y has the fewest (231).


  • Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs, single nucleotide polymorphisms, pronounced as 'snips') occur in humans.


Applications and Future Challenges


Deriving meaningful knowledge from the DNA sequences will define research through the coming decades, leading to our understanding of biological systems.



This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide.


One of the greatest impacts of having the HG sequence may well be enabling a radically new approach to biological research. In the past, researchers studied one or a few genes at a time.


With whole-genome sequences and new high-throughput technologies, we can approach questions systematically and on a much broader scale.




99.9 per cent of the base sequence among humans is the same. Assuming the human genome to be 3 × 10^9 bp, in how many base sequences would there be differences? It is these differences in the sequence of DNA which make every individual unique in their phenotypic appearance.
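The question above can be worked through directly: if 99.9 per cent of bases are identical, the remaining 0.1 per cent differ. A short Python sketch of the arithmetic, using integer maths to avoid rounding (the variable names are illustrative):

```python
GENOME_BP = 3 * 10**9          # genome size assumed in the question
IDENTICAL_PER_THOUSAND = 999   # 99.9 per cent identical = 999 in every 1000

# The remaining 1 base in every 1000 (0.1 per cent) differs.
differing = GENOME_BP * (1000 - IDENTICAL_PER_THOUSAND) // 1000

print(differing)  # 3000000
```

That is, about 3 × 10^6 base positions would be expected to differ between two individuals under this assumption.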


DNA fingerprinting involves identifying differences in some specific regions of the DNA sequence called repetitive DNA, because in these sequences a small stretch of DNA is repeated many times.


During density gradient centrifugation, this repetitive DNA separates from the bulk genomic DNA as distinct peaks. The bulk DNA forms a major peak and the other small peaks are referred to as satellite DNA. Depending on base composition (A:T rich or G:C rich), length of segment, and number of repetitive units, the satellite DNA is classified into many categories, such as micro-satellites, mini-satellites etc.


These sequences normally do not code for any proteins, but they form a large portion of the human genome. These sequences show a high degree of polymorphism and form the basis of DNA fingerprinting.


Since DNA from every tissue (such as blood, hair follicle, skin, bone, saliva, sperm etc.) of an individual shows the same degree of polymorphism, it becomes a very useful identification tool in forensic applications.


Further, as the polymorphisms are inheritable from parents to children, DNA fingerprinting is the basis of paternity testing in case of disputes.


Polymorphism (variation at the genetic level) arises due to mutations. Sequence variation has traditionally been described as a DNA polymorphism if more than one variant (allele) at a locus occurs in the human population with a frequency greater than 0.01. In simple terms, if an inheritable mutation is observed in a population at high frequency, it is referred to as DNA polymorphism. The probability of such variation being observed in non-coding DNA sequence would be higher, as mutations in these sequences may not have any immediate effect on an individual's reproductive ability.
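The 0.01 frequency cut-off can be made concrete with a small check. The sketch below is one reading of the definition, namely that at least two alleles at the locus must each exceed a frequency of 0.01; the sample data and the function name are hypothetical.

```python
from collections import Counter

POLYMORPHISM_THRESHOLD = 0.01  # frequency cut-off stated in the text


def is_polymorphic(alleles: list[str]) -> bool:
    """True if more than one allele at the locus has a frequency above 0.01."""
    counts = Counter(alleles)
    total = len(alleles)
    common = [a for a, n in counts.items() if n / total > POLYMORPHISM_THRESHOLD]
    return len(common) > 1


# Hypothetical samples of 1000 allele copies at a single locus.
frequent_variant = ["A"] * 950 + ["G"] * 50   # G at frequency 0.05
rare_variant = ["A"] * 995 + ["G"] * 5        # G at frequency 0.005

print(is_polymorphic(frequent_variant))  # True
print(is_polymorphic(rare_variant))      # False
```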


These mutations keep on accumulating generation after generation, and form one of the bases of variability/polymorphism. The technique of DNA Fingerprinting was initially developed by Alec Jeffreys. He used a satellite DNA that shows a very high degree of polymorphism as a probe.


It was called Variable Number of Tandem Repeats (VNTR). The technique, as used earlier, involved Southern blot hybridisation using radiolabelled VNTR as a probe. It included

  • isolation of DNA,

  • digestion of DNA by restriction endonucleases,

  • separation of DNA fragments by electrophoresis,

  • transferring (blotting) of separated DNA fragments to synthetic membranes, such as nitrocellulose or nylon,

  • hybridisation using labelled VNTR probe, and

  • detection of hybridised DNA fragments by autoradiography.


The VNTR belongs to a class of satellite DNA referred to as mini-satellite. A small DNA sequence is arranged tandemly in many copy numbers.



The copy number varies from chromosome to chromosome in an individual.


The number of repeats shows a very high degree of polymorphism. As a result, the size of VNTR varies from 0.1 to 20 kb. Consequently, after hybridisation with the VNTR probe, the autoradiogram gives many bands of differing sizes.
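Why differing copy numbers give bands of differing sizes can be shown with a toy calculation. The sketch below assumes a single VNTR locus, a hypothetical repeat unit of 30 bp and 200 bp of flanking DNA between the restriction sites; none of these numbers comes from the text, and a real fingerprint combines many loci.

```python
REPEAT_UNIT_BP = 30   # hypothetical length of one repeat unit
FLANK_BP = 200        # hypothetical flanking DNA between restriction sites


def band_sizes(copy_numbers: tuple[int, int]) -> list[int]:
    """Fragment sizes (bp) for the two alleles at one VNTR locus, largest first."""
    return sorted((FLANK_BP + n * REPEAT_UNIT_BP for n in copy_numbers), reverse=True)


# Hypothetical copy numbers on the two homologous chromosomes of each person.
person_a = band_sizes((10, 25))
person_b = band_sizes((10, 40))

print(person_a)              # [950, 500]
print(person_b)              # [1400, 500]
print(person_a == person_b)  # False
```

The two people share one band (500 bp) but differ in the other, which is the kind of difference an autoradiogram would reveal.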