Molecular Biology


1.      Biological Background

1.1.       DNA - Deoxyribonucleic Acid

1.2.       RNA

1.3.       Proteins

1.4.       The Central Dogma Of Molecular Biology

1.5.       Protein structure

1.6.       Multiple Sequence Alignment


1.     Biological Background

1.1.            DNA  - Deoxyribonucleic Acid


In humans, as in other higher organisms, a DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder. The sides of the ladder are made of sugar and phosphate molecules, and they are connected by rungs of nitrogen-containing chemicals called bases. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The particular order of the bases along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits. The two DNA strands are held together by weak bonds between the bases on each strand, forming base pairs (bp). Genome size is usually stated as the total number of base pairs; the human genome contains roughly 3 billion bp. A gene is a segment of a DNA molecule (ranging from fewer than 1 thousand bases to several million), located at a particular position on a specific chromosome, whose base sequence contains the information necessary for protein synthesis.
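The base pairing described above can be illustrated with a short Python sketch. It assumes the standard Watson-Crick pairing rule (A with T, C with G); the function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing (assumed standard rule): A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


seq = "ATGCGT"  # hypothetical example sequence
print(complement(seq))          # TACGCA
print(reverse_complement(seq))  # ACGCAT
print(len(seq), "bp")           # length of the duplex in base pairs
```

Because each base determines its partner, one strand is enough to reconstruct the other, which is why a DNA sequence is normally written as a single string.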


1.2.            RNA

RNA has a structure similar to that of DNA. The primary differences between RNA and DNA are that RNA has a hydroxyl group on the second carbon of its sugar, and that RNA uses the base uracil (U) in place of thymine.

Because of the extra hydroxyl group on its sugar, RNA does not form the stable double helix characteristic of DNA, and it typically exists as a single-stranded molecule. In addition, because the RNA molecule is not restricted to a rigid double helix, it can fold into many different structures. The cell makes several different kinds of RNA, including mRNA, tRNA, rRNA and snRNA.
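At the level of sequence, the substitution of uracil for thymine can be sketched as a simple string operation. The snippet below is a minimal illustration that assumes the input is the coding (sense) strand of the DNA; the function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand.

    Assumption: the input is the coding (sense) strand, so the RNA
    has the same sequence with every T replaced by U.
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGCGT"))  # AUGCGU
```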

1.3.            Proteins

Proteins are involved in almost all biological activities, whether structural or enzymatic. A protein is made by joining amino acids together in a specific sequence, and each kind of protein has its own sequence. The amino acids are held together by a special bond called a peptide bond. There are 20 standard amino acids.
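For computational purposes a protein is usually written as a string over a 20-letter alphabet, one letter per amino acid. The sketch below uses the conventional one-letter codes; the helper names are hypothetical.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_protein(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only the 20 standard letters."""
    return len(sequence) > 0 and all(r in AMINO_ACIDS for r in sequence.upper())


def peptide_bond_count(sequence: str) -> int:
    """A linear chain of n amino acids is joined by n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


print(is_valid_protein("MFGK"))    # True
print(peptide_bond_count("MFGK"))  # 3
```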


1.4.            The Central Dogma Of Molecular Biology

How does the sequence of a strand of DNA correspond to the amino acid sequence of a protein? This correspondence is explained by the central dogma of molecular biology, according to which DNA is transcribed into RNA, and RNA is translated into protein.
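The flow of information from DNA to RNA to protein can be sketched in a few lines of Python. This is only an illustration: the codon table below is a small subset of the standard genetic code (the full table has 64 codons), codons missing from the subset are reported as "X", and the function names and example sequence are hypothetical.

```python
# Partial codon table (subset of the standard genetic code, for illustration).
CODON_TABLE = {
    "AUG": "M",                                      # methionine, usual start
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X": not in subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCAAATAA"  # hypothetical example gene
mrna = transcribe(dna)
print(mrna)              # AUGUUUGGCAAAUAA
print(translate(mrna))   # MFGK
```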


1.5.            Protein structure

A striking characteristic of proteins is that they have very well defined 3-D structures. A stretched-out polypeptide chain has no biological activity; protein function arises from the conformation of the protein, which is the 3-D arrangement of the atoms in the molecule. The native conformation of a protein is described at 4 levels of structure. Primary, secondary and tertiary structure refer to a single polypeptide chain, and the fourth (quaternary) refers to the interaction of several polypeptide chains to form a multi-chained protein. In this paper, we limit our discussion to just the primary and secondary structure.


Primary Structure


The primary structure of a protein is determined by the number and order of amino acids within a polypeptide chain.  A polypeptide is a sequence of two or more amino acids joined together by peptide bonds. Determination of primary structure is an essential step in the characterization of a protein.

Secondary Structure

Protein secondary structure refers to regular, repeated patterns of folding of the protein backbone. The two most common folding patterns are the alpha helix and the beta sheet. These patterns result from regular hydrogen bonding between backbone atoms.






In the alpha helix, the polypeptide folds by twisting into a right-handed screw so that backbone groups along the chain can form hydrogen bonds with one another. This extensive hydrogen bonding stabilizes the structure, giving it a very strong, rod-like shape.


The beta-pleated sheet is substantially different from the alpha helix: it is a sheet rather than a rod, and the polypeptide chain is fully stretched rather than tightly coiled as in the helix. It is called a beta-pleated sheet because of its zigzag appearance when viewed from the side.


The tertiary structure of a protein is formed when the attractions of side chains and those of the secondary structure combine and cause the amino acid chain to form a distinct and unique 3-dimensional structure.  It is this unique structure that gives a protein its specific function.


1.6.            Multiple Sequence Alignment

Multiple alignment is the process of aligning two or more sequences with each other in order to determine any evolutionary relationships. For aligning two sequences, the dynamic programming approach is the most suitable. This approach can also be generalized to multiple sequence alignment, but for a large number of sequences it becomes impractical, because the size of the dynamic programming table grows exponentially with the number of sequences. Heuristic methods are available to speed up the dynamic programming approach, such as local multiple alignment using the Sum of Pairs scoring function. In our treatise, we will show how HMMs can be effective in solving this problem.
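To make the two ideas mentioned above concrete, the following Python sketch computes a pairwise global alignment score by dynamic programming (the Needleman-Wunsch recurrence) and a Sum of Pairs score for an already-aligned set of sequences. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and all function names are assumptions chosen for illustration; they are not taken from this text.

```python
from itertools import combinations

# Illustrative scoring scheme (an assumption, not prescribed by the text).
MATCH, MISMATCH, GAP = 1, -1, -2


def global_alignment_score(a: str, b: str) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch recurrence).

    Only two rows of the table are kept, so memory is O(len(b));
    running time is O(len(a) * len(b)).
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + GAP,       # gap in b
                            curr[j - 1] + GAP))  # gap in a
        prev = curr
    return prev[-1]


def pair_score(x: str, y: str) -> int:
    """Score one pair of symbols from an alignment column ('-' is a gap)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum of Pairs score: add pair scores over every column and every pair of rows."""
    return sum(pair_score(x, y)
               for column in zip(*alignment)
               for x, y in combinations(column, 2))


print(global_alignment_score("ACGT", "AGT"))   # 1
print(sum_of_pairs(["ACGT", "A-GT", "ACGA"]))  # 2
```

Extending the pairwise table to k sequences of length n would require on the order of n^k cells, which is the reason exact dynamic programming is practical only for a small number of sequences.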