As anyone who has studied biology in the last 50 years will know, proteins are made from mRNA, which is made from DNA, and this is done by a simple coding mechanism: a three-base segment of the sequence, called a codon, specifies a particular amino acid. Since there are 4 bases, there are 4 x 4 x 4 = 64 possible codons. In humans, 3 of these codons code for "stop", the end of a polypeptide chain, and each of the other 61 codes for a particular amino acid. Since there are only 20 genetically encoded amino acids, most amino acids are coded for by more than one codon.

Bacteria and mammals prefer different codons, so mammalian genes frequently use codons that are rarely employed in bacteria, and vice versa. As a result, a mammalian gene may be expressed very poorly in bacteria, which can be a significant problem.

One way to deal with this is to examine the mammalian sequence, work out which codons are optimal for bacterial expression, and synthesize a new DNA sequence designed specifically to express the mammalian gene efficiently in bacteria. To do this, you find out which codons are used most widely in the species of interest and build a DNA sequence from them. Doing this by hand, however, is time consuming. An alternative is to install software on your computer, some of which is available for free on the internet. Or you can simply use our online program, which runs in your web browser and so will not clog up your hard disk.

The program ignores numbers, spaces, and characters such as B or Z that do not correspond to a single amino acid. It can also handle sequences in FASTA format: any line that begins with a ">" character is treated as a header and ignored up to the next newline character.
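To make the codon arithmetic above concrete, here is a short Python sketch that builds the standard genetic code and counts its codons. The compact 64-letter string and the variable names are our own illustrative choices, not part of the online program.

```python
from collections import Counter

# Standard genetic code, with codons enumerated in the conventional
# T, C, A, G order (first base outermost, third base innermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build all 4 x 4 x 4 = 64 three-base codons.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon

stop_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]
sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]

# Number of codons (synonyms) that encode each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODONS))        # 64
print(stop_codons)        # ['TAA', 'TAG', 'TGA']
print(len(sense_codons))  # 61
print(len(degeneracy))    # 20
print(degeneracy["L"], degeneracy["M"])  # 6 1
```

Leucine, for example, has six synonymous codons, while methionine has only one, which is why there is room to choose codons that suit the host organism.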
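The optimization strategy described above, choosing for each amino acid the codon most widely used in the target species, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not necessarily how our program works: it assumes you supply reference coding sequences from the target organism, counts codon usage in them, and back-translates a protein using the most frequent codon for each amino acid. The function names (`count_codons`, `preferred_codons`, `back_translate`) are hypothetical, and the two reference sequences in the example are made up for illustration only; they are not real genes.

```python
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code; "*" = stop


def count_codons(reference_seqs):
    """Count in-frame codon usage across reference coding sequences."""
    counts = Counter()
    for seq in reference_seqs:
        seq = seq.upper()
        # Step through the sequence three bases at a time, from the first base.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons containing N or other symbols
                counts[codon] += 1
    return counts


def preferred_codons(counts):
    """Map each amino acid (and "*") to its most frequently used codon.

    Ties, including amino acids never seen in the reference set, go to the
    codon that comes first in CODON_TABLE order.
    """
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein, best):
    """Build a DNA sequence from a protein using the preferred codons."""
    # Characters not in the table (spaces, digits, B, Z, ...) are skipped.
    return "".join(best[aa] for aa in protein.upper() if aa in best)


# Hypothetical reference sequences, made up for illustration only.
reference = ["ATGCTGCTGAAATAA", "ATGCTGTTAAAGAAATAA"]
best = preferred_codons(count_codons(reference))
print(back_translate("MLK*", best))  # ATGCTGAAATAA
```

In practice the counts would come from a codon usage table or a large set of genes from the expression host rather than two toy sequences. Always picking the single most frequent codon is the simplest version of the idea.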
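The input handling described above, skipping FASTA header lines and any character that is not one of the 20 standard one-letter amino acid codes, can be sketched like this. It is an illustrative approximation of the described behaviour, not the program's actual source; the function name `clean_protein` is hypothetical, and accepting lower-case letters as if they were upper-case is our assumption.

```python
# The 20 standard one-letter amino acid codes. Ambiguity codes such as
# B (Asx) and Z (Glx) are deliberately left out, so they are ignored.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein(text):
    """Return only the amino acid letters from raw or FASTA-formatted text."""
    residues = []
    for line in text.splitlines():
        # A line starting with ">" is a FASTA header: ignore it up to the newline.
        if line.startswith(">"):
            continue
        # Assumption: lower-case letters are accepted and treated as upper-case.
        # Digits, spaces and non-amino-acid letters are dropped.
        residues.extend(ch for ch in line.upper() if ch in VALID_RESIDUES)
    return "".join(residues)


fasta = ">my protein\nMKT 10 BZ\nlvw\n"
print(clean_protein(fasta))  # MKTLVW
```

Note that the header text is skipped as a whole, so letters in it (such as the M, P and T in "my protein") never leak into the sequence.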