

Codon Optimization Calculator:

      As anyone who has studied biology in the last 50 years will know, proteins are made from mRNA, which is transcribed from DNA, and translation uses a simple code: each three-base segment, called a codon, specifies a particular amino acid. Since there are 4 bases, there are 4 x 4 x 4 = 64 possible codons. In the standard genetic code used by humans, 3 of these codons signal "stop", the end of a polypeptide chain, and each of the other 61 codes for a particular amino acid. Because there are only 20 standard amino acids, most amino acids are encoded by more than one codon.

      Bacteria and mammals prefer different codons, so mammalian genes frequently use codons that are rarely used in bacteria, and vice versa. As a result, a mammalian gene may be expressed very poorly in bacteria, which is a significant problem. One way to deal with this is to take the mammalian protein sequence, work out which codons are optimal for bacterial expression, and synthesize a DNA sequence designed to express the mammalian protein efficiently in bacteria. To do this, you find out which codons are used most often in the species of interest and build a DNA sequence from them. Doing this by hand is time-consuming. An alternative is to install software on your computer, some of which is available free on the internet. Or you can simply use our online program, which runs in your web browser and so takes up no space on your hard disk.
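The codon arithmetic above can be checked with a short Python sketch. This is not the calculator's own code, which runs in the browser. It builds the standard genetic code (NCBI translation table 1, written with the DNA bases T, C, A and G) and counts the total codons, the stop codons, the sense codons, the distinct amino acids, and how many codons encode a few example amino acids.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed in
# TCAG order with the third base varying fastest; '*' marks a stop codon.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}

stops = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]
sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

# Degeneracy: the number of codons that encode each amino acid
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), len(stops), len(sense), len(amino_acids))  # 64 3 61 20
print(sorted(stops))  # ['TAA', 'TAG', 'TGA']
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one. This uneven degeneracy is why the choice among synonymous codons matters for expression.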

      The program ignores numbers, spaces and characters such as B or Z that do not correspond to one of the standard amino acids. It can also handle FASTA-format sequences: it ignores any line of text that starts with a ">" character (the FASTA header line) until it reaches the next newline character.
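These input rules can be sketched in Python under one reading of them. The function name `clean_protein` is hypothetical, and this is not the calculator's own code. The sketch keeps only the 20 standard one-letter amino acid codes in either case. As a result it drops digits, spaces and B or Z, and also other non-standard letters such as X. It also skips any line that begins with ">".

```python
# The 20 standard one-letter amino acid codes
STANDARD_AAS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein(text: str) -> str:
    """Extract amino acid letters from pasted text (hypothetical helper).

    Lines beginning with '>' are treated as FASTA headers and skipped
    entirely; on other lines, anything that is not one of the 20 standard
    amino acid letters (digits, spaces, B, Z, X, '*', ...) is discarded.
    Input may be upper- or lowercase; output is uppercase.
    """
    residues = []
    for line in text.splitlines():
        if line.startswith(">"):
            continue  # FASTA header: ignore everything up to the newline
        residues.extend(ch for ch in line.upper() if ch in STANDARD_AAS)
    return "".join(residues)


example = ">sp|example header\nmkt 1 aly\nBZ*qw\n"
print(clean_protein(example))  # MKTALYQW
```

Skipping the header matters because header text usually contains letters that would otherwise be read as residues.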


Select the species to express your protein in: Bacteria, Yeast, Insect or Mammal.

Do you want to add a stop codon to the 3' end? No or Yes.

(If you do not select an option above, the defaults are bacterial expression and no stop codon.)

Paste or type your protein sequence into the box below. It can be upper- or lowercase; the program reads either, or a mixture of both.



How This Works: We used the codon tables that can be downloaded from the excellent Codon Usage Database, maintained by the Department of Plant Gene Research in Kazusa, Japan. This database tabulates codon usage in a stunning variety of species. We extracted the values for Homo sapiens, Spodoptera frugiperda, Saccharomyces cerevisiae and Escherichia coli strain CFT073, and used them for the mammalian, insect, yeast and bacterial settings, respectively. The mammalian values should be appropriate for expression in HeLa, HEK293, COS-7 and other mammalian cells. The insect values should suit Sf9 cells, which were derived from S. frugiperda, a moth. The yeast values are for expression in S. cerevisiae, and the bacterial values for expression in E. coli. The program does not take into account the possible negative effects of palindromic or other self-complementary sequences, which can cause the mRNA transcript to form hairpin structures that may affect expression.
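The idea described above is to build the DNA sequence from the codon used most often in the chosen species for each amino acid. This can be sketched in Python, assuming the species' codon usage is supplied as a dictionary that maps each codon to a frequency. The function names and the toy frequencies below are hypothetical placeholders. They are not values from the Codon Usage Database, and the sketch may differ from how the calculator itself works. The `add_stop` flag defaults to no stop codon, mirroring the page's default.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order; '*' = stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}


def preferred_codons(usage):
    """Map each amino acid (and '*' for stop) to its most-used codon.

    `usage` maps codon -> usage frequency for one species (an assumed
    input format). Missing codons count as 0; ties keep the codon that
    comes first in TCAG order.
    """
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return best


def back_translate(protein, usage, add_stop=False):
    """Return a DNA sequence using the single most-used codon per residue.

    `protein` should already be cleaned to standard one-letter codes;
    any other character raises KeyError.
    """
    best = preferred_codons(usage)
    dna = [best[aa] for aa in protein.upper()]
    if add_stop:
        dna.append(best["*"])  # optional 3' stop codon
    return "".join(dna)


# Placeholder frequencies for illustration only; NOT real data for any organism.
toy_usage = {"ATG": 1.0, "AAA": 3.0, "AAG": 1.0, "TAA": 2.0, "TAG": 0.5, "TGA": 1.0}
print(back_translate("MKK", toy_usage, add_stop=True))  # ATGAAAAAATAA
```

Using one "best" codon per amino acid is the simplest strategy. It does not check for the palindromic or self-complementary stretches mentioned above, which would need a separate scan of the output sequence.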


References: This program has been widely used for many years. A partial list of papers that cite it can be found by searching Google Scholar for "EnCor codon optimization calculator".


Disclaimer: This program was constructed primarily to save time for busy researchers around the world. We hope you find it useful, and we are confident that it is accurate and reliable. However, we cannot be held responsible for any problems which may arise as a result of the use of this program.

©EnCor Biotechnology Inc.