Calculate the molecular weight of a protein sequence:
It is often important to know the molecular weight and approximate charge of a
peptide or protein sequence. This program takes a protein sequence, either typed in directly or pasted in from a database, a word-processor document or any other source, and calculates the molecular weight
and approximate charge. The program ignores numbers, spaces, punctuation and letters such as B or Z in the input that do not correspond to one of the 20 genetically encoded amino acids in the single-letter amino acid code. This version can also deal with FASTA format
sequences (see here for info on that); in other words, it ignores any line of text that starts with a ">" character. The program can also process multiple FASTA format entries, counting only the sequence data and ignoring the header text following each ">".

It was actually quite interesting to research how to make this program. The atomic weights used for each atom are from the International Union of Pure and Applied Chemistry (IUPAC) web site here. This data gives average atomic weights based on the average isotopic content on Earth, which can vary by a very small amount. For instance, carbon from fossil fuels contains no carbon-14, while carbon from the atmosphere does, so the average carbon atom in the atmosphere is slightly heavier. The values, in Daltons, are H=1.00794, C=12.0107, N=14.0067, O=15.9994 and S=32.065. For phosphoproteins, P=30.973761.

You'll find that the results from this program are close to, but not exactly the same as, what you will get from other programs. If you just want the molecular weight to get an idea of where to look for the protein in an SDS-PAGE gel, any of these programs will be accurate enough. However, in the days of mass spectrometry it can be quite important to get very accurate results. The small differences between programs are due to several factors. One is that other programs may use slightly different values for the average atomic weight of each atom, using older IUPAC numbers or simply rounding off the atomic weights. More important, the molecular weight of a protein is a function of its ionization state. At neutral pH, the side chains of the vast majority of aspartic and glutamic acid residues have lost a hydrogen, so proteins containing these amino acids lose most of 1.00794 Daltons per glutamic or aspartic acid residue. Similarly, at neutral pH the vast majority of lysine and arginine residues are protonated, carrying an additional hydrogen that adds 1.00794 Daltons for each lysine or arginine residue.
Finally, histidine is protonated at neutral pH only to a limited extent, roughly 50% at pH 7.0. In this program we therefore count 7.5 hydrogens per histidine residue, halfway between the 7 in deprotonated histidine and the 8 of the protonated form. Of course, the differences due to protonation and deprotonation are relatively small, and for many proteins at neutral pH the level of protonation of lysine, arginine and histidine is more or less balanced by the level of deprotonation of glutamic and aspartic acids. Post-translational modifications such as phosphorylation and glycosylation have a much bigger impact on protein molecular weight, but to account for that you need a more complex program than this one. At least we are telling you exactly how our program works...
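
The calculation described above can be sketched in Python. This is only an illustration under stated assumptions, not the code behind this page: the function names, the residue composition table, the addition of one water molecule for the free ends of the chain, the upper-casing of the input, and the charge estimate as a simple sum of the same protonation counts are all assumptions. It also treats aspartic acid, glutamic acid, lysine and arginine as fully ionized, whereas the text only says "the vast majority" are.

```python
# Average atomic weights in Daltons, as quoted in the text above.
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# Neutral residue compositions (free amino acid minus one water),
# given as atom counts in the order (C, H, N, O, S).
RESIDUE_ATOMS = {
    "G": (2, 3, 1, 1, 0),  "A": (3, 5, 1, 1, 0),  "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0),  "V": (5, 9, 1, 1, 0),  "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1),  "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0),  "D": (4, 5, 1, 3, 0),  "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0),  "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0),  "F": (9, 9, 1, 1, 0),  "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0),  "W": (11, 10, 2, 1, 0),
}

# Hydrogens gained (+) or lost (-) per residue at neutral pH.
# Assumption: D, E, K and R are rounded to fully ionized; H is half protonated.
PROTON_SHIFT = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 0.5}


def clean_sequence(text):
    """Skip lines starting with '>' and keep only the 20 standard one-letter codes."""
    kept = []
    for line in text.splitlines():
        if line.startswith(">"):
            continue  # FASTA header line: ignored entirely
        # Assumption: lower-case letters are accepted and upper-cased.
        kept.extend(ch for ch in line.upper() if ch in RESIDUE_ATOMS)
    return "".join(kept)


def residue_mass(code):
    """Mass of one residue, with its hydrogen count adjusted for neutral pH."""
    c, h, n, o, s = RESIDUE_ATOMS[code]
    h += PROTON_SHIFT.get(code, 0.0)  # e.g. histidine becomes 7.5 hydrogens
    return (c * ATOMIC_WEIGHT["C"] + h * ATOMIC_WEIGHT["H"] + n * ATOMIC_WEIGHT["N"]
            + o * ATOMIC_WEIGHT["O"] + s * ATOMIC_WEIGHT["S"])


def molecular_weight(text):
    """Sum the residue masses and add one water for the two chain ends (assumption)."""
    seq = clean_sequence(text)
    if not seq:
        return 0.0
    water = 2 * ATOMIC_WEIGHT["H"] + ATOMIC_WEIGHT["O"]
    return sum(residue_mass(code) for code in seq) + water


def approximate_charge(text):
    """Net side-chain charge implied by the protonation counts above (assumption)."""
    return sum(PROTON_SHIFT.get(code, 0.0) for code in clean_sequence(text))


if __name__ == "__main__":
    entry = ">example header, ignored\nMKH 12 DE\nbzGA"
    print(clean_sequence(entry))
    print(round(molecular_weight(entry), 4))
    print(approximate_charge(entry))
```

As a quick sanity check, a single glycine ("G") comes out at about 75.07 Daltons with these atomic weights. The termini are left uncharged in this sketch, which is one more place where its results could differ slightly from another program's.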