Wednesday, March 30, 2011

The ABC's of DNA sequencing...(ish)

Want to hear an (anti) joke? OK, here goes: "Abby, Bob, and Carol walk into a hospital... and get their DNA tested for a horrible disease." Because the hospital they walked into is stuck in the 1980s, it uses the Sanger method of sequencing to examine the DNA. To make a complex process rather simple, the Sanger method takes a sample of DNA from a subject and synthesizes (copies) it, with each base (A, T, G, C) radioactively labeled for detection later in a sequencing machine. The newly synthesized DNA is then put through gel electrophoresis, which uses electricity and a gel to sort the DNA molecules by size and creates a visual representation on X-ray film or on the gel itself (Figure 1).

(Fig. 1)
So, Abby, Bob, and Carol all had their DNA run through the Sanger process. The next step in getting their results was for a "skilled" technician to write out the DNA sequence. To write out the sequence, you start at the bottom and write down the letter that each mark represents (the sequence above would be CGA GAT ATA, etc.). Each letter represents a base, and the three-letter grouping is important later on for determining the proteins that the sequence represents.
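
The bottom-to-top read-out can be sketched in a few lines of Python. This is only an illustration, not the technician's actual procedure: the `bands` list is made-up data (chosen so the result matches the CGA GAT ATA example), and `read_gel` and `group_codons` are hypothetical helper names. It relies on the fact that the shortest fragments travel farthest down the gel, so sorting by fragment length is the same as reading from the bottom up.

```python
# Hypothetical gel bands: (fragment length in bases, base that ends the fragment).
# The values are invented so the read-out matches the example in the text.
bands = [(4, "G"), (1, "C"), (7, "A"), (2, "G"), (9, "A"),
         (3, "A"), (6, "T"), (8, "T"), (5, "A")]


def read_gel(bands):
    """Read the bases from the bottom of the gel (shortest fragment) upward."""
    ordered = sorted(bands, key=lambda band: band[0])
    return "".join(base for _, base in ordered)


def group_codons(sequence):
    """Split a sequence into three-letter groups separated by spaces."""
    return " ".join(sequence[i:i + 3] for i in range(0, len(sequence), 3))


print(group_codons(read_gel(bands)))  # CGA GAT ATA
```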

Abby's Sequence went:  ATG GTG CAC CTG ACT CCT GTG GAG AAG TCT GCC

Bob's Sequence went:    ATG GTG CAC CTG ACT CCT GTG GAG TAG TCT GCC

Carol's Sequence went:  ATG GTG CAC CTG ACC CTG AGG AGA AGT CTG CCC

In order to determine if they had the "life ending disease," their DNA sequences were compared to that of a control subject (Norm) who did not have the disease.

Norm's Sequence was:   ATG GTG CAC CTG ACT CCT GAG GAG AAG TCT GCC

To determine the likelihood of having the disease, the hospital needs to create a quantitative representation of what all those letters mean, so they create a number called the percent similarity. To do this, they put the number of bases that are the same as Norm's over the total number of bases in the sequence. I took the liberty of putting the results into a pretty little graph to show how each of the patients compares to the control (Norm).
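
For anyone who wants to redo the arithmetic, here is a minimal sketch of the percent similarity calculation as described above: matching bases over total bases, compared position by position. The function names `count_matches` and `percent_similarity` are my own, and the sequences are copied from this post as printed.

```python
NORM = "ATG GTG CAC CTG ACT CCT GAG GAG AAG TCT GCC"

PATIENTS = {
    "Abby": "ATG GTG CAC CTG ACT CCT GTG GAG AAG TCT GCC",
    "Bob": "ATG GTG CAC CTG ACT CCT GTG GAG TAG TCT GCC",
    "Carol": "ATG GTG CAC CTG ACC CTG AGG AGA AGT CTG CCC",
}


def count_matches(sample, control):
    """Count positions where the sample has the same base as the control."""
    a = sample.replace(" ", "")
    b = control.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b))


def percent_similarity(sample, control):
    """Matching bases divided by total bases, as a percentage."""
    total = len(control.replace(" ", ""))
    return 100.0 * count_matches(sample, control) / total


for name, sequence in PATIENTS.items():
    print(f"{name}: {percent_similarity(sequence, NORM):.2f}% similar to Norm")
```

Keep in mind that this is a naive comparison: a single inserted or deleted base shifts every letter after it, so the score can drop sharply even when most of the sequence is unchanged.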



As you can very clearly see, Abby and Bob have DNA sequences very similar to the control's (about 97% and 94%, respectively, counting matching bases in the sequences above), which would lead me to assume that they would not have the horrible disease. Carol, on the other hand, was not that similar to Norm, so I would assume that she has the disease.

One Last Note: (for content stuffs) 
The reason that DNA is written out in chunks of three (called codons) is that the proteins formed from the DNA are built from those three-letter sequences. Each set of three letters, in other words, represents an amino acid, one of the building blocks of a protein. To determine which amino acid is made by which sequence, one has to look at a magical "decoder table." The nice thing about amino acids and DNA sequences is that the same amino acid can usually be made by a number of different sequences, so long as the first and second letters of the sequence are the same. For example, the sequences CCU, CCC, CCA, and CCG (written in RNA letters, where U stands in for T) all make "Pro" (proline). This means that if a sequence is off, the proteins won't always be off. Which is the important part...
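
The "decoder table" can be sketched in Python too. This is a small illustration rather than a full bioinformatics tool: it builds the standard genetic code from a compact 64-letter string (bases in T, C, A, G order), returns one-letter amino-acid codes (P is proline, `*` marks a stop), and treats U as T so that RNA-style codons like CCU work. The names `CODON_TABLE` and `translate` are my own.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA (or RNA) sequence codon by codon into one-letter codes."""
    seq = sequence.upper().replace(" ", "").replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore any incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(translate("CCU CCC CCA CCG"))  # PPPP
print(translate("ATG GTG CAC CTG ACT CCT GAG GAG AAG TCT GCC"))  # MVHLTPEEKSA
```

The first line of output shows all four proline codons decoding to the same letter; the second is Norm's sequence run through the table.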
