It’s a huge area, and I’m hoping to break it down into some key areas over the next few blog posts – the first is one of my favourites - the technology.
In any part of science, one picks up lingo particular to the field – it’s almost like a second vocabulary – and when it comes to current DNA analysis methods, one almost needs an encyclopedia. We’ve seen a technological revolution in the last 5 years akin to what happened to computers in the last 20 – things have effectively been turned on their head, and what once took years and cost millions is now a simple single-day procedure. There are machines for doing massive amounts of DNA sequencing, machines for targeting specific pieces of the genome which might be of interest and, of course, there’s a machine for making coffee which keeps all the grad students going long into the night!
What has happened is an increase in power – the human genome is 6 billion base pairs long. Broken down, this works out to approximately 3 billion base pairs for each set of chromosomes. As well as the chromosomes contained in the nucleus of the cell, each cell contains a second set of DNA – the DNA within mitochondria, structures which have several functions, including acting as the producers of chemical energy for the cell. Whilst our chromosomal DNA is a mixture from both of our parents, our mitochondrial DNA is usually inherited only from our mother and doesn’t recombine between generations – hence why ancestry studies of mitochondrial DNA have been able to trace deep genealogies into the human past.
So what does sequencing the human genome or human mitochondria tell us? And why bother with fancy machines which can generate tonnes of data? Well, when the draft human genome was produced, many thought it would be the beginning of the end – that by reading the DNA sequence like a book we could unlock all the mysteries of not only our genes, but why diseases such as cancer arise.
Unfortunately, science is often a complicated thing – what was found is that the human genome varies much more than expected, is much more complex than expected, and environmental influences over our lifetime have a huge impact on how the genetic code translates to proteins and cellular responses. Having a single human genome just wasn’t going to cut it. So scientists started targeting bits of the genome known to be linked to cancer or other diseases, looking for mutations. An example of this is the BRCA genes associated with breast cancers – mutations arising in these genes have been linked to an increased risk of breast cancer.
The human genome varies from person to person and from disease to disease. Some of these variants can be used for tracing ancestry, some for identification such as DNA profiling, and of course some for identifying the genetic basis of disease. Just as many different mutations may give rise to one cancer, there are many types of cancer. Furthermore, whilst some disorders and diseases have clear genetic causes, such as the BRCA mutations, Huntington’s disease or Down syndrome, many don’t. Many are a complex mixture of multiple mutations and environmental influences. These are the challenging disorders, and they are very hard to understand – they include things such as cancer, schizophrenia and even how much genetics lies behind the way we look.
So with the first human genome, scientists were just getting started, and cheaper, faster tools for generating genetic data were needed. The original draft of the human genome took several years and about a billion dollars – to understand variation across different populations of people, and the differences between health and disease, cheaper and quicker methods were desperately needed. And hence a revolution began. Now, in 2011, we have sequencers capable of not only targeting small pieces of the human genome but also generating entire genomes for analysis. One platform, the Illumina HiSeq, is capable of generating 600 gigabases of DNA sequence per run. A run takes several days, but that amount of data is equivalent to 100 human genomes. So, considering that it took several years to sequence the first human genome, scientists can now generate several human genomes in a few days, for a few thousand dollars.
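If you like seeing where that "100 genomes" figure comes from, it falls straight out of the arithmetic. This is just a back-of-the-envelope sketch using the round numbers quoted above (a ~6 gigabase diploid human genome and a 600 gigabase run); in practice each base is read many times over for accuracy, so the number of genomes you can confidently call per run is lower.

```python
# Illustrative throughput arithmetic using the figures from the post.
GENOME_SIZE_GB = 6      # diploid human genome, in billions of base pairs
RUN_OUTPUT_GB = 600     # raw sequence from one HiSeq run, in billions of base pairs

genomes_per_run = RUN_OUTPUT_GB / GENOME_SIZE_GB
print(genomes_per_run)  # 100.0 genome-equivalents of raw sequence
```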
The Illumina HiSeq: it may be just a large, fancy-looking box, but this machine can sequence your genome several times over in a matter of days.
And this is just the beginning – whilst the $1000 genome is still a little while off, the cost is dropping rapidly. These technologies also have much greater sensitivity, which means that long-archived samples, degraded samples and samples which were beyond the reach of traditional analysis can now be studied and analysed. This opens up fields including ancient DNA, and allows greater information to be gained from the smallest and most degraded sources of DNA.
These new technologies pose their own unique challenges – scientists have to learn to manipulate massive amounts of data, a long way from the single DNA reads produced by traditional sequencing technology. Increasingly, computer science is being used not only to handle the large amounts of data generated by these platforms but also to assess that data and ensure it is reliable and accurate before it is analysed. This is incredibly important: whilst large amounts of data are great, the data need to be accurate if they are to be used to understand the human genome, and for future clinical diagnostics.
This new array of tools opens the door for scientists to gain a further understanding of what is now realised to be incredibly complex – the human genome in both health and disease. By reducing costs, improving output and addressing the need for genetic information which is quick to produce and reliable, we are on the way to understanding our origins, our fundamental genetic constitution and also building knowledge for the future. It’s not quite Gattaca, but these new tools are starting to reveal just who we are.