Focus: Venter’s deep dive into the human genome

Sequencing may be the largest of its kind in quality, quantity

In what appears to be the single most in-depth look at human DNA to date, the complete genetic sequences of more than 10,000 people have been deeply examined in a study led by genomics pioneer Craig Venter.

Available in preliminary form, the study presents a fine-grained map of the differences among individual genomes and of the weak spots where mutations are especially likely to cause disease.

The study of 10,545 genomes also indicates that there’s much further to go in understanding all the variations in DNA sequences contained in the global population of more than 7 billion people. Some variations are linked to ethnicity; others result from mutations in the individual. Even the length of the genome varies slightly from person to person.

The more genomes are sequenced and linked to the health status of the individuals, the more useful the genomic information becomes. This is especially important in finding the genetic causes of rare diseases.

Although the study has not yet been published in a peer-reviewed journal, other scientists who have read it praised its contributions and said they are looking forward to its formal publication. It can be found in its pre-publication form at https://bit.ly/jcvideep.

Among the study’s highlights:

--- 84 percent of an individual genome can be sequenced confidently. This contains 92.5 percent of known disease-causing variations.

--- Each newly sequenced genome contains an average of 8,579 previously unknown variations, which may or may not be involved in disease.

--- In addition, each genome contains about 700,000 DNA “letters” that aren’t found in a standard reference genome.

“Our work represents the largest effort to date in sequencing human genomes at deep coverage with these new standards,” the study stated. “This study identifies over 150 million human variants, a majority of them rare and unknown.

“Moreover, these data identify sites in the genome that are highly intolerant to variation – possibly essential for life or health. We conclude that high coverage genome sequencing provides accurate detail in human variation for discovery and for clinical applications.”

Scientists from La Jolla's Human Longevity Inc. and the nonprofit J. Craig Venter Institute conducted the study. Venter was the senior author. The first authors were Amalio Telenti, Levi C.T. Pierce and William H. Biggs, all of Human Longevity.

The study was made available July 1 on BioRxiv, a Web site operated by Cold Spring Harbor Laboratory in New York.

BioRxiv is a pre-print server that allows a first look at studies in the fast-evolving fields of life sciences. Scientists may also make suggestions that could be incorporated to improve the published study. A similar service has long been available to physicists, but is new for life sciences.

Quality and quantity

Venter declined a request for an interview because the peer review and publication process has not been completed. But other scientists who have read the study say they are impressed, providing an informal peer review.

"Wow," was the July 6 Twitter reaction of Dr. Atul Butte, who leads the UC San Francisco Institute for Computational Health Sciences.

Contacted to elaborate, Butte said the scale of the effort is "pretty impressive," along with the quality of the genomes.

Each of the more than 10,000 genomes, of which 8,096 were from unrelated individuals, was sequenced 30 to 40 times over to reduce errors, the study stated. This "deep sequencing," as it is known, is enough for a medical-grade genome, one sufficiently reliable for clinical use.
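
To see why repeated reads reduce errors, consider a rough back-of-the-envelope model that is not taken from the study: suppose each read miscalls a given letter independently with some small probability, and the final call is a simple majority vote. The 1 percent per-read error rate and the function name `consensus_error` are illustrative assumptions; real variant-calling software is considerably more sophisticated.

```python
from math import comb

def consensus_error(depth, per_read_error):
    """Probability that a strict majority of reads at one site are wrong.

    Simplifying assumptions (not from the study): errors are independent
    between reads, and every wrong read reports the same wrong letter.
    """
    # Smallest number of wrong reads that outvotes the correct ones.
    first_losing_count = depth // 2 + 1
    return sum(
        comb(depth, wrong)
        * per_read_error**wrong
        * (1 - per_read_error) ** (depth - wrong)
        for wrong in range(first_losing_count, depth + 1)
    )

if __name__ == "__main__":
    # Hypothetical 1 percent per-read error rate, chosen only for illustration.
    for depth in (1, 3, 10, 30, 40):
        print(f"{depth:>2}x coverage: {consensus_error(depth, 0.01):.3e}")
```

Under these assumptions, the chance of a wrong consensus call drops steeply between a single read and 30-fold depth, which is the intuition behind deep coverage.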

Some of the genomes contained medically important variants, and this study shows the level of quality needed to truly find or dismiss the presence of these variants in patients, Butte said. Moreover, the genes were considered in combination.

"I liked how they looked at gene families in detail," Butte said. "I thought it was a really innovative look."

For clinical use, the genomes provide a useful reference set, detailing variations that can be compared to future individual genomes, Butte said.

"That helps us with every cancer case, that helps us with every child that comes in with a rare genetic syndrome," he said.

Hudson Freeze, who researches rare diseases at the Sanford Burnham Prebys Medical Discovery Institute in La Jolla, said the Venter team's approach resembled a real-life version of the fictional TV drama CSI.

"They'll have a blurry picture and say, can you enhance that? And somehow, through magic, they can read when the license tag was last replaced," said Freeze, director of SBP's Human Genetics Program. "That's kind of what they've done here."

Rare diseases

Freeze said the study is the first he's aware of to sequence so many genomes at such high quality. That combination of quality and quantity will become more important in tracking down the causes of rare genetic diseases.

Freeze said he accepted the study's premise that delivering individualized medicine, such as in President Obama's Precision Medicine Initiative, requires good genomic maps and an understanding of what regions can or cannot vary without affecting health.

"This idea of taking more than 10,000 individual genomes and doing this -- that takes a lot of guts," Freeze said. "It's going to be a guy like Craig Venter, who has the moxie, the finances and the know-how to get this done."

In his own research, Freeze studies rare diseases of glycosylation, the metabolism of sugars. He and colleagues also work with patients and families seeking answers for their diseases. Just before an interview, he dealt with such a case.

"Five minutes ago, we were dealing with the possible analysis of a mutation in a gene that had not been known before to cause disease, yet it falls with my glycosylation focus," Freeze said.

In the first patient, a genetic deficiency was confirmed through a functional test. The second patient also had a mutated gene, and the question was whether this mutated gene was the cause, or a false alarm.

A rare disease database indicated the presence of three mutations in the gene of interest. Mutations are often recessive, so someone who is heterozygous, with one mutated gene from one parent and one normal gene from the other parent, doesn't have the disease.

"There are 32 carriers out there of 60,000 that have been examined," he said. "But those carriers don't tell you anything other than it's possible," because they are all heterozygous.

The second patient turned out to be homozygous, with both parents contributing a copy of the mutated, damaged gene.

This indicates that the mutations may well be the cause of the disease, which is enough information to justify applying the functional test for that particular rare disease.
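
The arithmetic behind Freeze's point can be sketched with the textbook Hardy-Weinberg approximation. This is an illustration, not a calculation from the study or from Freeze: it assumes random mating, lumps the reported mutations together as a single disease allele, and treats the 32 carriers among 60,000 people as representative. The function names are hypothetical.

```python
from fractions import Fraction

def allele_frequency(carriers, people):
    """Each person has two gene copies; each heterozygous carrier has one mutated copy."""
    return Fraction(carriers, 2 * people)

def expected_homozygote_rate(q):
    """Hardy-Weinberg: chance of inheriting the mutated copy from both parents."""
    return q * q

if __name__ == "__main__":
    q = allele_frequency(32, 60_000)       # figures quoted by Freeze
    rate = expected_homozygote_rate(q)     # assumes random mating
    print(f"allele frequency: 1 in {1 / q}")
    print(f"expected homozygotes: 1 in {1 / rate}")
```

With those figures, roughly 1 person in 14 million would be expected to carry two mutated copies, so finding only heterozygous carriers among 60,000 people says little about what happens when both copies are damaged.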

Numerous, high-quality genome maps such as those from the study will assist researchers and clinicians studying such rare genetic diseases to narrow down the possibilities, he said.

"If you don't have this kind of in-depth information, you don't even know where to start," Freeze said.

Human DNA contains about 3 billion “letters”: adenine, cytosine, guanine and thymine, or A, C, G and T. A misspelling of one DNA letter may be enough to cause a genetic disease, or it may be harmless. So near-perfect accuracy of genome reading is desirable for medical use.

UC San Francisco's Butte said it wasn't surprising that 16 percent of individual genomes couldn't be reliably sequenced. Some areas of the genome are harder to sequence than others due to their repetitive nature.

Human genomes are too large to read in one pass. Under the "shotgun" method championed by Venter, the genomes are broken into many pieces, the individual pieces sequenced, and then reassembled by computer. Repetitive sequences, like identically shaped puzzle pieces, are difficult to place in the correct sequence.
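
The puzzle-piece problem can be illustrated with a toy example. The sketch below is not an assembler and not Venter's method; it simply cuts a made-up sequence into every possible fixed-length read and reports the reads that occur at more than one position, which are the ones that cannot be placed unambiguously. The sequence and the function names are invented for illustration.

```python
from collections import defaultdict

def read_positions(genome, read_len):
    """Map every read of length read_len to the positions where it occurs."""
    positions = defaultdict(list)
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    return positions

def ambiguous_reads(genome, read_len):
    """Reads that match more than one position, so their placement is unclear."""
    return {
        read: starts
        for read, starts in read_positions(genome, read_len).items()
        if len(starts) > 1
    }

if __name__ == "__main__":
    # Made-up sequence with a repetitive stretch in the middle.
    toy_genome = "GATTACA" + "TGTGTGTGTG" + "CCGAT"
    for read_len in (3, 6, 12):
        repeats = ambiguous_reads(toy_genome, read_len)
        print(f"read length {read_len}: {len(repeats)} ambiguous reads")
```

Reads that fall entirely inside the repetitive stretch look identical wherever they came from. Longer reads tend to reach past the repeat into unique sequence on either side, which is one reason repetitive regions are harder for short-read methods.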

E unum pluribus genome

Companies led by San Diego’s Illumina have dramatically driven down the cost and increased the capacity of genome sequencing in recent years. This made it possible for Venter to scale up genome examination, both in quality and quantity.

The international drive to sequence the human genome, the Human Genome Project, began in the late 1980s. It was led by the U.S. Department of Energy and the National Institutes of Health, with about $3 billion in funding. Later, Venter launched a competing privately funded project. A rough draft of the human genome was published in 2001.

The excitement marking that accomplishment dimmed over the years as researchers realized that one reference genome didn’t say very much in itself. A reference genome doesn’t exist in nature; it is an abstraction. In reality there is no one human genome. Each person’s genome contains numerous variations, and the significance of most of these is unknown.

It's also impractical to sequence individual genomes the way the reference genome was created, Butte said.

"When you sequence a new individual, you don't have the resources of the world to help you, like you did with the original reference," Butte said. "The idea here is that any one good lab might actually be able to get to 84 percent reasonably easily."

Ali Torkamani, director of genome informatics at the Scripps Translational Science Institute, said by email that the study extended knowledge of areas of the genome that don't code for proteins. Torkamani is a shareholder in Venter's Human Longevity.

Only a few percent of the genome, known as the "exome," consists of genes that make proteins or are otherwise expressed. In the vast area remaining, some sequences regulate genes and perform other critical functions.

"This study provides a more complete picture of all potentially clinically important regions in the genome based on an observed lack of genetic variation," Torkamani said.

The study's biggest limitation, "at least as presented here and made available to external researchers," is the lack of physical health data, Torkamani said.

For a follow-up, Torkamani suggested the researchers examine other variations in the genome beyond the study's focus on single nucleotide polymorphisms (SNPs), or single-letter changes in the genome.

"While SNPs are the most abundant type of genetic variation in the human genome, there are other types of genetic variants that are clinically relevant but not discussed in this study - most notably insertions, deletions, copy number variants, and other types of structural variants," Torkamani said.

"A follow-up interrogating these other types of genetic variation would be very interesting and informative to the genetics community. Associations of genetic variants with phenotypic data, to the extent that it is possible, would also be an interesting follow-up."