The Newly Sequenced Genome Bares All

February 15, 2001 at 11:14 am

Come one, come all. The newly completed sequence of the human genome is now on parade for the whole world to see. Like a display of the storybook emperor’s new clothes, however, our genome’s finery may be as striking for what we don’t see as for what we do.

After the past few months of analysis at breakneck speed, two genome-mapping projects, one public and one private, have each pieced together the genetic fabric, revealing the order of the DNA subunits that make up our chromosomes. The International Human Genome Sequencing Consortium, a collective of several academic and government labs, and its rival in the private sector, Celera Genomics of Rockville, Md., published the sequences this week in the Feb. 15 Nature and Feb. 16 Science, respectively.

“I think that this is a race with no losers,” says Celera president J. Craig Venter. “We were climbing a mountain to see what was on the other side.” Now, with the entire human genetic ensemble on view, scientists are poring over it in search of hints about evolutionary history, physiology, and disease vulnerability.

The two linchpin papers in Science and Nature were accompanied by a total of 31 articles discussing implications of the data reported by the two groups and reporting early, independent explorations of the completed 3-billion-subunit sequence. That sequence is spelled out with just four different chemical components, known as nucleotides, strung together in varying order.

Among the surprises was that the human genome sports only a third as many genes as most scientists expected. Foreign DNA, derived from ancient microbial invaders, also populates huge chunks of our genome.

A trove of information

Both Celera and the public consortium announced last June that they had read virtually all of the DNA in the human genome. At the time, much of the information was divided into short pieces that had to be assembled before analysts could comb it for gene-revealing patterns. Since then, assembly has been completed, and initial analyses provided the basis of this week’s announcements.

Now, for the first time, most of the genome can be read as a single massive text that can be studied with information tools like search engines.

The public consortium’s sequence data are now freely available on the Internet. Celera also is offering some of its data for free, but the company charges subscription fees for more complete access.

In the Feb. 15 Nature, George M. Church of the Lipper Center for Computational Genetics in Boston reports that he found only minor differences when he compared the two sequences. “The thing I found most surprising is how similar they are. You could imagine using either version,” he says.

Neither version is flawless. “It’s important to point out that both of the sequences being presented here are still drafts,” says Robert H. Waterston, director of the Washington University Genome Sequencing Center in St. Louis, which is a member of the consortium. Small errors, akin to typos, dot the sequences, and some areas are still missing. He says, “More work needs to be done to produce [a] fully accurate sequence with no gaps. That work is already under way.”

Precious genes & silent passengers

Even with its gaps and typos, the new text of genetic instructions has instantly become indispensable to researchers. Eric S. Lander of the Whitehead Institute for Biomedical Research in Cambridge, Mass., notes that “the text is filled with long-sought answers, some amazing surprises, puzzling mysteries, and lots of useful information for medicine.”

One puzzle is how people, with our adaptable immune systems and nimble minds, can be constructed from what seems to be surprisingly few genes. Rather than the textbook figure of 100,000 genes, a number that some researchers still defend, the human genome appears to have about 30,000, says Lander. This is only about two to three times as many genes as are found in a fruit fly or nematode worm.

“There’s a lesson in humility in this,” Lander adds. “When we get past our wounded dignity, we realize there are some deep scientific questions to grapple with,” such as how people manage with such a modest complement of genes.

Human genes and the proteins they make are more complex than those in a fly or worm, says Eugene B. Koonin of the National Institutes of Health in Bethesda, Md. Human cellular machinery can splice the RNA copies of genes in alternative ways and chemically modify the resulting proteins, so the typical human gene might have a hand in twice as many proteins as a worm gene does. These genomic findings are bolstering the field of proteomics, whose investigators study organisms’ enormous ensembles of proteins.

Despite its apparently light load of genes, the human genome is large, a contrast that’s sure to fire up fields like comparative and evolutionary biology. Much of our DNA doesn’t contain genes at all, according to both teams of researchers. Instead, it’s packed with the remnants of ancient DNA snippets that have copied and inserted themselves into the genome many times over. These copies account for large stretches of repeated sequences, often called junk DNA.

“It also seems that we really can’t take credit for all our genes,” remarks Lander. “The sequence tells us that we received more than 200 genes as gifts from bacteria that somehow infected a distant ancestor of ours and transferred some DNA.”

Hunting the genes that ail you

As details of the human genome help fill in our evolutionary past, scientists in every field of biology will search the new genetic text for ways to better our future.

“When we look at the text, we see dozens of disease genes that have already been found using the sequence and new drug targets already under study,” says Lander.

Scientists used to spend years looking under every rock to find a desired gene, he says. Now, with the human genome sequence in a database that can fit on a CD, scientists can more easily search for all the genes involved in a disorder.

For example, in the next 5 years, researchers will begin finding genes that play a role in addiction, predicts Eric J. Nestler of the University of Texas Southwestern Medical Center in Dallas. In the past, researchers had found addiction-related “hot spots” along chromosomes, but they had not identified which genes in those regions might be the culprits. They didn’t even know what genes were in those regions, Nestler points out.

After researchers pinpoint the genes and the biochemical pathways involved in addiction, they’ll be better able to design effective treatments, says Nestler. Geneticists could also use the information to identify people at risk for addiction and focus prevention efforts on them.

Like addiction, many diseases, including cancers, involve multiple genes. The combination of these “susceptibility genes” makes some people more or less vulnerable to a disease, says Victor A. McKusick of Johns Hopkins University in Baltimore. The new genomic data will help researchers to unveil such multiple-gene processes, he says.

The new data coming out of the human genome sequence could also benefit people who have sleep disorders or work the night shift. Using traditional methods, researchers have found eight “clock genes” that have a role in the body’s timekeeping mechanism, which regulates hormones, body temperature, and sleeping patterns, says Jonathan D. Clayton of the University of Leicester in England. A search of the human genome sequence has already turned up two other genes with sequences, and possibly functions, similar to those of some of the known clock genes, he says.

A mouse for every gene

The mouse genome has been almost entirely sequenced, too. The human and mouse genomes are about 85 percent identical, says Joseph H. Nadeau of Case Western Reserve University in Cleveland.

The mouse is “the closest you can get where you can genetically manipulate an organism and use that to guide our learning of human biology,” says Nadeau. “This will give us a whole lot of clues that you can’t get directly in humans.”

Nadeau is part of the International Mouse Mutagenesis Consortium, an effort to develop a mutant mouse strain for each of the estimated 30,000 functional genes in the mouse genome. To date, researchers in the project have created roughly 5,000 mutant mouse strains carrying genes for diseases resembling human illnesses such as colorectal cancer and sickle cell anemia. Researchers will be able to explore whether the corresponding gene in people causes the same illness.

With all these genomic data piling up, one of the biggest challenges will be for biologists to learn how to manage them, says David S. Roos of the University of Pennsylvania in Philadelphia. Large data sets are relatively new to biology, he notes. Says Roos: “How to make [the information] accessible to biologists and how to educate computer scientists about the intellectual problems involved in biology are all important challenges.”