Geneticists managed to bag a few trophies anyway—genes
for Huntington’s chorea and cystic fibrosis, for example—mostly
in rare diseases caused by a problem in a single, high-powered gene.
Unfortunately, most of the more common diseases, such as type II diabetes, are
instead controlled by a whole crowd of gene variants, each playing a small and
often subtle role in the path to disease.
To spot these quiet genes lying in the genomic underbrush,
disease geneticists realized they’d better try a new tack. In the mid-1990s,
the most foresighted among them asked, “What if someday we could take a bunch
of unrelated people and compare their genetic blueprint in lots of different places,
all at once? Could it revolutionize the study of human disease?” International
partnerships soon formed to figure out if this was even possible.
It was. Now, new technology, buttressed by new analytical
methods and enhanced knowledge of the genome, allows scientists to do just
that: Researchers can test up to a million of the most important spots across
the entire genome at one time. These “genome-wide association” studies excel at
detecting the subtle effects from common versions or variants of genes that
went unnoticed before. Researchers can now put their guesswork aside and watch
as a single study hauls in thousands of potential gene suspects.
Not surprisingly, geneticists are cheered by the prospect of
leaving behind their days of hapless gene-hunt bumbling. “After years as
‘Keystone Cops,’ complex-trait geneticists can now find culprits not previously
suspected and establish guilt beyond a reasonable doubt,” geneticists David
Altshuler and Mark Daly of the Broad Institute in Cambridge, Mass., wrote last
July in Nature Genetics.
In the past two years alone, genome-wide association studies
have found about 100 new genetic variants linked to 40 common diseases,
including type II diabetes, prostate cancer and heart disease. These studies
point to genes that researchers never suspected of being involved with certain
diseases, or to uncharted regions known as “gene deserts” where genes are not
known—at
least yet—to exist.
Researchers hope the new studies will help explain how
common diseases develop and also will help guide the search for new treatments
and drugs. “I think it is absolutely clear that we have learned a tremendous
amount about a whole range of complex, common, genetic diseases in the human
population, and we have much greater knowledge than we did just a very short
time ago,” says biostatistician Michael Boehnke of the
Ironically, some of these problems stem from the studies’
biggest strength: an unprecedented avalanche of data. Other challenges arise
from the lingering genetic effects of migrations out of Africa.
As results from huge new studies roll in, these challenges
are attracting more attention. The National Institutes of Health held a special
meeting in March to discuss how to translate genome-wide association data into
clinical research and practice. And scientific journals are publishing special
papers instructing researchers in the art of interpreting genome-wide
association studies.
“First you have to go through a sifting process, filtering
the true signals from the false signals and making sure you don’t miss any,”
says epidemiologist Muin Khoury of the Centers for Disease Control and
Prevention in Atlanta. “I think this is as much of an art as a science right
now.”
Deluge of data
For all the apparent variation among people, the human
genetic code is actually 99.5 percent identical from person to person. That
remaining individualistic half percent can help explain how diseases develop in
some people and not others. Genome-wide association study researchers rely on
results from two big projects to guide them to these crucial areas.
The first project, the government-sponsored Human Genome
Project, analyzed a human genome archetype and sequenced the 3.2 billion
nucleotide “genetic letters” that make up human DNA. Using this framework, the
nonprofit International HapMap Project is pinpointing the 11 million specific
sites along the genome where genetic information differs by a single letter.
About 4 million sites have been cataloged so far.
Usually these one-letter sites, called single nucleotide
polymorphisms, or SNPs (pronounced “snips”), do not themselves cause disease. But
the SNPs often lie near important genes that can. So the SNPs serve as
convenient signposts—pointing researchers to important disease-related
genes in the neighborhood.
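How good a signpost a SNP is can be measured. Geneticists commonly use a linkage-disequilibrium statistic called r², which describes how reliably the allele at one site predicts the allele at a nearby site. The Python sketch below illustrates this under stated assumptions; it is not a method taken from the studies described here. It uses hypothetical phased haplotypes, each recording the allele at a marker SNP and at a nearby disease variant (coded 0 or 1), and computes r² from the allele and haplotype frequencies. An r² near 1 means the marker is a dependable stand-in. An r² near 0 means it says little about its neighbor.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (marker_allele, variant_allele) pairs, each 0 or 1,
    read from the same chromosome copy (phased data is assumed).
    """
    n = len(haplotypes)
    p_marker = sum(m for m, _ in haplotypes) / n
    p_variant = sum(v for _, v in haplotypes) / n
    p_both = sum(1 for m, v in haplotypes if m == 1 and v == 1) / n
    # D: how much more often the two "1" alleles travel together than chance predicts.
    d = p_both - p_marker * p_variant
    return d * d / (p_marker * (1 - p_marker) * p_variant * (1 - p_variant))


# Hypothetical samples of 100 chromosome copies each.
linked = [(1, 1)] * 45 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 45
unlinked = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25
print(round(ld_r2(linked), 2))    # 0.64
print(round(ld_r2(unlinked), 2))  # 0.0
```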
To find SNPs linked with a certain disease, the simplest
approach is to compare groups of volunteers side by side. Researchers recruit a
group of breast cancer patients, say, and a group of similar people who are
breast cancer-free. The researchers use “SNP chips”—microchips
that test up to 1 million selected SNPs at once—and record the versions of each
SNP that each person possesses.
Then researchers statistically compare SNPs in the groups.
If most breast cancer patients had two “T” versions of the SNP known as
ESR1002, for example, and most disease-free volunteers had two “G” versions,
then researchers would flag ESR1002 as a possible breast cancer suspect.
Further investigation might then point to an important gene nearby.
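The side-by-side comparison can be sketched as a standard allele-count test. The Python below is a minimal illustration and not the analysis any particular study used. It assumes hypothetical counts for one SNP in 1,000 cases and 1,000 controls, which gives 2,000 chromosome copies per group. It builds a 2×2 table of "T" versus "G" alleles and computes Pearson's chi-square statistic with 1 degree of freedom, along with its p-value.

```python
import math


def allelic_test(case_t, case_g, ctrl_t, ctrl_g):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_t, case_g], [ctrl_t, ctrl_g]]
    total = case_t + case_g + ctrl_t + ctrl_g
    row_sums = [sum(row) for row in table]
    col_sums = [case_t + ctrl_t, case_g + ctrl_g]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: "T" is a bit more common among cases.
chi2, p = allelic_test(case_t=1100, case_g=900, ctrl_t=1000, ctrl_g=1000)
print(round(chi2, 2), p)  # chi-square about 10.0, p about 0.0015
```

With these made-up counts, the SNP easily clears a traditional 0.05 cutoff.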
Yet a million SNPs on a chip still means a million potential
suspects to sift through—most of which are ultimately
not related to the disease. The flood of information is potentially
overwhelming. As the title of a New
England Journal of Medicine editorial last summer described it, genome-wide
association studies are like “drinking from the fire hose.”
In fact, most statistical methods were built to deal with
data scarcity, not to handle a data deluge. So when a genome scan
delivers its data—four to five thousand times more information than in
traditional epidemiology studies—standard statistical methods
can easily choke.
For example, a genome-wide association study of 1 million
SNPs will flag about 50,000 SNPs as significant. But most will be false alarms,
indistinguishable from real results. Worse yet, truly interesting SNPs may be
ignored and never get flagged in the first place.
The problem lies in how results get flagged. Statistical
methods essentially set a cutoff value that any result must surmount before
being flagged as significant: a statistical hurdle, in a
sense. Traditional hurdles do a good job of separating true results from bogus
ones when there aren’t many competitors in the race. But in a million-SNP
blitz, too many false results manage to scramble over the hurdle just by random
luck.
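A quick simulation shows how luck alone gets results over a traditional hurdle. This Python sketch assumes that every one of 1 million SNPs is truly unrelated to the disease. In that case a well-behaved p-value is spread evenly between 0 and 1. The sketch draws such p-values at random and counts how many fall below the traditional 0.05 cutoff, and how many fall below a far stricter cutoff of 5 in 100 million.

```python
import random


def count_false_alarms(n_snps, cutoff, seed=2008):
    """Count null SNPs whose p-value falls below the cutoff by chance alone.

    Assumes no SNP has any real effect, so each p-value is simply a
    uniform random number on [0, 1).
    """
    rng = random.Random(seed)
    return sum(rng.random() < cutoff for _ in range(n_snps))


n_snps = 1_000_000
print("expected at 0.05:", n_snps * 0.05)
print("simulated at 0.05:", count_false_alarms(n_snps, 0.05))
print("simulated at 5e-8:", count_false_alarms(n_snps, 5e-8))
```

At 0.05, roughly 50,000 of these purely random "results" are flagged. At the stricter cutoff, almost none are.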
“There have been problems in the past when people have
declared victory prematurely,” says geneticist Joel Hirschhorn of the Broad
Institute, referring to researchers who declared SNPs significant based only on the traditional
statistical hurdles. “It was hard to convince people that [the old level] was
not an appropriate threshold. People are starting to accept that now.”
The simplest solution is just raising the hurdle.
Traditionally, researchers have permitted a bogus result to sneak through about
1 time in 20. With the new genome-wide scans, it’s now usually no more than 5
in 100 million. This raises the bar considerably, Hirschhorn says.
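One common way to arrive at the 5-in-100-million figure is a Bonferroni correction. The traditional 1-in-20 error rate is divided evenly across roughly 1 million independent tests. The Python below works through that arithmetic under this assumption. It also converts each cutoff into the z-score a test statistic must exceed, using only the standard library's statistics.NormalDist.

```python
from statistics import NormalDist


def bonferroni_cutoff(alpha=0.05, n_tests=1_000_000):
    """Per-test p-value cutoff that keeps the overall false-alarm rate near alpha."""
    return alpha / n_tests


def z_cutoff(p_cutoff):
    """Two-sided z-score a test statistic must exceed to reach p_cutoff."""
    return NormalDist().inv_cdf(1 - p_cutoff / 2)


genome_wide = bonferroni_cutoff()
print(genome_wide)                      # about 5e-08
print(round(z_cutoff(0.05), 2))         # 1.96
print(round(z_cutoff(genome_wide), 2))  # about 5.45
```

The required z-score rises from roughly 2 standard errors to roughly 5.5, which gives a sense of how much higher the bar now sits.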
Statistical demands
But higher hurdles require bigger studies. That’s because
much of the muscle power behind these studies depends on how many participants
are included. New studies need an extra boost of muscle to hoist the important
SNPs over the now-higher bar—otherwise no SNP might get
flagged. So researchers have to scramble to find money and volunteers. Typical
sample sizes for genetic association studies can now run in the tens of
thousands.
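A textbook power approximation makes the link between a higher bar and bigger studies concrete. The Python below is a simplified sketch and not a design tool from the studies described here. It assumes a hypothetical SNP whose risk allele appears on 33 percent of cases' chromosomes and 30 percent of controls'. It compares allele frequencies with a normal approximation, ignoring the negligible chance of an effect in the wrong direction. It then searches in steps of 100 for the number of cases, and of controls, needed to reach 80 percent power at each cutoff.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allelic_power(p_case, p_ctrl, n_per_group, alpha):
    """Approximate power to detect a difference in allele frequency.

    Each of n_per_group cases and n_per_group controls contributes two
    chromosome copies; the test is a two-sided z-test on two proportions.
    """
    alleles_per_group = 2 * n_per_group
    p_bar = (p_case + p_ctrl) / 2
    se = (p_bar * (1 - p_bar) * 2 / alleles_per_group) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p_case - p_ctrl) / se - z_crit)


def n_for_power(p_case, p_ctrl, alpha, target=0.8, step=100):
    """Smallest group size, in multiples of step, that reaches the target power."""
    n = step
    while allelic_power(p_case, p_ctrl, n, alpha) < target:
        n += step
    return n


old = n_for_power(0.33, 0.30, alpha=0.05)
new = n_for_power(0.33, 0.30, alpha=5e-8)
print(old, new)  # per group: roughly 1,900 versus roughly 9,500
```

With these made-up frequencies, the genome-wide cutoff calls for roughly five times as many participants. In this approximation the required size scales with the square of the summed z-values, so that ratio does not depend on the size of the effect.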
Even then, added muscle power might not be enough. So
researchers are turning to multistage studies, too. In such studies scientists
first scan the full genome, then try to replicate the strongest findings with
new subjects in subsequent studies. “That really leads to a new type of
epidemiology, because basically no one study is remotely definitive,” says
epidemiologist David Hunter of the Harvard School of Public Health in
Still, data-sharing is becoming easier. For example,
researchers who conduct genome-wide association studies funded through the
National Institutes of Health must now deposit their data into a common
database for immediate access.
Surprisingly, the earlier that researchers collaborate, the
better—at least from a statistical point of view, Hunter
says. Researchers originally thought that if small, independent groups each did
their own study and then compiled a running list of all the important SNPs from
their results, everything would be fine.
Not so, Hunter says. It turns out that the running list of
results will still miss important SNPs. It’s better for those groups to pool
all their subjects at the beginning and run one big scan, he says. The final
list from the pooled study will be more accurate and complete than the running
list from independent studies.
The reason, Hunter says, is that subtle genetic effects
(such as those likely to contribute to diseases) can be picked up only with a
sufficiently large sample size—in the same way that a larger
magnifying glass is needed to spot the smaller bugs in the undergrowth. “So
this will be a long-running story for common diseases,” he says, “because as we
put together more and more scans, we’ll find more and more truly associated
variants.”
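A textbook power approximation can illustrate Hunter's point. The Python sketch below rests on several assumptions that are for illustration only:

- four independent groups, each with 2,500 cases and 2,500 controls;
- a hypothetical SNP with risk-allele frequencies of 33 percent in cases and 30 percent in controls;
- a "running list" that keeps any SNP clearing the 5-in-100-million cutoff in at least one group's own scan.

The sketch compares the chance that the SNP makes that list with the chance that it is flagged in one pooled scan of all 10,000 cases and 10,000 controls.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
GENOME_WIDE = 5e-8


def allelic_power(p_case, p_ctrl, n_per_group, alpha=GENOME_WIDE):
    """Approximate power of a two-sided allele-frequency z-test."""
    p_bar = (p_case + p_ctrl) / 2
    se = (p_bar * (1 - p_bar) * 2 / (2 * n_per_group)) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p_case - p_ctrl) / se - z_crit)


def running_list_power(power_single, n_groups):
    """Chance that at least one of n_groups independent scans flags the SNP."""
    return 1 - (1 - power_single) ** n_groups


groups, n_each = 4, 2_500
single = allelic_power(0.33, 0.30, n_each)
separate = running_list_power(single, groups)
pooled = allelic_power(0.33, 0.30, groups * n_each)
print(f"one group: {single:.3f}  running list: {separate:.3f}  pooled: {pooled:.3f}")
```

Under these assumptions, the pooled scan is far more likely to catch the subtle SNP. Pooling strengthens the signal in proportion to the square root of the total sample size. A running list only gets a few extra chances, each of them slim.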
Also afflicting these multistage studies is the peculiar
“winner’s curse” phenomenon, in which top results in small initial studies
don’t always pan out in later studies. This is a close cousin of the “Sports Illustrated curse,” in which star
rookies featured on the magazine’s cover end up with a crash-and-burn second
season.
There’s a simple statistical explanation, says
epidemiologist Teri Manolio of the National Human Genome Research Institute of
the National Institutes of Health in Bethesda, Md. Researchers will naturally
try to replicate the most extreme top-scoring results in an initial study. But
these huge effects probably owe their super-high ranking in part to a true
effect and in part to sheer random luck. Small follow-up studies—designed
to look for these big effects—will miss the more subtle, true
effects, Manolio says.
Thus initial studies may appear flawed, even if they aren’t.
The solutions—increasing sample sizes and recognizing that extreme
initial results are likely overinflated—are beginning to take hold.
“It’s happening,” Manolio says, “but it’s happening slowly.”
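A small simulation can check Manolio's explanation. In this hypothetical Python setup, every one of 10,000 SNPs has the same small true effect (0.05, in arbitrary units), and the initial study measures each effect with random error. The sketch picks the 10 top-scoring SNPs. It then "replicates" them by measuring the same SNPs again with fresh, independent error.

```python
import random
import statistics


def winners_curse(n_snps=10_000, true_effect=0.05, noise_sd=0.05, top_k=10, seed=1):
    """Return mean estimated effect of the top_k SNPs in discovery and in replication.

    Every SNP shares the same true effect; each study adds independent
    Gaussian noise to stand in for sampling error.
    """
    rng = random.Random(seed)
    discovery = [true_effect + rng.gauss(0, noise_sd) for _ in range(n_snps)]
    # The top-ranked estimates are the ones that drew the luckiest noise.
    top = sorted(discovery, reverse=True)[:top_k]
    # Replication: same SNPs, same true effect, new independent noise.
    replication = [true_effect + rng.gauss(0, noise_sd) for _ in top]
    return statistics.mean(top), statistics.mean(replication)


found, replicated = winners_curse()
print(f"true effect 0.050, discovery {found:.3f}, replication {replicated:.3f}")
```

The discovery estimates come out several times larger than the true effect, while the replication estimates fall back toward it. Nothing about the first study was wrong.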
The trouble with ancestry
Complications from race and ancestry can also play a role in
genome-wide association studies. That’s because people with European, Asian and
African ancestries have different genetic patterns. These patterns can be
misleading. “There is a big debate about this in the genetics community,” says
geneticist Eric Jorgenson of the
Take a simplified example: Suppose most people of European
ancestry in a sample had blue eyes and also happened to have disease X, while
most people of Asian ancestry were brown-eyed and disease-free. A naïve
analysis might conclude that the blue-eyes SNP is responsible for disease X,
even if eye color and disease are completely unrelated.
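The blue-eyes example can be put into numbers. The counts below are invented for illustration. Within each ancestry group, eye color is completely unrelated to disease X, yet pooling the groups produces a sizable odds ratio. The Python sketch also applies the Mantel-Haenszel odds ratio, a classic stratified estimate that compares people only within their own group. It is included as one textbook fix, not as the method the researchers in this article use.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = blue eyes & disease, b = blue eyes & healthy,
    c = brown eyes & disease, d = brown eyes & healthy.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Combine per-group 2x2 tables, comparing people only within each group."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts, 1,000 people per ancestry group.
groups = {
    "European ancestry": (400, 400, 100, 100),  # 80% blue-eyed, 50% have disease X
    "Asian ancestry": (10, 90, 90, 810),        # 10% blue-eyed, 10% have disease X
}
pooled = tuple(sum(cells) for cells in zip(*groups.values()))

for name, table in groups.items():
    print(name, odds_ratio(*table))                             # 1.0 in each group
print("pooled", round(odds_ratio(*pooled), 2))                  # about 4.01
print("Mantel-Haenszel", mantel_haenszel_or(groups.values()))   # 1.0
```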
That is, the methods are likely to nab the wrong SNP
suspects, simply because these innocent SNPs tend to show up in the same
situations as truly guilty SNPs. This genetic-mixing issue shows up in other
kinds of studies, too. But it’s a particular problem for studies of the entire
genome because of the huge number of ancestry-related SNPs being tested.
Traditionally, researchers have addressed this
genetic-mixing problem largely by balancing the number of study volunteers
belonging to different racial groups. But this strategy goes only so far,
Jorgenson says. Genetic heritage is more complicated than skin color or
grandparents’ birthplace, and the ancestral variation in the gene pool can’t be
conveyed with a simple check-off survey box.
Some nifty statistical tricks, however, can help researchers
spot and fix this problem in their analyses, Jorgenson says. For example, one
method comes up with a mathematical summary of every volunteer’s personal
genetic ancestry and incorporates that into the analysis. This effectively
allows researchers to “strip” each volunteer of his or her genetic ancestry and
simply investigate the important genetic patterns that are left over.
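One widely used way to build such a summary is principal component analysis (PCA) of the genotypes, as in the EIGENSTRAT method. The sketch below assumes that approach. It is a simplified, standard-library-only illustration using simulated, hypothetical data:

- two ancestry groups differ in allele frequency at 100 background SNPs and in disease rate;
- a test SNP differs in frequency between the groups but has no effect on disease.

The sketch estimates each person's top principal component by power iteration and removes it from both the test SNP and disease status. It then compares a trend-test-style statistic (sample size times squared correlation, roughly chi-square with 1 degree of freedom under no association) before and after the adjustment.

```python
import math
import random


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def center(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def simulate(n_per_group=200, n_markers=100, seed=7):
    """Hypothetical two-group sample; returns marker columns, test SNP, disease."""
    rng = random.Random(seed)
    base = [rng.uniform(0.1, 0.5) for _ in range(n_markers)]
    freqs = [(f, f + 0.3) for f in base]  # group 1 carries more copies of every marker allele
    markers = [[] for _ in range(n_markers)]
    test_snp, disease = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            for column, pair in zip(markers, freqs):
                column.append(sum(rng.random() < pair[group] for _ in range(2)))
            # Test SNP: common in group 0, rare in group 1, no effect on disease.
            test_snp.append(sum(rng.random() < (0.8, 0.2)[group] for _ in range(2)))
            # Disease risk depends on group only.
            disease.append(1.0 if rng.random() < (0.6, 0.1)[group] else 0.0)
    return markers, test_snp, disease


def top_pc(columns, n_people, iterations=25, seed=1):
    """Top principal-component score per person, by power iteration on X X^T."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_people)]
    for _ in range(iterations):
        weights = [dot(col, v) for col in columns]  # X^T v
        u = [0.0] * n_people
        for w, col in zip(weights, columns):        # X (X^T v)
            for i, x in enumerate(col):
                u[i] += w * x
        norm = math.sqrt(dot(u, u))
        v = [x / norm for x in u]
    return v


def remove_component(xs, score):
    """Residuals of xs after regressing out the ancestry score."""
    xs = center(xs)
    beta = dot(xs, score) / dot(score, score)
    return [x - beta * s for x, s in zip(xs, score)]


def trend_stat(g, y):
    """n * r^2, approximately chi-square (1 df) when there is no association."""
    g, y = center(g), center(y)
    r = dot(g, y) / math.sqrt(dot(g, g) * dot(y, y))
    return len(g) * r * r


markers, snp, disease = simulate()
pc1 = top_pc([center(col) for col in markers], len(snp))
naive = trend_stat(snp, disease)
adjusted = trend_stat(remove_component(snp, pc1), remove_component(disease, pc1))
print(f"naive {naive:.1f}  ancestry-adjusted {adjusted:.1f}")
```

In this toy setup, the naive statistic lands far above what chance would produce, while the adjusted one falls back to an ordinary, null-sized value. Real analyses use many more SNPs, several components and dedicated software.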
Ancestry can cause other problems. Waves of migration out of
These conditions—founder effects and bottleneck
populations—meant that new emigrant groups had less genetic
diversity than the original African population. Over time, these effects became
more pronounced. People with recent African ancestry now have more variability
across their genome than do people with European and Asian ancestry.
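A textbook population-genetics approximation shows why a small founding group loses diversity over time. Heterozygosity is the chance that two copies of a site differ. In an idealized, randomly mating population of N individuals, genetic drift alone is expected to shrink heterozygosity by a factor of 1 − 1/(2N) each generation. The Python sketch applies that formula to two hypothetical populations with the same starting diversity. It ignores mutation, migration and selection, so it illustrates the direction of the effect rather than real human numbers.

```python
def expected_heterozygosity(h0, pop_size, generations):
    """Expected heterozygosity after drift: H_t = H_0 * (1 - 1/(2N))**t.

    Assumes an idealized population of pop_size diploid individuals with
    no mutation, migration or selection.
    """
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


# Hypothetical populations, both starting at heterozygosity 0.5.
large = expected_heterozygosity(0.5, pop_size=10_000, generations=100)
bottleneck = expected_heterozygosity(0.5, pop_size=50, generations=100)
print(round(large, 3), round(bottleneck, 3))  # about 0.498 and 0.183
```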
Problems arise when people with different genetic ancestries
are included in one study, Jorgenson says. Scanning a group with greater
genetic variability requires more refined tools. “If you’re applying
genome-wide association studies to a bottleneck population with less
variability, you can use a wider-tooth comb,” he says. “Populations with more
variability need a finer-tooth comb.” Current methods may miss disease-linked
SNPs in African-Americans, especially if the SNPs are associated with rare gene
variants.
The ideal solution would be to sequence every letter of
volunteers’ genomes—thus providing the finest-toothed comb possible.
Cost and logistics are still prohibitive for this approach, however. Still, the
more SNPs that manufacturers can squeeze onto their SNP chips, the more likely
that important SNPs will be caught, Jorgenson says. And some manufacturers are
already starting to design chips that incorporate sets of SNPs suitable for
different genetic ancestries.
Beyond SNPs
Genome-wide association studies might indeed prove to be a
bonanza for modern gene hunters. But in all the excitement, researchers
shouldn’t forget the value of good old-fashioned study design, Khoury warns. “I
think people are being lulled into a zone of comfort,” he says, as some
researchers rely on million-SNP chips, large sample sizes and multiple
replication studies to cover up study flaws.
And there’s still a nagging question: After you’ve bagged
your gene, what do you do? “To me, this is the biggest stumbling block,” Khoury
says. “You still have to work out the biology of that hit.... That’s actually
where the hard work begins.”
It’s clear that clinical applications are still years away,
Manolio says. Some companies are starting to sell personalized genetic tests
based on results from genome-wide association studies. But researchers hardly
know what the study results mean themselves; any immediate translation into
personalized medicine will naturally be problematic.
“There is a lot of missing heritability in our results right
now,” Boehnke says. “If your goal [with these studies] is personalized medicine
and developing your own personal genetic report card, we’re definitely not
there yet. I don’t know whether we ever will be.”
Regina Nuzzo is a
freelance writer based in Washington, D.C.
- General facts about genome-wide association studies for the lay public, by NIH
- Guilt by Association: Whole-genome scans yield disease clues
- SNPs Ahoy! Scientists complete map of genetic differences
- Altshuler, D., and M. Daly. 2007. Guilt beyond a reasonable doubt. Nature Genetics 39(July 1):813-815. doi: 10.1038/ng0707-813
- Hunter, D.J., and P. Kraft. 2007. Drinking from the fire hose: statistical issues in genomewide association studies. New England Journal of Medicine 357(Aug. 2):436-439. doi: 10.1056/NEJMp078120
- Pearson, T.A., and T.A. Manolio. 2008. How to interpret a genome-wide association study. Journal of the American Medical Association 299(March):1335-1344.
- Hirschhorn, J.N., and M.J. Daly. 2005. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics 6(February):95-108.
- Jorgenson, E., and J.S. Witte. 2007. Genome-wide association studies of cancer. Future Oncology 3(August):419-427.
- Kruglyak, L. 2008. The road to genome-wide association studies. Nature Reviews Genetics 9(April):314-318.
- Risch, N., and K. Merikangas. 1996. The future of genetic studies of complex human diseases. Science 273(Sept. 13):1516-1517.
- NIH Policy on Genome-wide Association Studies (GWAS)

Even though today at SNPedia our community is mostly focused on reporting all the published disease associations for single SNPs, we are gearing up for tomorrow, when publications will shift towards reporting the sets of SNPs and sets of haplotypes that influence disease risk when present together in an individual.