We finally have a fully complete human genome

About 8 percent of the genome was missing from earlier versions of the genetic instruction book

New technologies that allow scientists to put DNA bases (represented by the letters A, T, C and G) in order have helped researchers put together one of the world’s most complex puzzles: a complete human genome. Credit: Ernesto del Aguila III/NHGRI

Researchers have finally deciphered a complete human genetic instruction book from cover to cover.

The completion of the human genome has been announced a couple of times in the past, but those were actually incomplete drafts. “We really mean it this time,” says Evan Eichler, a human geneticist and Howard Hughes Medical Institute investigator at the University of Washington in Seattle.

The completed genome is presented in a series of papers published online March 31 in Science and Nature Methods.

An international team of researchers, including Eichler, used new DNA sequencing technology to untangle repetitive stretches of DNA that were left out of an earlier version of the genome, which is widely used as a reference for guiding biomedical research.

Deciphering those tricky stretches adds about 200 million DNA bases, about 8 percent of the genome, to the instruction book, researchers report in Science. That’s essentially an entire chapter. And it’s a juicy one, containing the first-ever looks at the short arms of some chromosomes, long-lost genes and important parts of chromosomes called centromeres — where machinery responsible for divvying up DNA grips the chromosome.

“Some of the regions that were missing actually turn out to be the most interesting,” says Rajiv McCoy, a human geneticist at Johns Hopkins University, who was part of the team known as the Telomere-to-Telomere (T2T) Consortium assembling the complete genome. “It’s exciting because we get to take the first look inside these regions and see what we can find.” Telomeres are repetitive stretches of DNA found at the ends of chromosomes. Like aglets on shoelaces, they may help keep chromosomes from unraveling.

Data from the effort are already available for other researchers to explore. And some, like geneticist Ting Wang of Washington University School of Medicine in St. Louis, have already delved in. “Having a complete genome reference definitely improves biomedical studies.… It’s an extremely useful resource,” he says. “There’s no question that this is an important achievement.”

But, Wang says, “the human genome isn’t quite complete yet.”

To understand why and what this new volume of the human genetic encyclopedia tells us, here’s a closer look at the milestone.

What did the researchers do?

Eichler is careful to point out that “this is the completion of a human genome. There is no such thing as the human genome.” Any two people will have large portions of their genomes that range from very similar to virtually identical and “smaller portions that are wildly different.” A reference genome can help researchers see where people differ, which can point to genes that may be involved in diseases. Having a view of the entire genome, with no gaps or hidden DNA, may give scientists a better understanding of human health, disease and evolution.

The newly complete genome doesn’t have gaps like the previous human reference genome. But it still has limitations, Wang says. The old reference genome is a conglomerate of more than 60 people’s DNA (SN: 3/4/21). “Not a single individual, or single cell on this planet, has that genome.” That goes for the new, complete genome, too.  “It’s a quote-unquote fake genome,” says Wang, who was not involved with the project.

The new genome doesn’t come from a person either. It’s the genome of a complete hydatidiform mole, a sort of tumor that arises when a sperm fertilizes an empty egg and the father’s chromosomes are duplicated. The researchers chose to decipher the complete genome from a cell line called CHM13 made from one of these unusual tumors.

That decision was made for a technical reason, says geneticist Karen Miga of the University of California, Santa Cruz. Usually, people get one set of chromosomes from their mother and another set from their father. So “we all have two genomes in every cell.”

If putting together a genome is like assembling a puzzle, “you essentially have two puzzles in the same box that look very similar to each other,” says Miga, borrowing an analogy from a colleague. Researchers would have to sort the two puzzles before piecing them together. “Genomes from hydatidiform moles don’t present that same challenge. It’s just one puzzle in the box.”

The researchers did have to add the Y chromosome from another person, because the sperm that created the hydatidiform mole carried an X chromosome.

Even putting one puzzle together is a Herculean task. But new technologies that allow researchers to put DNA bases (represented by the letters A, T, C and G) in order can spit out stretches more than 100,000 bases long. Just as children’s puzzles are easier to solve because they have fewer, larger pieces, these “long reads” made assembling the bits of the genome easier, especially in repetitive parts where just a few bases might distinguish one copy from another. The bigger pieces also allowed researchers to correct some mistakes in the old reference genome.
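The advantage of bigger pieces can be illustrated with a toy calculation. The sketch below is not the consortium’s assembly method. It builds a made-up sequence containing two copies of a 200-base repeat that differ at a single base, then asks what fraction of error-free reads of a given length occur at exactly one place in that sequence. All names, lengths and the random seed are assumptions chosen for illustration.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def unique_read_fraction(genome, read_len):
    """Fraction of all possible reads of length read_len that occur
    exactly once in genome, and so can be placed unambiguously."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return sum(1 for read in reads if counts[read] == 1) / len(reads)

rng = random.Random(0)  # fixed seed so the toy sequence is reproducible

# Hypothetical toy "genome": two copies of a 200-base repeat that differ
# at one base (position 100), separated and flanked by random sequence.
repeat_a = random_dna(200, rng)
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
repeat_b = repeat_a[:100] + swap[repeat_a[100]] + repeat_a[101:]
genome = (random_dna(150, rng) + repeat_a + random_dna(150, rng)
          + repeat_b + random_dna(150, rng))

short_fraction = unique_read_fraction(genome, 20)   # short reads
long_fraction = unique_read_fraction(genome, 150)   # long reads

print(f"20-base reads placed uniquely:  {short_fraction:.2f}")
print(f"150-base reads placed uniquely: {long_fraction:.2f}")
```

In this toy setup, a read can tell the two repeat copies apart only if it spans the one differing base or reaches into the unique sequence on either side. Many 20-base reads do neither, while every 150-base read does one or the other. Real reads also contain sequencing errors, which this sketch ignores.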

What did they find?

For starters, the newly deciphered DNA contains the short arms of chromosomes 13, 14, 15, 21 and 22. These “acrocentric chromosomes” don’t resemble nice, neat X’s the way the rest of the chromosomes do. Instead, they have a set of long arms and one of nubby short arms.

The length of the short arms belies their importance. These arms are home to rDNA genes, which encode rRNAs, key components of complex molecular machines called ribosomes. Ribosomes read genetic instructions and build all the proteins needed to make cells and bodies work. Every person’s genome holds hundreds of copies of these rDNA regions: 315 on average, though some people have more and some fewer. They’re important for making sure cells have protein-building factories at the ready.

“We didn’t know what to expect in these regions,” Miga says. “We found that every acrocentric chromosome, and every rDNA on that acrocentric chromosome, had variants, changes to the repeat unit that was private to that particular chromosome.”

By using fluorescent tags, Eichler and colleagues discovered that repetitive DNA next to the rDNA regions — and perhaps the rDNA too — sometimes switches places to land on another chromosome, the team reports in Science. “It’s like musical chairs,” he says. Why and how that happens is still a mystery.

The complete genome also contains 3,604 genes, including 140 that encode proteins, that weren’t present in the old, incomplete genome. Many of those genes are slightly different copies of previously known genes, including some that have been implicated in brain evolution and development, autism, immune responses, cancer and cardiovascular disease. Having a map of where all these genes lie may lead to a better understanding of what they do, and perhaps even of what makes humans human.

One of the biggest finds may be the structure of all of the human centromeres. Centromeres, the pinched portions which give most chromosomes their characteristic X shape, are the assembly points for kinetochores, the cellular machinery that divvies up DNA during cell division. That’s one of the most important jobs in a cell. When it goes wrong, birth defects, cancer or death can result. Researchers had already deciphered the centromeres of fruit flies and the human 8, X and Y chromosomes (SN: 5/17/19), but this is the first time that researchers got a glimpse of the rest of the human centromeres.

The structures are mostly head-to-tail repeats of about 171 base pairs of DNA known as alpha satellites. But those repeats are nestled within other repeats, creating complex patterns that distinguish each chromosome’s individual centromere, Miga and colleagues describe in Science. Knowing the structures will help researchers learn more about how chromosomes are divvied up and what sometimes throws off the process.
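To make “head-to-tail repeats” concrete, the sketch below builds a synthetic array from a random 171-base unit (not real alpha satellite sequence), sprinkles in random substitutions, and then recovers the repeat length by comparing the sequence with shifted copies of itself. The function names, the 5 percent substitution rate and the 10-copy array are assumptions chosen for illustration. This is not the method used in the Science papers, and it does not model the nested, higher-level patterns described above.

```python
import random

def match_fraction(seq, shift):
    """Fraction of positions where seq agrees with itself shifted by `shift`."""
    pairs = len(seq) - shift
    return sum(seq[i] == seq[i + shift] for i in range(pairs)) / pairs

def best_period(seq, max_period):
    """Shift in 1..max_period with the highest self-match fraction."""
    return max(range(1, max_period + 1), key=lambda s: match_fraction(seq, s))

rng = random.Random(1)  # fixed seed so the toy array is reproducible

# Hypothetical repeat unit: 171 random bases standing in for one monomer.
unit = "".join(rng.choice("ACGT") for _ in range(171))

# Ten head-to-tail copies, with roughly 5 percent of bases redrawn at random
# to mimic the small differences between copies.
array = "".join(
    base if rng.random() > 0.05 else rng.choice("ACGT")
    for base in unit * 10
)

period = best_period(array, 300)
print("Estimated repeat length:", period)
```

The sequence lines up with itself far better at a shift of one full repeat unit than at any other shift in the range searched, which is why the estimate lands on the unit length even though no two copies are guaranteed to be identical.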

Researchers have now deciphered the structure of all human centromeres, the pinched-in portions of chromosomes (red in this microscope image of dividing HeLa cells) where structures called microtubules (green) attach and tug, ensuring proper distribution of DNA in cells. Credit: Matthew Daniels/Wellcome Collection (CC BY 4.0)

Researchers also now have a more complete map of epigenetic marks, chemical tags on DNA or associated proteins that may change how genes are regulated. One type of epigenetic mark, known as DNA methylation, is fairly abundant across the centromeres, except for one spot in each chromosome called the centromeric dip region, Winston Timp, a biomedical engineer at Johns Hopkins University, and colleagues report in Science.

Those dips are where kinetochores grab the DNA, the researchers discovered. But it’s not yet clear whether the dip in methylation causes the cellular machinery to assemble in that spot or if assembly of the machinery leads to lower levels of methylation.

Examining DNA methylation patterns in multiple people’s DNA and comparing them with the new reference revealed that the dips occur at different spots in each person’s centromeres, though the consequences of that aren’t known.

About half of the genes implicated in the evolution of humans’ large, wrinkly brains are found in multiple copies in the newly uncovered repetitive parts of the genome (SN: 2/26/15). Overlaying the epigenetic maps on the reference allowed researchers to figure out which of many copies of those genes were turned on and off, says Ariel Gershman, a geneticist at Johns Hopkins University School of Medicine.

“That gives us a little bit more insight into which of them are actually important and playing a functional role in the development of the human brain,” Gershman says. “That was exciting for us, because there’s never been a reference that was accurate enough in these [repetitive] regions to tell which gene was which, and which ones are turned on or off.”

What is next?

One criticism of genetics research is that it has relied too heavily on DNA from people of European descent. CHM13 also has European heritage. But researchers have used the new reference to discover new patterns of genetic diversity. By comparing DNA data from thousands of people of diverse backgrounds, collected in earlier research projects, with the T2T reference, researchers more easily and accurately found places where people differ, McCoy and colleagues report in Science.

The Telomere-to-Telomere Consortium has now teamed up with Wang and his colleagues to make complete genomes of 350 people from diverse backgrounds (SN: 2/22/21). That effort, known as the pangenome project, is poised to reveal some of its first findings later this year, Wang says.

McCoy and Timp say that it may take some time, but eventually, researchers may switch from using the old reference genome to the more complete and accurate T2T reference. “It’s like upgrading to a new version of software,” Timp says. “Not everyone is going to want to do it right away.”

The completed human genome will also be useful for researchers studying other organisms, says Amanda Larracuente, an evolutionary geneticist at the University of Rochester in New York who was not involved in the project. “What I’m excited about is the techniques and tools this team has developed, and being able to apply those to study other species.”

Eichler and others already have plans to make complete genomes of chimpanzees, bonobos and other great apes to learn more about how humans evolved differently than apes did. “No one should see this as the end,” Eichler says, “but a transformation, not only for genomic research but for clinical medicine, though that will take years to achieve.”
