The ‘unknome’ catalogs nearly 2 million proteins. Many are mysterious

The new database could be used for finding ways to treat diseases or discovering drugs

The “unknome” database ranks human proteins by how little we know about them. Many proteins, and the genes that make them, aren’t well understood, and there’s still much to learn from the human genetic instruction book.

Richard Jones/Science Photo Library/Getty Images

When it comes to vast, under-explored frontiers, space and Earth’s oceans come to mind. But even in human bodies, there’s still much to be discovered. Meet the “unknome,” a new database that emphasizes how much we still don’t know about human genes and proteins.

The publicly available database ranks groups of proteins by how little is known about them. That information could help scientists identify proteins for future study, including for disease treatment and drug discovery, researchers report August 8 in PLOS Biology.

Cell biologist Sean Munro and colleagues compiled the unknome, a portmanteau of the words unknown and genome, to identify understudied but potentially important proteins and their corresponding protein-coding genes: stretches of DNA whose protein recipes are copied into RNA (SN: 2/9/22).

Proteins are generally grouped into families that share a common evolutionary ancestor. The unknome database contains every protein family with at least one protein encoded by the human genome, our genetic instruction book, or by the genomes of 11 other commonly studied organisms. Over 13,000 groups and nearly 2 million proteins are included.

The unknome assigns a “knownness” score to each group of proteins based on how much is known about their corresponding genes. Some 3,000 of those groups, including 805 that contain at least one human protein, have a knownness score of zero, showing there’s still much to learn within the human genome (SN: 3/31/22).

Munro and colleagues used the database to study 260 genes that are shared between fruit flies and humans and that have low knownness scores. After dialing down the activity of each of those genes in the flies, the researchers found that about 60 were essential for the flies' survival. Others were important for reproduction, growth, movement and resilience against stress.

“Even in really well-studied [organisms] like flies, there are new things to be found,” says Munro, of the Medical Research Council Laboratory of Molecular Biology in Cambridge, England.

Whether some or all of those genes have similar effects in humans is still unknown. But the database could help researchers tease out important human proteins by quickly screening similar proteins in more easily studied organisms like fruit flies, says data scientist Tudor Oprea of Expert Systems Inc., a drug discovery company in San Diego, who was not involved in the study.

Munro says the next step for his group is to work with similar efforts like the Understudied Proteins Initiative for a large-scale study of these mysterious proteins.

Skyler Ware was the 2023 AAAS Mass Media Fellow with Science News. She has a Ph.D. in chemistry from Caltech, where she studied chemical reactions that use or create electricity.
