Mining electronic records yields connections between diseases

Data integration technique could help researchers find missing links among medical conditions

Danish scientists have devised a new way to connect the dots between diseases. Integrating data mining that extracts information from clinicians’ notes with protein and genetic information can reveal connections between health problems as seemingly unrelated as migraines and hair loss, or glaucoma and a hunching back, researchers report August 25 in PLoS Computational Biology.

A RECORDS ACHIEVEMENT A network depicting patients’ health problems (colored dots) reveals overlapping conditions, including known connections such as diabetes (light orange, numbered 26 at top) and hypertension (dark green, numbered 72, just to the right). Roque et al/PLoS Computational Biology 2011

Besides generating new leads about the molecular workings of disease, the approach is also revealing a much richer portrait of each patient, says study coauthor Søren Brunak of the Center for Biological Sequence Analysis at the Technical University of Denmark in Lyngby and the University of Copenhagen.

Using the World Health Organization’s codes for classifying diseases, researchers generated a map that linked more than 4,700 patients at Denmark’s largest psychiatric hospital by their diagnoses. The team integrated these data with information gleaned from a text mining algorithm that combed through 10 years’ worth of clinicians’ notes, an average of 25,000 words per patient.

More than 800 pairs of health problems turned up more than twice as often as expected by chance. Ninety-three of those pairs were then flagged by a doctor as being especially intriguing. Investigations into the genes and proteins associated with some of these unusual pairs revealed previously unknown connections, such as overlapping molecular machinery or pathways.
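To make "more than twice as often as expected by chance" concrete, here is a minimal Python sketch of one common way to score a pair of diagnoses: compare how many patients carry both codes with the number expected if the two codes were independent. This is an illustration under stated assumptions, not the study's own statistic, which the report does not spell out. The function name `enriched_pairs`, the `min_ratio` parameter and the toy patient records are all hypothetical; the codes are ICD-10-style labels chosen only for the example.

```python
from collections import Counter
from itertools import combinations


def enriched_pairs(patient_codes, min_ratio=2.0):
    """Return {pair: observed/expected} for code pairs that co-occur
    more than `min_ratio` times as often as independence predicts.

    Hypothetical helper for illustration; not the study's method.
    """
    n = len(patient_codes)
    code_counts = Counter()  # number of patients carrying each code
    pair_counts = Counter()  # number of patients carrying both codes
    for codes in patient_codes.values():
        unique = sorted(set(codes))  # count each code once per patient
        code_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    result = {}
    for (a, b), observed in pair_counts.items():
        # Expected co-occurrences if the two codes were independent
        expected = code_counts[a] * code_counts[b] / n
        ratio = observed / expected
        if ratio > min_ratio:
            result[(a, b)] = ratio
    return result


# Toy, made-up records (assumption: NOT data from the study)
patients = {
    "p1": ["G43", "L63"],  # migraine, alopecia
    "p2": ["G43", "L63"],
    "p3": ["E11", "I10"],  # diabetes, hypertension
    "p4": ["E11"],
    "p5": ["I10"],
    "p6": ["F20"],         # schizophrenia
    "p7": ["F20", "I10"],
    "p8": ["F20", "E11"],
}
pairs = enriched_pairs(patients)
print(pairs)
```

In this toy set only the migraine and alopecia pair clears the threshold. A real analysis would also need a significance test and a correction for the many pairs being compared, since a ratio computed from a handful of patients can be high by chance alone.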

For example, the team identified nine patients diagnosed with both migraine and alopecia, or hair loss. By investigating how a protein already implicated in hair loss is connected to migraines, the researchers discovered a potential cellular target of that protein. In addition, the scientists realized that celiac disease, an autoimmune disorder triggered by gluten, has been associated with hair loss and migraines, and has also been linked to schizophrenia.

Brunak and his colleagues say they have yet to draw major conclusions about the implicated proteins and mechanisms.

In many places, including the United States, medical codes are used mostly for billing and reimbursement, and they may relate only to the current hospital visit. The notes clinicians make are a much richer resource but might not be read by other clinicians pressed for time. Integrating these notes with the codes reveals much more about the patient’s history and condition, says Brunak.

“In a split second you get an idea about where that patient is in treatment,” he says. As more individualized genetic data become available, that patient information will be even richer, further personalizing medicine, he adds. “In the end what we hope for is to approach it from both ends — the patient’s records and genomic data.”

Clinical notes are a huge source of information, says Stéphane Meystre, a specialist in biomedical informatics at the University of Utah in Salt Lake City. “This approach clusters information in a much more detailed way.”

It may be decades before a really personalized approach becomes the norm, Meystre says. Electronic record keeping hasn’t been widely adopted, he says, and knowing about an underlying molecular link doesn’t mean a treatment is available. But efforts like the current approach are already launching hypotheses about diseases and treatments.
