Mining genealogy databases to find crime suspects raises privacy concerns
Using DNA to find a killer sounds easy: Upload some DNA to a database, get a match and — bingo — suspect found. But it took new genetic sleuthing tools to track down the man suspected of being the Golden State Killer.
Investigators have confirmed they used a public genealogy database, GEDmatch, to connect crime scene evidence to distant relatives of Joseph James DeAngelo. The 72-year-old former police officer, arrested April 24 at his home in Sacramento, is suspected in a string of about 50 rapes and 12 murders committed between 1974 and May 1986.
The news prompted a flurry of concerns about privacy and ethics — there’s no telling how many people in the public database are being subjected to what amounts to a “genetic stop and frisk,” says Alondra Nelson, a sociologist at Columbia University. But others say they doubt police are actively trolling genealogy websites for suspects. Too many resources are required to do this sort of work, says Sara Katsanis, a genetics policy scholar at Duke University’s Initiative for Science & Society. “I don’t think this is going to become commonplace.”
Police haven’t yet publicly detailed the methods that led them to DeAngelo. Yet DNA experts say the simple upload-and-match scenario wouldn’t have worked in this case. DeAngelo’s DNA wasn’t in any police database. And snippets collected from crime scenes aren’t in the same form as the DNA in genealogy sites. In addition, consumer testing companies don’t participate in criminal investigations without a warrant. Even if a company were willing to help, police didn’t have the saliva or cheek swabs from potential suspects that the companies need to conduct their tests. So investigators would have had to do a lot of genetic legwork to get the DNA data and format it in a way that GEDmatch could recognize.
Colleen Fitzpatrick and Margaret Press have pioneered a way to do just that. The pair cofounded the DNA Doe Project, a nonprofit that uses genetics and genealogy to put a name to remains from unidentified people, including crime victims. The techniques developed for their organization are probably the same ones used in the Golden State Killer case, Fitzpatrick and Press say.
Forensic DNA fingerprints in law enforcement databases are composed of 20 “short tandem repeats.” Those are places in the human genetic instruction book — the genome — where a string of two to six DNA bases, or letters, repeat. For instance, ACGTACGTACGT would be three repeats. People have varying numbers of repeats at these locations. Police have used “familial searches” of law enforcement databases with short tandem repeats to identify suspects in some cases, but that approach has led to wrongful accusations in others.
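To make the idea of a short tandem repeat concrete, here is a minimal Python sketch that counts back-to-back copies of a motif in a DNA string, using the ACGT example above. The function name and the second test sequence are illustrative assumptions; real forensic STR typing is done with laboratory assays and specialized software, not plain string matching.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`.

    Illustrative only: a hypothetical helper, not a forensic tool.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward while another copy of the motif follows directly.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


print(longest_tandem_run("ACGTACGTACGT", "ACGT"))  # 3
print(longest_tandem_run("TTACGTACGTGG", "ACGT"))  # 2
```

Two people who carry different repeat counts at the same location would produce different numbers here, which is the kind of variation a forensic profile records.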
Short tandem repeats, or STRs, are not the sort of DNA data found in GEDmatch. That database is a repository where people can voluntarily upload raw genetic data generated by consumer testing companies, such as 23andMe, Ancestry, Family Tree DNA and others. So GEDmatch allows people to find relatives who may have used a different company to generate the genetic data.
23andMe and the other companies use saliva samples or inner cheek swabs sent in by customers to test for about 600,000 variations of individual DNA letters, known as SNPs (pronounced “snips”) for single nucleotide polymorphisms. “The statistics you can do on 600,000 SNPs are so much more powerful than statistics you can do on 20 STRs,” Fitzpatrick says. As a result, matches made through SNP testing can help investigators identify distant relatives more easily than short tandem repeats can, she says. It can also define the relationships between matches, showing that two people are first or third cousins, for instance.
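As a rough illustration of how shared DNA can distinguish a first cousin from a third cousin, the Python sketch below uses the textbook expectation that full first cousins share about 12.5 percent of their autosomal DNA and that the figure drops by a factor of four with each further degree. This is a simplifying assumption for illustration, with hypothetical function names; matching services typically rely on the lengths of shared DNA segments, and real values vary widely around these averages.

```python
import math


def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of autosomal DNA shared by full n-th cousins.

    First cousins: 1/8, second cousins: 1/32, third cousins: 1/128.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


def closest_cousin_degree(observed_fraction: float, max_degree: int = 5) -> int:
    """Pick the cousin degree whose expected sharing is nearest on a log scale."""
    if observed_fraction <= 0:
        raise ValueError("observed_fraction must be positive")
    return min(
        range(1, max_degree + 1),
        key=lambda n: abs(
            math.log(observed_fraction) - math.log(expected_shared_fraction(n))
        ),
    )


print(expected_shared_fraction(1))  # 0.125
print(expected_shared_fraction(3))  # 0.0078125
print(closest_cousin_degree(0.01))  # 3
```

With hundreds of thousands of SNPs, even the small fraction shared by third cousins covers enough markers to stand out, which is not the case for a 20-marker STR profile.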
To get the data needed for upload to GEDmatch, the DNA Doe Project — and probably the DeAngelo investigators — use a method Fitzpatrick and Press began developing last year. Their team uses special techniques to decipher, or sequence, degraded DNA. For the Doe Project, that can mean exhuming bodies and extracting DNA from bones, teeth or other tissues. Once that DNA has been sequenced, investigators use computer programs to compile a list of the same SNPs used by the consumer testing companies. Then files that mimic the format of 23andMe or Ancestry’s reports are generated and uploaded to GEDmatch.
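The final formatting step can be sketched in a few lines of Python. The sketch assumes a tab-separated layout with one SNP per row (identifier, chromosome, position, genotype) and "--" for a marker that could not be read, which resembles the raw-data files consumer companies provide. The function name, the SNP identifiers and the positions are made-up placeholders, and the investigators' actual software has not been described.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, str, int]  # (identifier, chromosome, position)


def to_consumer_format(
    target_snps: List[Snp], calls: Dict[Tuple[str, int], str]
) -> str:
    """Build a consumer-style genotype file from sequencing calls.

    `target_snps` is the list of markers a testing company would report.
    `calls` maps (chromosome, position) to a two-letter genotype taken
    from the sequenced sample. Both inputs are hypothetical here.
    """
    lines = ["# rsid\tchromosome\tposition\tgenotype"]
    for rsid, chrom, pos in target_snps:
        # Degraded DNA often leaves gaps, so missing markers become no-calls.
        genotype = calls.get((chrom, pos), "--")
        lines.append(f"{rsid}\t{chrom}\t{pos}\t{genotype}")
    return "\n".join(lines) + "\n"


# Made-up markers and genotypes, for illustration only.
targets = [("rs1000001", "1", 12345), ("rs1000002", "2", 67890)]
sample_calls = {("1", 12345): "AG"}
print(to_consumer_format(targets, sample_calls))
```

The second marker has no call in the sample, so it is written as a no-call rather than dropped, keeping the file aligned with the list of markers the database expects.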
If relatives of the unidentified deceased person (or, as in this case, a crime suspect) are in the GEDmatch database, investigators will be able to see those matches. Then painstaking genealogy research must begin to establish the person’s identity.
Once Doe Project investigators think they have identified the correct person, law enforcement officials contact the missing person’s family and collect DNA samples to verify the identity. On April 10, the project announced that its methods had identified the remains of “Buckskin Girl,” a young woman whose body was discovered in 1981, as those of Marcia Lenore King.
In DeAngelo’s case, police reportedly found a genetic link to DeAngelo’s great-great-grandparents in GEDmatch, then spent months following the branches of his family tree.
Once police settled on him as a suspect, they had to do more routine genetic gumshoe work. DNA collected from an item DeAngelo discarded (police haven’t identified the item, but it might be something like a cup, used Kleenex or a bit of leftover food) was used to do short tandem repeat matching to DNA taken from the Golden State Killer crime scenes. That step was necessary because it’s unclear whether the genealogical DNA evidence would be admissible in court, says Katsanis. Confirmation that DeAngelo’s DNA matched crime scene DNA came on April 20, police say. They then collected DNA from another discarded item to verify the first result. On April 23, the second test confirmed the finding. DeAngelo was arrested the next day. So far, he has been charged with eight murders.
Police “followed this genetic lead to try to investigate this case in a novel way,” says Katsanis. “But it took a lot of resources. ... If they’d had the wrong DeAngelo, it would have been a lot of investigation for nothing.” Such extensive genetic detective work is difficult and takes money and manpower most jurisdictions can’t spare, says Katsanis.
Whether that soothes people worried about their DNA being mined by law enforcement — or by anyone else, for that matter — is another issue. Katsanis likens the data used in this case to phone numbers listed in the phone book’s white pages. People in GEDmatch’s database voluntarily placed their DNA information there for anyone, including the police, to see, she says.
Katie Hasson, program director on genetic justice for the Center for Genetics and Society in Berkeley, Calif., disagrees with that comparison. “This kind of information is a little bit different than a phone number, very private.” Many people may not realize all the ways their DNA may be used or what it could mean for their families, she says. “People need to think about the ways making their genetic data public also makes their relatives’ DNA public.”
Press advises people considering genetic testing and uploading to GEDmatch to just assume that their data will be used by companies, law enforcement, foreign governments, marketers and other entities.
As GEDmatch notes on its website: “In today's world, there are real dangers of identity theft, credit fraud, etc. We try to strike a balance between these conflicting realities and the need to share information with other users. In the end, if you require absolute privacy and security, we must ask that you do not upload your data to GEDmatch. If you already have it here, please delete it.”
Says Press: “Assume the worst, and if you can live with that then jump on in and enjoy the benefits and you won’t have nightmares down the road. Anything can happen with anything we put out there, including our Facebook profiles. It’s not that we should worry, but we should be aware.”
Golden State Killer Press Conference 04/25/18. Law & Crime Network, YouTube. Accessed April 27, 2018.