Forensics on Trial

Chemical matching of bullets comes under fire

In 1997, a jury convicted Michael Behm of murdering a man in South River, N.J. The only physical evidence linking Behm to the murder was bullet fragments from the crime scene. An FBI examiner testified in court that the fragments chemically matched bullets from a box of ammunition Behm had at his home. “We were devastated by this,” says Jacquie Behm, whose brother is now serving a life sentence for murder. “At the time, we didn’t know anything about bullet-lead analysis.” Nor could her brother’s lawyer during the trial find anyone qualified to question the validity of the chemical evidence or the examiner’s interpretation of it.

LEAD SOURCING. Metallurgist Erik Randich stands in front of 200,000 pounds of lead, an amount that a single pot can cast at a lead smelter. One pot of lead can produce millions of bullets. Sanders Lead Company, Inc.

CAST AWAY. Lead castings (shown here in their molds) from the same pot sometimes differ in their chemical composition. Gopher Resources Company

As it turns out, there should have been plenty to question. Since that trial, a growing body of research has revealed that the practice of chemically matching bullets is seriously flawed. This February, a report released by the National Academies in Washington, D.C., called on the FBI to revise its rules on interpreting data from chemical analyses of bullets and to limit how its examiners testify about such data in the courtroom.

Behm’s present lawyer, Paul Casteleiro, has since filed a motion asking the courts to consider the National Academies’ report in deciding whether to grant his client another trial. Other lawyers and their clients are likely to follow suit.

Indeed, the implications are considerable. The FBI has used chemical analysis of bullets in some 2,500 investigations since the early 1980s. Among those, there were 500 cases in which the prosecution introduced such analyses as evidence during trials. But the story of bullet chemical analysis has even broader implications; it emphasizes the need to keep science honest, especially in the courtroom.


For several years, statisticians, metallurgists, and others outside the FBI have questioned the courtroom use of bullet chemical analyses. Now, the National Academies’ report, Forensic Analysis: Weighing Bullet Lead Evidence, has brought the practice into the spotlight. That report, combined with past studies detailing the forensic tool’s shortcomings, could call into question many past convictions in which results of the chemical analysis of bullets were introduced as evidence.

First developed in the 1960s, bullet chemical analysis has been used by prosecutors when a suspect’s weapon is not available or when the bullet found at a crime scene is too fragmented to permit visual inspection of the characteristic markings that firearms leave on intact bullets.

The chemical analysis consists of measuring seven trace elements—arsenic, antimony, tin, copper, bismuth, silver, and cadmium—that typically are present in a bullet’s lead alloy. Each element makes up less than 1 percent of the total lead alloy.

Using a technique called inductively coupled plasma–optical emission spectroscopy, a forensic chemist determines the proportion of each element in the lead alloy. In that analysis, the chemist dissolves a sample of the bullet and feeds the resulting solution into the instrument’s plasma chamber, where each element in the sample emits specific wavelengths of light. The pattern of emissions serves as a fingerprint for that element, and the intensity of the light in each pattern indicates how much of the element is present.
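The last step, turning a light intensity into a concentration, is typically done by calibrating the instrument against standards of known composition. The Python sketch below is an illustration only: the standards, the intensity readings, and the function names are hypothetical, and it assumes the instrument's response is linear over the range measured.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (intensity - intercept) / slope


# Hypothetical calibration standards for one element:
# known concentration (ppm) and measured emission intensity (arbitrary units).
standard_ppm = [0.0, 1.0, 2.0, 4.0]
standard_intensity = [5.0, 105.0, 205.0, 405.0]

slope, intercept = fit_line(standard_ppm, standard_intensity)

# Hypothetical reading from a dissolved bullet sample.
sample_ppm = concentration_from_intensity(255.0, slope, intercept)
print(round(sample_ppm, 3))
```

Repeating the same procedure for each of the seven trace elements would yield the bullet's compositional profile.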

Characterizing a bullet’s chemical composition is relatively straightforward. What it all means, however, is a matter of interpretation. The traditional reasoning has been that if two bullets are chemically indistinguishable, they probably came from the same pot of molten lead at the smelter or were manufactured on the same day by the same company. In court testimonies, FBI examiners have gone so far as to say that two chemically indistinguishable bullets probably came from the same box of ammunition.
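What "chemically indistinguishable" means in practice depends on the comparison rule an examiner adopts. As an illustration only, the Python sketch below assumes one simple rule: two bullets match if, for each of the seven elements, the ranges given by the mean plus or minus two standard deviations overlap. The rule, the element values, and the function names are assumptions made for this example, not details drawn from the testimony described here.

```python
# The seven trace elements named in the article.
ELEMENTS = ("As", "Sb", "Sn", "Cu", "Bi", "Ag", "Cd")


def intervals_overlap(mean_a, sd_a, mean_b, sd_b, k=2.0):
    """True if [mean_a +/- k*sd_a] and [mean_b +/- k*sd_b] overlap."""
    return abs(mean_a - mean_b) <= k * (sd_a + sd_b)


def indistinguishable(bullet_a, bullet_b, k=2.0):
    """True only if the intervals overlap for every one of the elements."""
    return all(
        intervals_overlap(*bullet_a[el], *bullet_b[el], k=k)
        for el in ELEMENTS
    )


# Hypothetical measurements: element -> (mean ppm, standard deviation ppm).
crime_scene = {
    "As": (120.0, 5.0), "Sb": (7500.0, 100.0), "Sn": (30.0, 3.0),
    "Cu": (60.0, 4.0), "Bi": (90.0, 4.0), "Ag": (20.0, 2.0),
    "Cd": (1.0, 0.2),
}
suspect_box = {
    "As": (125.0, 5.0), "Sb": (7600.0, 100.0), "Sn": (32.0, 3.0),
    "Cu": (58.0, 4.0), "Bi": (92.0, 4.0), "Ag": (21.0, 2.0),
    "Cd": (1.1, 0.2),
}
# Same as the suspect's bullet except for a much higher antimony level.
other_bullet = dict(suspect_box, Sb=(9000.0, 100.0))

print(indistinguishable(crime_scene, suspect_box))   # intervals overlap on all seven
print(indistinguishable(crime_scene, other_bullet))  # antimony intervals do not overlap
```

A result of True says only that the two sets of measurements overlap. By itself it says nothing about how many other bullets in circulation would pass the same test.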

Several years ago, while still working at the FBI, metallurgist William Tobin began questioning this practice. After all, he notes, much was and still is unknown about bullet manufacturing. It is disingenuous to say that the matching of two bullets is a significant find without knowing how much chemical diversity there is in the general population of bullets, Tobin says. “There isn’t an individual on the face of the earth qualified to interpret the forensic significance of bullet-lead analysis,” he argues.

After retiring from the FBI in 2000, Tobin partnered with Erik Randich, a forensics consultant and metallurgist at Lawrence Livermore National Laboratory in California. The duo set out to examine whether there was any statistical basis to bullet-lead matching. The metallurgists contacted two lead smelters that supply ammunition manufacturers in the United States and pored over the smelters’ production data.

These lead suppliers are called secondary smelters because the majority of their lead comes from spent automotive batteries rather than from ore. Most of the recycled lead goes back into making new batteries, so the refiners adjust trace elements in the lead to meet the specifications of the battery industry. Smelters keep detailed records on the elemental composition of the molten lead in each pot.

When Tobin and Randich looked at the composition records for different pots of molten lead, they saw reason for concern. The composition of castings from a single pot sometimes varied, while the composition of lead in different pots sometimes matched. That meant that bullets made from two different batches of lead could wrongly appear to have come from the same pot.

“We then knew that both of the assumptions that the FBI makes—that a lead source is homogeneous and unique—are not true,” says Randich.

It’s circumstantial

With the publication of Tobin and Randich’s research in July 2002, as well as other studies including the FBI’s own analyses, pressure mounted on the FBI to reevaluate its methods and court testimonies. In the fall of that year, the bureau asked the National Academies to put together a committee to formally review the FBI’s use of bullet-lead analysis and recommend changes.

The uncertainty of lead’s provenance doesn’t end with the smelting process, says Kenneth MacFadden, an independent consultant with training as an analytical chemist, who chaired the National Academies’ committee. “Bullets from one [lead source] can get mixed with bullets from another at various points in the manufacturing process,” he explains.

Once a bullet manufacturer receives slabs of lead from a refiner, the lead is cut into smaller blocks called billets. The billets are extruded into wires, which are cut into slugs that are then pressed into bullets. Because manufacturers receive lead from different smelter pots, lead from different sources can be intermingled at many stages in the manufacturing process. Therefore, a box of ammunition is likely to contain bullets from multiple volumes of lead, the committee reported.

“In fact, the FBI’s own research has found instances where a single box of ammunition contained as many as 14 distinct compositional groups,” MacFadden says.

The committee concluded that it’s impossible to determine that a bullet from a crime scene came from a particular box of bullets or that two bullets were manufactured on the same day at the same factory.

This finding greatly weakens the evidentiary value of bullet-lead analysis. The committee recommended setting narrow limits on what FBI examiners can say in court. For instance, should two bullets have matching compositions, instead of suggesting they came from the same box of ammunition, an FBI expert can merely testify to an increased probability that the two bullets came from what the committee has called the same “compositionally indistinguishable volume of lead” (CIVL).

Acknowledging that smelting pots come in different sizes and that the chemical makeup of lead can vary within a pot, the committee asked that FBI examiners avoid making reference to “melt” or “production run.” MacFadden adds that experts should explain to jurors that a CIVL can be of different sizes and produce anywhere from 12,000 to 35 million .22-caliber bullets. Annually, 9 billion bullets are made in the United States.

Committee member Paul Giannelli of Case Western Reserve University’s Law School in Cleveland likens attempts to match the lead from different bullets to finding a Nike size-10 shoeprint that matches a suspect’s size-10 Nikes. “It’s only circumstantial evidence,” says Giannelli. “It would be admissible in court, although there would be a large number of people with that type of shoe.” Similarly, if forensic analysis showed that the composition of a bullet from a crime scene matched that of a bullet confiscated from a defendant, there still would be many other people in possession of matching bullets.

Some say the shoe analogy isn’t appropriate because the public is familiar with the distribution of footwear sizes. “Jurors are perfectly equipped to assess the probative value of a Nike size-10 shoeprint,” says Tobin. But it’s impossible for them to do that with bullet-lead analysis, he says, because they know too little about the origin, processing, and distribution of bullets.

Because of these uncertainties, trying to determine the odds that two bullets will match by sheer coincidence rather than shared origin is difficult. Several years ago, statistician Alicia Carriquiry of Iowa State University in Ames came up with a mathematical model to calculate this false-positive rate.

Funded by the FBI through the Department of Energy’s Ames Laboratory, she and her colleagues took bullet-composition data from the FBI’s database and plugged the numbers into the model. The model came up with a false-positive rate as high as 27 percent. In contrast, the false-positive rate for DNA fingerprinting is one in a quadrillion.

However, Carriquiry says that the 27 percent false-positive rate is not informative. The absence of several types of data muddied her team’s efforts to pinpoint the true odds of two bullets matching merely by chance. For example, they needed to know the chemical diversity in the overall population of bullets and whether bullets from one batch of lead get shipped to a single town or dispersed across the country.

“Our conclusion was that you could calculate false positives this way, but we didn’t have enough information to do it,” says Carriquiry. In other words, there is currently no solid way of quantifying the evidentiary strength of a chemical match between two bullets, she says.

My bullet, your bullet

To start filling in the knowledge gap, Tobin has begun preliminary studies of the retail distribution of bullets. Because bullet manufacturers will not reveal who their clients are and how many boxes they ship to particular stores, Tobin decided to pay a visit to his local Wal-Mart, which is in Fredericksburg, Va. The company is one of the top two retailers of .22-caliber bullets in the United States; Kmart is the other.

Once boxes of ammunition leave the manufacturer’s warehouse, they tend to travel on pallets. All boxes on a pallet have the same packing code. When Tobin did a preliminary analysis of the ammunition boxes at the Fredericksburg Wal-Mart, he deduced that potentially hundreds of residents in the area over several months had purchased bullets with the same packing code, indicating similar time of manufacture—and thus similar chemistry, according to the FBI.

“The study that really blew me away was the one I conducted in Juneau, Alaska,” says Tobin. Although Juneau has an outdoors-oriented citizenry, it has only three retail outlets for bullets. Tobin and his colleagues examined the packing codes of every box of bullets in each store. They tallied several brands.

The team then calculated the chance that in Juneau an “innocent” purchaser of a specific brand of bullet would buy bullets with the same packing code as a “suspect’s” bullet. For each brand, the chances ranged anywhere from 87 percent to 100 percent. By bullet-lead analysis alone, therefore, most of the bullet buyers would be suspects.
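The arithmetic behind such a figure can be sketched in a few lines of Python. The example assumes that an innocent buyer picks a box of the given brand at random from the shelves, so the chance of sharing the suspect's packing code is simply that code's share of the boxes on sale. The shelf counts and packing codes below are invented for illustration and are not Tobin's data.

```python
def match_probability(boxes_by_code, suspect_code):
    """Chance that one randomly chosen box carries the suspect's packing code."""
    total = sum(boxes_by_code.values())
    return boxes_by_code.get(suspect_code, 0) / total


def chance_two_buyers_share_code(boxes_by_code):
    """Chance that two independent random purchases carry the same code.

    Assumes stock is large enough that one purchase does not
    noticeably change the mix left on the shelf.
    """
    total = sum(boxes_by_code.values())
    return sum((n / total) ** 2 for n in boxes_by_code.values())


# Hypothetical shelf inventory for one brand: packing code -> number of boxes.
shelf_counts = {"CODE-A": 45, "CODE-B": 5}

p_match = match_probability(shelf_counts, "CODE-A")
p_shared = chance_two_buyers_share_code(shelf_counts)
print(p_match, round(p_shared, 2))
```

When a single packing code dominates a town's shelves, both probabilities approach 1, which is the situation the Juneau survey describes.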

Tobin says that the more data he gets his hands on, the less confidence he has that bullet-lead analysis has any value at all.

A group of forensic scientists at the University of Southern California in Los Angeles recently teamed up with Tobin to continue his distribution studies. The researchers are sending students out to stores to record the packing codes on bullet boxes.

Warning shots

Bullet-lead analysis isn’t the only forensic technique to come under fire in recent memory. Courts and legal experts have begun questioning tool-mark analysis (of, say, the pry-bar markings on a doorframe), handwriting analysis, and even fingerprint analysis. David Faigman of the University of California’s Hastings College of the Law in San Francisco says the problem is that many forensic techniques have been used for decades without undergoing significant validity testing. Only recently have experts and legal authorities begun to recognize this oversight, he says.

Faigman calls the National Academies’ report on bullet-lead analysis an “exemplary handling of the subject.” In fact, he would like to see all disputed forensic sciences, as well as psychological evaluations such as repressed memories and battered-woman’s syndrome, get this kind of critical assessment.

The time might be right for such reviews. With the relatively recent introduction of DNA evidence—the gold standard among forensic tools—jurors, judges, and lawyers are becoming more adept at asking technical questions regarding false-positive rates or validation studies, says Carriquiry.

That was the case in 2002, when a federal judge in Philadelphia held that fingerprint experts couldn’t testify that a partial fingerprint from a crime scene matched the defendant’s print. The practice of matching partial fingerprints with those of a suspect has a long history, yet the judge found that its validity had never been tested in any meaningful way. Although the judge later reversed his ruling, the development highlighted the need to hold forensic sciences to the same high standards required in other areas of science.

“There are a lot of techniques out there that could be reviewed,” says Moses Schanfield, chair of the department of forensic sciences at George Washington University in Washington, D.C. But that would require collecting vast amounts of data. And that takes time. The FBI is currently going through the long, arduous process of collecting and analyzing handwriting samples, he says.

Thorough empirical studies of forensic techniques could also yield valuable new information about certain types of evidence. For instance, Faigman says, it’s difficult for an examiner to discern the age of a fingerprint discovered at the scene of a crime. If researchers understood how fingerprints fade or deteriorate over time, he says, then a fingerprint might place a suspect at the crime scene on a specific day.

Every day, courts are forced to make tough decisions using scientific evidence that inevitably comes with a degree of uncertainty. “So, we ought to have the best data in order to make the best decisions,” says Faigman. “It may be that we’re wrong on some of these things, but we’re certainly going to be wrong a lot more often if we don’t base our decisions on the best data available.”
