Statistical tests suggestive of fraud in Iran’s election

A closer look at voter ballot data reveals suspicious anomalies

July 10, 2009 at 12:33 pm

An American statistician says strong statistical evidence backs up the claims of Iranian protestors that Mahmoud Ahmadinejad’s victory in the June election was fraudulent.

Walter Mebane of the University of Michigan in Ann Arbor analyzed Iranian election data and found anomalies strongly suggesting that ballot boxes were stuffed with extra votes for Ahmadinejad. Mebane also identified 81 towns where further investigations are likely to find evidence of fraud.

“This suggests that the actual outcome should have been pretty close,” says Mebane, who described his analysis in a paper posted on his website June 15 and updated June 29. The official results showed Ahmadinejad getting almost twice as many votes as his closest rival.

“His data is highly, highly, highly suggestive that something odd was going on,” says political scientist Henry Brady of the University of California, Berkeley. “Someone who really knows the geopolitical makeup of Iran might be able to take this analysis further. I hope the CIA has someone doing that.”

Mebane cautions that the anomalous statistics could conceivably have an innocent explanation, that only limited data are available, and that he is not himself an expert on Iranian politics. Nevertheless, he concludes that “because the evidence is so strikingly suspicious, the credibility of the election is in question until it can be demonstrated that there are benign explanations for these patterns.”

After receiving vote counts from each polling station, Mebane examined them for internal consistency using a statistical curiosity known as Benford’s Law. In many kinds of data, the first digit of the numbers will be 1 about 30 percent of the time, rather than following the naïve expectation of one time out of nine (a first digit can be anything from 1 to 9, but never 0). Exponential growth is one way of producing this pattern: If a bacterial population starts at 100 and doubles each day, then for the entire first day there will be 100-some bacteria, giving an initial digit of 1. By comparison, the first digit will be 7 for just a few hours as the population zooms from 400 to 800 on the third day. Benford’s Law also applies when many random processes combine to produce the data.
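The arithmetic behind this example can be sketched in a few lines of Python. Under Benford’s Law the chance that a number begins with the digit d is log10(1 + 1/d). The function names below are illustrative choices, not anything taken from Mebane’s paper, and the bacterial figures simply assume the steady daily doubling described above.

```python
import math

def benford_first_digit(d):
    """Probability that the leading digit is d (1 through 9) under Benford's Law."""
    return math.log10(1 + 1 / d)

def hours_between(start, end, doubling_hours=24.0):
    """Hours for a population that doubles every `doubling_hours` to grow from start to end."""
    return doubling_hours * math.log2(end / start)

# Chance of each leading digit; digit 1 comes out near 0.30, digit 9 near 0.05.
first_digit_probs = {d: benford_first_digit(d) for d in range(1, 10)}

# Assumed scenario from the text: 100 bacteria, doubling once a day.
hours_at_1 = hours_between(100, 200)  # the whole first day starts with a 1
hours_at_7 = hours_between(700, 800)  # only part of the third day starts with a 7

for d, p in first_digit_probs.items():
    print(f"leading digit {d}: {p:.3f}")
print(f"hours with leading digit 1: {hours_at_1:.1f}")
print(f"hours with leading digit 7: {hours_at_7:.1f}")
```

The nine probabilities add up to 1, and the time spent at each leading digit shrinks as the digit grows, which is the pattern Benford’s Law describes.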

Mebane has studied election data from many countries, including the United States, Russia and Mexico. In 2006, he found that vote counts tend to follow Benford’s Law in the second digit. That finding was initially controversial but is now widely accepted.
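For the second digit, the standard form of Benford’s Law sums the first-digit formula over every possible leading digit. The sketch below computes that distribution; it is a general illustration of the law, not a reproduction of Mebane’s code, and the variable names are made up for the example.

```python
import math

def benford_second_digit(d):
    """Probability that the second digit is d (0 through 9) under Benford's Law."""
    # Sum over each possible first digit k = 1..9 of P(number starts with "kd").
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

second_digit_probs = [benford_second_digit(d) for d in range(10)]

# Average second digit implied by the law (a flat distribution would give 4.5).
expected_mean = sum(d * p for d, p in enumerate(second_digit_probs))

for d, p in enumerate(second_digit_probs):
    print(f"second digit {d}: {p:.4f}")
print(f"expected mean of second digit: {expected_mean:.3f}")
```

The second-digit pattern is much flatter than the first-digit one: 0 turns up about 12 percent of the time and 9 about 8.5 percent, so spotting a departure from it takes a large number of vote counts.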

When Mebane studied polling station-level data from Iran, he found that the vote counts for Ahmadinejad and two of the minor candidates didn’t conform to Benford’s Law well at all.
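One common way to check conformity of this kind is a chi-square comparison between the second digits actually observed and the counts Benford’s Law would predict. The article does not say which test statistic Mebane used, so the sketch below is only an assumed, generic version of such a check; the vote counts in it are invented for illustration and are not Iranian data.

```python
import math
from collections import Counter

def second_digit(n):
    """Return the second digit of n, or None if n has fewer than two digits."""
    s = str(abs(n))
    return int(s[1]) if len(s) >= 2 else None

def benford_second_digit(d):
    """Probability that the second digit is d (0 through 9) under Benford's Law."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

def chi_square_second_digit(counts):
    """Chi-square statistic comparing second digits of counts with Benford's Law."""
    digits = [d for d in (second_digit(c) for c in counts) if d is not None]
    if not digits:
        raise ValueError("need at least one count with two or more digits")
    observed = Counter(digits)
    stat = 0.0
    for d in range(10):
        expected = len(digits) * benford_second_digit(d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat

# Standard 5 percent critical value for a chi-square with 9 degrees of freedom.
CRITICAL_5PCT_DF9 = 16.92

# Hypothetical polling-station counts, all suspiciously round in the second digit.
made_up_counts = [100, 205, 1003] * 100
stat = chi_square_second_digit(made_up_counts)
print(f"chi-square = {stat:.1f}; suspicious = {stat > CRITICAL_5PCT_DF9}")
```

A statistic far above the critical value means the digits are unlikely to have come from a Benford-like process, though by itself it says nothing about why.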

In any fair election, a certain percentage of votes are illegible or otherwise problematic and have to be discarded. When people commit fraud by adding extra votes, they often forget to add invalid ones. Suspiciously, Mebane found that in towns with few invalid votes, Ahmadinejad’s vote counts deviated further from Benford’s Law, and that Ahmadinejad also got a greater percentage of the votes there.

“The natural interpretation is that they had some ballot boxes and they added a whole bunch of votes for Ahmadinejad,” Mebane says.

Mebane also received data from the 2005 Iran election that aggregated the votes of entire towns. He compared it with the 2009 data to see how plausible the patterns were, using a method similar to the one he used to analyze the “butterfly ballots” in Florida in the 2000 U.S. presidential election. If Ahmadinejad fared poorly in a particular town in 2005, you wouldn’t expect him to do especially well there in 2009 either. Mebane used a statistical model for finding the most likely relationship between the two results. To do so, his method ignores “outliers,” data points that don’t fit well with that most likely relationship.

The best relationship the model found produced 81 outliers out of the 320 towns in the analysis, about 25 percent and a strikingly high share. Another 91 fit the model, but poorly. In the majority of these 172 towns, Ahmadinejad did better than the model would have predicted.
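The general idea of fitting a relationship while setting outliers aside can be shown with a much simpler stand-in. The article does not describe Mebane’s model in detail, so the sketch below is not his method: it assumes a straight-line relationship between a candidate’s vote share in two elections, repeatedly refits the line using only the points that sit close to it, and flags the rest. The function names, the cutoff of three robust standard deviations, and the ten town vote shares are all invented for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def flag_outliers(xs, ys, threshold=3.0, max_rounds=20):
    """Refit the line while excluding points far from it; return (slope, intercept, outliers)."""
    keep = list(range(len(xs)))
    for _ in range(max_rounds):
        slope, intercept = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        center = statistics.median(residuals)
        # Median absolute deviation, rescaled to act like a standard deviation.
        mad = statistics.median([abs(r - center) for r in residuals])
        scale = max(1.4826 * mad, 1e-12)
        new_keep = [i for i, r in enumerate(residuals)
                    if abs(r - center) <= threshold * scale]
        if new_keep == keep or len(new_keep) < 2:
            break
        keep = new_keep
    outliers = [i for i in range(len(xs)) if i not in keep]
    return slope, intercept, outliers

# Hypothetical vote shares for ten towns: earlier election (x) and later election (y).
share_before = [0.1 * i for i in range(1, 11)]
share_after = [0.5 * x + 0.1 + (0.01 if i % 2 == 0 else -0.01)
               for i, x in enumerate(share_before)]
share_after[3] += 0.40  # two towns where the later share jumps unexpectedly
share_after[7] += 0.35

slope, intercept, outliers = flag_outliers(share_before, share_after)
print("flagged towns:", outliers)
```

In this toy data the two towns with the artificial jumps are the ones flagged. As with the real analysis, a flag only marks a town as fitting the pattern badly; it does not establish why.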

“This is not necessarily diagnostic of fraud,” Mebane says. “It could just be that the model is really terrible.” But since the first analysis gives evidence of fraud, the towns the model flags as problematic are the sensible ones to scrutinize.