Scooping the political pollsters

Who will win the election in November? A technique from baseball stats may predict the answer.

Nate Silver was bored. He’d graduated from the University of Chicago with a degree in economics and gone on to a typical consulting job, but it didn’t interest him much. Not as much as baseball, that’s for sure.

The job came with one nice perk, though: access to a cool, geeky statistics software package. It was just the thing for analyzing baseball data. Before long, Silver could use it to predict how good a baseball player’s season would be — and he could do it better than anyone else.

Silver’s method catapulted him into a new career as a hotshot baseball analyst. But his tendency to noodle around with side interests didn’t stop. He tackled a new game, politics. The result? Once again, he bettered all the old-timers.

He’d been tracking politics for a while, and questions kept popping up for him. Did Clinton really appeal more strongly to poorer voters? Did Obama have an advantage in caucus states as the pundits said, and if so, by how much? And most importantly, who was going to win? Numbers, Silver figured, could help find the answers.

He used the same techniques he’d been so successful with in baseball, and he often found a different story from the one the media was telling. His main tool was a standard statistical workhorse called multiple regression analysis, which allowed him to tease out which factors were most strongly influencing an outcome. Obama was doing better among rich folks? Not quite, Silver said. He was really doing better among more educated people, who also happen to have more money.
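To make the education-versus-income point concrete, here is a toy multiple regression in Python. It is a sketch of the general technique, not Silver’s actual model or data: every number is invented, vote share is generated from education alone, and income is deliberately built to rise with education. The district figures and variable names are hypothetical.

```python
# Toy multiple regression: does income still matter once education is included?
# All numbers are invented for illustration; they are not real primary results.
# Vote share is generated from education alone, and income is built to rise
# with education, so the two predictors are correlated.


def mean(xs):
    return sum(xs) / len(xs)


def cross_dev(xs, ys):
    """Sum of products of deviations from the mean (unnormalized covariance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    return cross_dev(x, y) / cross_dev(x, x)


def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together.

    Centering the data absorbs the intercept; the 2x2 normal equations
    are then solved with Cramer's rule.
    """
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical districts: percent college graduates, and median income ($ thousands).
education = [20, 25, 30, 35, 40, 45, 50, 55]
noise = [3, -2, 4, -1, 2, -3, 1, -4]
income = [30 + 0.5 * e + n for e, n in zip(education, noise)]
# Hypothetical Obama vote share, driven only by education.
vote = [20 + 0.8 * e for e in education]

income_alone = simple_slope(income, vote)
edu_coef, income_coef = two_predictor_slopes(education, income, vote)

print(f"Income alone:         {income_alone:+.2f} points per $1k")
print(f"Education and income: education {edu_coef:+.2f}, income {income_coef:+.2f}")
```

On its own, income looks like a strong predictor (a slope of roughly 1.6 points per $1,000). Put education into the same regression and the income coefficient collapses to essentially zero while education carries the whole effect, which mirrors the “more educated, not richer” pattern described above.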

He decided to share his observations, but with caution. “Sports and politics are strange bedfellows,” he says. He used the pseudonym “Poblano” (“Hey, I do like Mexican food”), posting his analyses on the liberal website DailyKos.com. Soon, thousands were reading his posts every day.

What he was really after were predictions. The media often reports the results of the latest poll as if it alone offers the best information about what’s going to happen. But public opinion doesn’t change all that fast, Silver reasoned. It made much more sense, he thought, to combine the results of all those polls.

Another website, RealClearPolitics.com, had pioneered this approach, but Silver figured he could do it better. Some polls are simply better than others, he noted, so he gave more weight to polls from companies that had been more accurate in the past. Then he started applying tricks from baseball.
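A weighted poll average of this kind can be sketched in a few lines of Python. The article does not say how Silver turned past accuracy into weights, so this sketch assumes an inverse-square rule (weight = 1 / past_error²), a common choice for combining noisy estimates. The pollster names, margins, and error figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str       # hypothetical pollster name
    margin: float       # Obama minus Clinton, in percentage points
    past_error: float   # pollster's average miss in earlier races, in points


def simple_average(polls):
    """Treat every poll as equally trustworthy."""
    return sum(p.margin for p in polls) / len(polls)


def weighted_average(polls):
    """Average poll margins, trusting historically accurate pollsters more.

    Assumption: weight = 1 / past_error**2, an inverse-variance style rule.
    Silver's actual weighting formula is not described in the article.
    """
    weights = [1.0 / p.past_error ** 2 for p in polls]
    total = sum(weights)
    return sum(w * p.margin for w, p in zip(weights, polls)) / total


polls = [
    Poll("Pollster A", margin=8.0, past_error=2.0),
    Poll("Pollster B", margin=2.0, past_error=6.0),
    Poll("Pollster C", margin=5.0, past_error=3.0),
]

print(f"Simple average:   {simple_average(polls):+.1f}")
print(f"Weighted average: {weighted_average(polls):+.1f}")
```

Because Pollster A has the best track record in this made-up example, the weighted average (about 6.7 points) is pulled toward its number, while the simple average sits at 5.0.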

He creates his baseball predictions by matching current players with similar past players. In the same way, for each congressional district he found another district, one that had already voted, whose voters had similar education levels, income, racial mix, religious makeup and so on. Odds are, he figured, the votes in the two districts would be pretty close. He combined this demographic analysis with the polling data to get his best guess.
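The matching idea resembles a nearest-neighbor search, sketched below in Python under several assumptions: demographics are put on a common scale (z-scores) so that no single feature dominates, “similar” means the smallest Euclidean distance, and the demographic estimate is blended 50/50 with the polls. Silver’s real matching rules and blend are not given in the article; the district names, profiles, poll numbers, and the `poll_weight` value are hypothetical.

```python
import math

# Hypothetical profiles: (college %, median income $k, Black %, evangelical %).
# Voted districts also carry their actual Obama-minus-Clinton margin in points.
voted = {
    "District V1": ((32.0, 48.0, 30.0, 35.0), 18.0),
    "District V2": ((22.0, 38.0, 12.0, 55.0), -12.0),
    "District V3": ((45.0, 70.0, 20.0, 20.0), 10.0),
}
upcoming = {
    "District U1": (30.0, 46.0, 28.0, 38.0),
    "District U2": (24.0, 40.0, 10.0, 52.0),
}
hypothetical_polls = {"District U1": 6.0, "District U2": -4.0}


def standardize(profiles):
    """Return a function that rescales each feature to mean 0, standard deviation 1."""
    n_features = len(profiles[0])
    means = [sum(p[i] for p in profiles) / len(profiles) for i in range(n_features)]
    sds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles) / len(profiles)) or 1.0
        for i in range(n_features)
    ]
    return lambda p: tuple((p[i] - means[i]) / sds[i] for i in range(n_features))


def nearest_match(profile, voted, scale):
    """Name of the already-voted district with the closest demographics."""
    target = scale(profile)
    return min(voted, key=lambda name: math.dist(target, scale(voted[name][0])))


def blended_forecast(demographic_margin, poll_margin, poll_weight=0.5):
    """Mix the demographic estimate with the polls (poll_weight is a hypothetical choice)."""
    return poll_weight * poll_margin + (1 - poll_weight) * demographic_margin


all_profiles = [p for p, _ in voted.values()] + list(upcoming.values())
scale = standardize(all_profiles)

forecasts = {}
for name, profile in upcoming.items():
    match = nearest_match(profile, voted, scale)
    margin = blended_forecast(voted[match][1], hypothetical_polls[name])
    forecasts[name] = (match, margin)
    print(f"{name}: closest to {match}, forecast margin {margin:+.1f}")
```

Standardizing first matters: without it, a feature measured in large units (such as income in thousands of dollars) would swamp the others in the distance calculation.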

On May 6, his methods were put to the test. The latest polls said Clinton had drawn close to Obama in North Carolina. But Silver’s model didn’t buy it. In South Carolina and Virginia, Obama had done much better than the polls predicted. When the model matched each district in North Carolina with a similar one in South Carolina or Virginia, it calculated that Obama would clean up with a 17-point win.

Skeptics jeered. On Slate.com, Mickey Kaus said the prediction was made “by a blogger using a sophisticated model that ignores … what’s been happening in the campaign. Like Rev. Wright. I predict this person is wrong!”

But when the results were in, Obama won by 14 points. Silver beat every established pollster. And Poblano became a sensation, an instant election authority. A few weeks later, he decided it was time to reveal his real identity. “You can’t get quoted in The Wall Street Journal as a chili pepper,” he says.

Pollsters have been dissecting his success ever since. “I was completely intrigued by what he did in the primaries,” says Mark Blumenthal of Pollster.com. Blumenthal points out that Clinton and Obama each did well with particular demographic groups, and that those groups stayed remarkably consistent over the course of the primary. That consistency is why Silver’s model was able to do so much better than the polls.

He didn’t manage to nail every primary. For example, South Dakota is demographically very similar to North Dakota (where Obama had won by 15 points). Late polls, however, showed Clinton a whopping 26 points ahead. Silver bet on a narrow Obama victory, but Clinton’s heavy campaigning in South Dakota paid off with a nine-point win — about halfway between the polls and Silver’s projection.

Now that the focus has turned to the general election, Silver has had to modify his methods. Without the state-by-state rollout of the primaries, he’s turned to the results of past presidential elections to supply data for his demographic analysis.

So who’s going to win? If the election were held today, Silver’s model says that Obama would win by five points (good news for Silver, an Obama fan). Silver’s best guess is that come November, Obama will still win, but not so handily, since big leads early on almost always narrow with time. His model accounts for that and projects a three-point win.

Andrew Gelman, a statistician and political scientist at Columbia University, says that he likes Silver’s analyses very much. The irony, he says, is that Silver’s demographic approach may be too powerful for its own good. Silver is combining a demographic analysis (which changes only slowly) with poll results (which come out every week). Gelman believes that this early in an election, the demographic analysis alone is far more telling than polls could ever be. “The logic of the situation would push him toward not using the polls much at all until a month before the election,” Gelman says. “But then he wouldn’t have much news to report every week for his blog.”
