Baseball by the numbers

Study evaluates success of statistical analyses in determining the player with the golden glove

August 6, 2009 at 4:42 pm

WASHINGTON — Baseball fans know who has the golden glove, but assigning a number to a player’s defensive merits has been tricky. A recent analysis suggests new sophisticated statistical methods could offer a fuller picture for sports nuts, Benjamin Baumer reported August 5 at the Joint Statistical Meetings.

Traditionally, fielding ability has been calculated by dividing a player’s number of errors by the total number of chances the player had to make a play, and subtracting that number from one. (Errors are mistakes that would have been avoided with “ordinary effort” — an arguably subjective call.) But this long-used formula doesn’t give credit to a fielder with range, who can successfully run down a ball out of reach for most players.
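The traditional formula is simple enough to write out. The short Python sketch below computes it and shows the weakness described above; the two fielders and their numbers are made up for illustration and do not come from the study.

```python
def fielding_percentage(errors: int, total_chances: int) -> float:
    """Traditional measure: one minus errors divided by total chances."""
    if total_chances <= 0:
        raise ValueError("total_chances must be positive")
    if not 0 <= errors <= total_chances:
        raise ValueError("errors must be between 0 and total_chances")
    return 1.0 - errors / total_chances


# Hypothetical players (illustrative numbers only).
# The sure-handed fielder reaches 500 balls and misplays 5 of them.
sure_handed = fielding_percentage(errors=5, total_chances=500)

# The rangy fielder reaches 100 more balls but misplays 10 in total.
rangy = fielding_percentage(errors=10, total_chances=600)

print(f"sure-handed: {sure_handed:.3f}")
print(f"rangy:       {rangy:.3f}")
```

In this made-up case the rangy fielder gets the lower score even though he handled 590 balls cleanly to the other player’s 495, because balls a fielder never reaches do not count as chances.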

Two proposed methods get around this problem, said Baumer, a statistician for the New York Mets and a doctoral student at the City University of New York. The first, called a discrete model, divides the field into zones and sorts batted balls into categories based on type, direction, the pitcher’s handedness and other characteristics. For each kind of ball in each zone, the probability that an average player would make the catch is compared with what the player being rated actually did. Many variations of this method exist, some with tweaks that account for variables such as field differences and ball hogging.
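The general idea behind a discrete model can be sketched in a few lines of Python. This is a simplified illustration, not any of the specific variants Baumer tested: the bin labels, the sample plays and the function names are all hypothetical, and real systems use many more categories and corrections.

```python
from collections import defaultdict


def league_catch_rates(plays):
    """Fraction of balls caught in each bin, over all fielders.

    plays: iterable of (bin_key, caught) pairs, where bin_key is any
    hashable label such as (zone, ball_type) and caught is True/False.
    """
    seen = defaultdict(int)
    made = defaultdict(int)
    for bin_key, caught in plays:
        seen[bin_key] += 1
        made[bin_key] += int(caught)
    return {key: made[key] / seen[key] for key in seen}


def plays_above_average(player_plays, rates):
    """Sum over a player's chances of (actual result - average probability)."""
    return sum(int(caught) - rates[bin_key] for bin_key, caught in player_plays)


# Hypothetical league data: (zone, ball type) bins and whether the ball was caught.
league = [
    (("zone_7", "fly"), True), (("zone_7", "fly"), True),
    (("zone_7", "fly"), True), (("zone_7", "fly"), False),
    (("zone_8", "liner"), True), (("zone_8", "liner"), False),
    (("zone_8", "liner"), False), (("zone_8", "liner"), False),
]
rates = league_catch_rates(league)

# Hypothetical player: one easy fly caught, one hard liner caught, one liner missed.
player = [
    (("zone_7", "fly"), True),
    (("zone_8", "liner"), True),
    (("zone_8", "liner"), False),
]
score = plays_above_average(player, rates)
print(f"plays above average: {score:+.2f}")
```

A positive score means the player converted more balls into outs than an average fielder would be expected to on the same mix of chances, so catching a difficult ball earns more credit than catching an easy one.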

Baumer also looked at a second method, proposed by Shane Jensen of the University of Pennsylvania in a paper published in June in the Annals of Applied Statistics. This approach does not divide the field into zones. Instead, it mathematically describes a smooth, continuous playing surface.
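To give a feel for what “smooth and continuous” means here, the Python sketch below lets the chance of a catch fall off gradually with the distance a fielder must cover, rather than jumping at zone boundaries. The logistic curve and its two parameters are assumptions chosen purely for illustration; they are not the model or the values from Jensen’s paper.

```python
import math


def catch_probability(distance_ft: float,
                      midpoint_ft: float = 60.0,
                      scale_ft: float = 10.0) -> float:
    """Illustrative smooth curve: chance of a catch versus distance to the ball.

    midpoint_ft is the distance at which the catch is a coin flip, and
    scale_ft controls how quickly the chance drops off. Both values are
    hypothetical.
    """
    return 1.0 / (1.0 + math.exp((distance_ft - midpoint_ft) / scale_ft))


for d in (30.0, 60.0, 90.0):
    print(f"{d:5.0f} ft -> {catch_probability(d):.2f}")
```

In a setup like this, a fielder with better range would be described by a larger midpoint, so players can be compared through the fitted curve itself instead of zone-by-zone counts.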

Baumer assessed the reliability of the different methods by comparing scores for real players from year to year, on the assumption that a player’s fielding would stay consistent, so a reliable method should produce scores that do not vary drastically. The methods work, but could still be improved, Baumer found.
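One common way to put a number on year-to-year consistency is the correlation between the same players’ scores in two consecutive seasons. The article does not say which statistic Baumer used, so the Python sketch below is an assumption about how such a check could look, and the player scores in it are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (spread_x * spread_y)


# Hypothetical fielding scores for the same four players in two seasons.
year_one = [5.0, -2.0, 0.5, 3.0]
year_two = [4.0, -1.0, 1.0, 2.5]

r = pearson(year_one, year_two)
print(f"year-to-year correlation: {r:.3f}")
```

A value near 1 would mean the method ranks players almost the same way in both seasons, while a value near 0 would suggest the scores are mostly noise.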

For a sample of 4,000 balls in play during Major League Baseball games, a discrete method accounting for ball hogging was more reliable than a discrete method without the correction. Similar analyses found that the continuous method performed well for players in the outfield, but didn’t perform well in the infield.

But Baumer hopes that the continuous method will improve dramatically with a new data set. Actual tracking data on the locations of players and balls on the field would replace the estimates that are currently used, he said.

All methods that try to evaluate players face the problem of validation — a method works if it gives high ratings to the good players, but how are those good players identified? “Unfortunately, there’s no gold standard,” Baumer said. “There’s definitely some chicken-and-egg thing going on here.”

Statistical methods should be a part of evaluating a player’s skill, says Matthew Johnson of Teachers College, Columbia University in New York City. “I think it’s definitely true that seeing something with your own eyes is worth something, but it’s also naïve to ignore the numbers.”