Phrases from Wikipedia pages on hot scientific fields end up in published papers, a study finds
Wikipedia: The settler of dinnertime disputes and the savior of those who cheat on trivia night. Quick, what country has the Nile’s headwaters? What year did Gershwin write “Rhapsody in Blue”? Wikipedia has the answer to all your burning trivia questions — including ones about science.
With hundreds of thousands of scientific entries, Wikipedia offers a quick reference for the molecular formula of Zoloft, who the inventor of the 3-D printer is and the fact that the theory of plate tectonics is only about 100 years old. The website is a gold mine for science fans, science bloggers and scientists alike. But even though scientists use Wikipedia, they don’t tend to admit it. The site rarely ends up in a paper’s citations as the source of, say, the history of the gut-brain axis or the chemical formula for polyvinyl chloride.
But scientists are browsing Wikipedia just like everyone else. A recent analysis found that Wikipedia stays up-to-date on the latest research — and vocabulary from those Wikipedia articles finds its way into scientific papers. The results don’t just reveal the Wiki-habits of the ivory tower. They also show that the free, widely available information source is playing a role in research progress, especially in poorer countries.
Teachers in middle school, high school and college drill it into their students: Wikipedia is not a citable source. Anyone can edit Wikipedia, and articles can change from day to day, sometimes by as little as a comma, other times being completely rewritten overnight. “[Wikipedia] has a reputation for being untrustworthy,” says Thomas Shafee, a biochemist at La Trobe University in Melbourne, Australia.
But those same teachers — even the college professors — who warn students away from Wikipedia are using the site themselves. “Academics use Wikipedia all the time because we’re human. It’s something everyone is doing,” says Doug Hanley, a macroeconomist at the University of Pittsburgh.
And the site’s unreliable reputation may be unwarranted. Wikipedia is not any less consistent than Encyclopedia Britannica, a 2005 Nature study showed (a conclusion that the encyclopedia itself vehemently objected to). Citing it as a source, however, is still a bridge too far. “It’s not respected like academic resources,” Shafee notes.
Academic science may not respect Wikipedia, but Wikipedia certainly loves science. Of the roughly 5.5 million articles, half a million to a million of them touch on scientific topics. And constant additions from hundreds of thousands of editors mean that entries can be very up to date on the latest scientific literature.
How recently published findings affect Wikipedia is easy to track. They’re cited on Wikipedia, after all. But does the relationship go the other way? Do scientific posts on Wikipedia worm their way into the academic literature, even though they are never cited? Hanley and his colleague Neil Thompson, an innovation scholar at MIT, decided to approach the question on two fronts.
First, they determined the 1.1 million most common scientific words in published articles from the scientific publishing giant Elsevier. Then, Hanley and Thompson examined how often those same words were added to or deleted from Wikipedia over time, and how often they were used in the research literature. The researchers focused on two fields: chemistry and econometrics, a new area that develops statistical tests for economics.
There was a clear connection between the language in scientific papers and the language on Wikipedia. “Some new topic comes up and it gets exciting, it will generate a new Wikipedia page,” Thompson notes. The language on that new page was then connected to later scientific work. After a new entry was published, Hanley and Thompson showed, later scientific papers contained more language similar to the Wikipedia article than to papers in the field published before the new Wikipedia entry. There was a definite association between the language in the Wikipedia article and future scientific papers.
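To make "language similar to the Wikipedia article" concrete, here is a minimal sketch of one common way to score how alike two texts are: cosine similarity over word counts. This is an illustrative stand-in, not the measure Hanley and Thompson used, and the three short texts are made up for the example.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


# Made-up toy texts, for illustration only (not data from the study).
wiki_entry = "synthesis route using a multicomponent reaction"
paper_before = "we report a new catalyst for oxidation"
paper_after = "our synthesis uses a multicomponent reaction route"

before = cosine_similarity(wiki_entry, paper_before)
after = cosine_similarity(wiki_entry, paper_after)
print(f"before: {before:.2f}, after: {after:.2f}")
```

In this toy case the later paper scores higher because it shares more words with the entry. A score like this shows only that the texts overlap, not where the shared words came from.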
But was Wikipedia itself the source of that language? This part of the study can’t answer that. It only observes words increasing together in two different spaces. It can’t prove that scientists were reading Wikipedia and using it in their work.
So the researchers created new Wikipedia articles from scratch to find out if the language in them affected the scientific literature in return. Hanley and Thompson had graduate students in chemistry and in econometrics write up new Wikipedia articles on topics that weren’t yet on the site. The students wrote 43 chemistry articles and 45 econometrics articles. Then, half of the articles in each set got published to Wikipedia in January 2015, and the other half were held back as controls. The researchers gave the articles three months to percolate through the internet. Then they examined the next six months’ worth of published scientific papers in those fields for specific language used in the published Wikipedia entries, and compared it to the language in the entries that never got published.
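The publish-or-hold-back step can be sketched in a few lines. This is a minimal illustration that assumes the halves were picked at random (the description above does not say how they were chosen) and that an odd-sized set puts the extra article in the held-back group. The function name, the seed and the article titles are hypothetical.

```python
import random


def split_treatment_control(articles, seed=0):
    """Shuffle the articles and split them into (published, held_back)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(articles)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2   # assumption: round down for the published half
    return shuffled[:half], shuffled[half:]


# Placeholder titles; only the counts (43 and 45) come from the study.
chemistry = [f"chem-{i}" for i in range(43)]
econometrics = [f"econ-{i}" for i in range(45)]

for name, articles in [("chemistry", chemistry), ("econometrics", econometrics)]:
    published, held_back = split_treatment_control(articles)
    print(name, len(published), "published,", len(held_back), "held back")
```

Because chance alone decides which articles go live, any later difference between the two groups is hard to blame on the topics themselves.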
In chemistry, at least, the new topics proved popular. Both the published and control Wikipedia page entries had been selected from graduate level topics in chemistry that weren’t yet covered on Wikipedia. They included entries such as the synthesis of hydrastine (the precursor to a drug that stops bleeding). People were interested enough to view the new articles on average 4,400 times per month.
The articles’ words trickled into the scientific literature. In the six months after publishing, the entries influenced about 1 in 300 words in the newly published papers in that chemical discipline. And scientific papers on a topic covered in Wikipedia became slightly more like the Wikipedia article over time. For example, if chemists wrote about the synthesis of hydrastine (one of the new Wikipedia articles), published scientific papers more often used phrases like “Passerini reaction,” a term used in the Wikipedia entry. But if an article never went onto Wikipedia, the scientific papers published on the topic didn’t become any more similar to the never-published article (which could have happened if the topics were merely getting more popular). Hanley and Thompson published a preprint of their work to the Social Science Research Network on September 26.
Unfortunately, there was no number of Wikipedia articles that could make econometrics happen. “We wanted something on the edge of a discipline,” Thompson says. But it was a little too edgy. The new Wikipedia entries in that field got one-thirtieth of the views that chemistry articles did. Thompson and Hanley couldn’t get enough data from the articles to make any conclusions at all. Better luck next time, econometrics.
The relationship between Wikipedia entries and the scientific literature wasn’t the same in all regions. When Hanley and Thompson broke the published scientific papers down by the gross domestic product of their countries of origin, they found that Wikipedia articles had a stronger effect on the vocabulary in scientific papers published by scientists in countries with weaker economies. “If you think about it, if you’re a relatively rich country, you have access at your institution to a whole list of journals and the underlying scientific literature,” Hanley notes. Institutions in poorer countries, however, may not be able to afford expensive journal subscriptions, so scientists in those countries may rely more heavily on publicly available sources like Wikipedia.
The Wikipedia study is “excellent research design and very solid analysis,” says Heather Ford, who studies digital politics at the University of Leeds in England. “As far as I know, this is the first paper that attributes a strong link between what is on Wikipedia and the development of science.” But, she says, this is only within chemistry. The influence may be different in different fields.
“It’s addressing a question long in people’s minds but difficult to pin down and prove,” says Shafee. It’s a link, but tracking language, he explains, isn’t the same as finding out how ideas and concepts were moving from Wikipedia into the ivory tower. “It’s a real cliché to say more research is needed, but I think in this case it’s probably true.”
Hanley and Thompson would be the first to agree. “I think about this as a first step,” Hanley says. “It’s showing that Wikipedia is not just a passive resource, it also has an effect on the frontiers of knowledge.”
It’s a good reason for scientists to get in and edit entries within their expertise, Thompson notes. “This is a big resource for science and I think we need to recognize that,” Thompson says. “There’s value in making sure the science on Wikipedia is as good and complete as possible.” Good scientific entries might not just settle arguments. They might also help science advance. After all, scientists are watching, even if they won’t admit it.