AI helps to reveal global differences on Wikipedia, historian finds

Jo Guldi presented research at ASU on finding cultural differences in history using Wikipedia data


Wikipedia exists in 357 language editions, but machine-learning research shows that not every edition tells the same story.

Jo Guldi, a professor, historian, writer and data scientist who teaches at Emory University, spoke at ASU on Tuesday to present a new research project that uses Wikipedia data to compare how different cultures remember history. The project used machine learning to map which time periods and events dominate among cultures, and how history shapes identity and political worldviews. 

"It's interesting to think about the possibility of analyzing something like essentially all the words on Wikipedia," said Jason Bruner, a professor at the School of Historical, Philosophical and Religious Studies. "(It) is sort of beyond any human capacity to do, and what becomes apparent when you have some kind of capacity to visually represent something like the billions of words on Wikipedia."

Guldi said Wikipedia's size shows why AI tools matter. Within the category of histories of ideology, there are over 3 million articles — far more than any scholar could read by hand. Using AI translation with strict rules to keep exact dates, she said, allows researchers to compare cultures fairly, even when some languages have only a handful of historical articles.

Guldi said in the early years of text mining, researchers mainly used computers to count words in order to spot trends in history and culture.

"The first way that we pursued it was to ask the question, 'Can we discover something new about the history by just counting words over time?'" Guldi said.

Guldi shared work by researcher Dan Cohen, who analyzed word patterns tied to Victorian values such as religion, science and industry from 1790 to 1910. She said the study helped scholars better understand what defined the Victorian era. 
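The word-counting approach described above can be illustrated with a short sketch. This is not Cohen's or Guldi's code, which the talk did not detail; the function name `term_share_by_decade`, the simple tokenizer and the three sample texts are all assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict


def term_share_by_decade(docs, terms):
    """Return {decade: {term: share of all words in that decade}}.

    docs  -- iterable of (year, text) pairs (hypothetical input format)
    terms -- words to track, matched case-insensitively
    """
    wanted = {t.lower() for t in terms}
    totals = Counter()            # total words seen per decade
    hits = defaultdict(Counter)   # tracked-term counts per decade
    for year, text in docs:
        decade = year // 10 * 10
        words = re.findall(r"[a-z']+", text.lower())
        totals[decade] += len(words)
        for word in words:
            if word in wanted:
                hits[decade][word] += 1
    return {
        decade: {t: hits[decade][t] / totals[decade] for t in sorted(wanted)}
        for decade in sorted(totals)
        if totals[decade]
    }


# Invented sample texts, not real historical data.
sample = [
    (1801, "Science and faith"),
    (1805, "faith faith industry"),
    (1893, "science science science industry"),
]
shares = term_share_by_decade(sample, ["faith", "science"])
print(shares)
```

Dividing by the total word count for each decade, rather than reporting raw counts, keeps a decade with more surviving text from looking more interested in every topic at once.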

She also referenced psychologist Jean Twenge, whose research found fewer references to community and more references to the individual, a trend that helped define what Twenge called the "narcissism epidemic."

Even with those breakthroughs in text analysis, Guldi said counting words alone can only go so far.

While researchers can measure sentiment or cultural values, she said the real understanding comes from looking at how often societies recall the same events and where their histories differ, which she called "the problem of memory." 

That problem sits at the core of her project, which uses machine learning to identify differences in cultural and historical recollection through the phrases and words of Wikipedia articles.

"This is slightly different than just counting words," Guldi said. "We're counting the elements of memory."

Guldi also showed a visualization of the "History of aviation" article on Wikipedia, tracking mentions of dates from 1850 to 1950 across 31 languages. She said that early auto-translation filled the text with English wording, but once individual cultures shaped their own versions, major differences appeared. 
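Counting date mentions in one article across language editions can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the helper names `year_counts` and `share_after` are hypothetical, and the sample sentences are invented rather than taken from Wikipedia.

```python
import re
from collections import Counter

# Four-digit years from 1500 to 2099 that are not part of a longer number.
# Digit lookarounds are used instead of \b so that a year followed directly
# by a non-Latin character (for example a Japanese year marker) still matches.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def year_counts(text):
    """Count how often each year is mentioned in the text."""
    return Counter(int(y) for y in YEAR_PATTERN.findall(text))


def share_after(text, cutoff):
    """Share of year mentions later than cutoff, or None if no years appear."""
    counts = year_counts(text)
    total = sum(counts.values())
    if total == 0:
        return None
    later = sum(n for year, n in counts.items() if year > cutoff)
    return later / total


# Invented stand-ins for two language editions of the same article.
editions = {
    "edition_a": "Gliders flew in 1853 and 1891; powered flight came in 1903.",
    "edition_b": "After 1903, the airline was founded in 1951 and grew in 1964.",
}
for name, text in editions.items():
    print(name, share_after(text, 1950))
```

Comparing shares rather than raw counts matters here because, as Guldi noted, some languages have only a handful of historical articles.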

"If you're writing in Japanese or Vietnamese or Thai about the invention of air travel, you're much more interested in the period after 1950," Guldi said.

Ron Broglio, director of the Humanities Institute, said Guldi's perspective highlights the value of humanities in data science.

"What's really great is she brings a humanities perspective to computer science questions and big data, and it reveals something about differences in cultures and how different cultures see the same event," Broglio said.

Edited by Sophia Braccio, George Headley and Pippa Fung.


Reach the reporter at setoro@asu.edu.
