Schuessler, Jennifer. “Regional English, Tweet by Tweet.” New York Times. The New York Times: 2 March 2012. Web. 9 March 2012.
Regional English, Tweet by Tweet
By JENNIFER SCHUESSLER
The Dictionary of American Regional English, the landmark project, now complete, that we recently profiled, is based largely on research by a team of fieldworkers who fanned out across the country some 50 years ago in vans called Word Wagons, querying Americans about their ways of talking about kitchen implements, farm animals, bodily ailments, misbehaving children, stupid neighbors and more.
The linguists of the future, however, may not have to go to such literal lengths to find geographical variations in speech. According to a paper delivered at the annual meeting of the American Dialect Society in January by Brice Russ, a graduate student at Ohio State University, the 200 million or so messages posted each day in the supposedly placeless world of Twitter may end up being a rich source of information about regional difference.
So far, the sheer number of words on the site has proved daunting. “You see computational linguists using Twitter to predict box office revenues and opinion polls or to do sentiment analysis,” Mr. Russ said in a telephone interview. “But when it comes to answering the more traditional kinds of questions linguists look at, people are still trying to figure out what to do with so much data.”
To demonstrate the validity of Twitter-based research, Mr. Russ searched through some 400,000 Twitter posts coming from identifiable locations and zeroed in on three linguistic variables, starting with the regional distribution of “soda” vs. “pop” or “Coke,” something that has been well studied by scholars and amateurs alike. Next, he tracked the use of “hella,” an intensifier (as in “hella boring”) that is associated with Northern California but whose regional distribution has been examined only anecdotally. Finally, he looked at the well-documented syntactic construction “needs X-ed” (as in “the car needs washed”), which is common in the Midwest and especially around Pittsburgh.
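The article does not describe Mr. Russ's actual tooling, so what follows is only a rough sketch of the general idea: tally competing word variants in posts that carry a location, then report the most frequent variant in each place. The function names, the variant list and the sample posts are all invented for illustration. A real study would also have to separate "pop" the drink from "pop" the music genre, which this simple word match makes no attempt to do.

```python
import re
from collections import Counter, defaultdict

# Assumed variant list for the carbonated-beverage variable (illustrative only).
VARIANTS = ("soda", "pop", "coke")
PATTERN = re.compile(r"\b(" + "|".join(VARIANTS) + r")\b", re.IGNORECASE)


def count_variants(posts):
    """Tally each variant per location from (location, text) pairs."""
    counts = defaultdict(Counter)
    for location, text in posts:
        for match in PATTERN.findall(text):
            counts[location][match.lower()] += 1
    return counts


def predominant(counts):
    """Return the most frequent variant for each location that had a match."""
    return {loc: c.most_common(1)[0][0] for loc, c in counts.items()}


# Invented sample posts; not drawn from any real data set.
sample = [
    ("Ohio", "Grabbing a pop before class"),
    ("Ohio", "This pop is flat"),
    ("Ohio", "Anyone want a soda?"),
    ("Georgia", "What kind of Coke do you want?"),
    ("New York", "Soda machine is broken again"),
]

result = predominant(count_variants(sample))
print(result)
```

Plotting the per-location winners on a map would be a separate step, and matching is by whole word only, so a post that merely mentions a "pop song" would be counted unless further filtering were added.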
Mr. Russ’s results for carbonated beverages, plotted onto a Google map, track closely with previous research, with “pop” predominant from the Midwest to the Pacific Northwest, “Coke” predominant in the South and “soda” ruling the Northeast and Southwest while also cropping up elsewhere. But his map for “hella” shows the word leapfrogging up the West Coast to Seattle (and, more puzzlingly, popping up in St. Louis and Kansas City). “People may be moving up the coast, bringing it with them,” he said, adding that he was utterly confounded by the Midwestern “hella” hotspots.
As for the “needs X-ed” construction, Mr. Russ detected hints of a southward drift since it was studied in the mid-1990s, though he was cautious about drawing firm conclusions. “There could have been diffusion southward,” he said. “Or I may have just caught something that the previous research missed.”
Mr. Russ isn’t the first to use digital media to study regional language variation. At last year’s meeting of the Linguistic Society of America, a team from Carnegie Mellon University presented a paper showing that a Twitter user’s location could be predicted to within 300 miles by tweets alone. The Lexicalist, a project begun in 2010, has analyzed millions of words on the Internet to map usage patterns according to various demographic factors. And a dialect survey led by Bert Vaux, formerly of Harvard, used online surveys to map different names for sneakers (or “gym shoes” in Illinois, where this writer comes from), varying pronunciations of “aunt” or “crayon,” and answers to questions like “What do you call a drive-through liquor store?” (Most popular answer: “I have never heard of such a thing.”)
Twitter, whose users skew younger, more urban and less white than Internet users in general, does pose some methodological challenges for researchers, Mr. Russ acknowledged. (Among other things, he noted, it shows only where users are now, not where they are from originally.) But it may allow them to track linguistic patterns on a vast scale and in something close to real time, identifying phenomena that can then be investigated more deeply by traditional fieldwork.
Mr. Russ emphasized that Twitter-mining is unlikely to replace the kind of old-fashioned, on-the-ground word hunting that produced the Dictionary of American Regional English.
“The ‘bobbasheelys’ and ‘crawdads’ of English don’t always show up on Twitter often enough to be mapped on a large scale,” he said, referring to two of the dictionary’s classic entries.