What can hot chocolate and the temperature in Los Angeles tell us about how Americans perceive the weather? Quite a bit, if you employ the resources of data science—a blend of math, computer programming, data collection and digital storytelling that enables its practitioners to make precise connections among disparate concepts.
Data science is an emerging academic discipline, and The New York Times has called data scientists "the magicians of the Big Data era." Think Nate Silver, the rock star statistician whose data-based political forecasting was a must-read in the Times during the 2012 presidential campaign. Last fall, Smith became one of a handful of colleges to offer an undergraduate data science course. The innovative, hands-on course, offered in the Department of Mathematics and Statistics, is taught by visiting assistant professor Benjamin Baumer, a former statistician for the New York Mets.
As an example of data science at work, three of Baumer's students monitored Twitter for the weather-inspired phrases "hot chocolate" and "electric blanket" in nine American cities and compared measurable weather conditions with how often people tweeted those phrases. In other words, they looked for patterns across seemingly unrelated data sets in order to extract and communicate meaning. Math major Sara Stoudt '15 called it "wrestling with data."
That is the essence of this new field, Baumer explains. Interest in data science is partly a response to the data piling up as more of our proclivities, actions, habits and even thoughts are collected and translated into numbers. Data science, then, harnesses this digital tidal wave to tell a precise and compelling story, often with visually arresting graphics.
“Data science is not a computer design course; it’s not a graphic design course; it’s not a conventional statistics course,” Baumer said. “It’s trying to create students who have some knowledge of all of these elements and who can put these things together.”
Baumer, an economics major with a Ph.D. in mathematics, virtually grew up on the Smith campus, where his father, Donald Baumer, is a longtime government professor. It was a family friend, sports economist and professor Andrew Zimbalist, now Benjamin Baumer's co-author, who introduced him to the Mets. For eight years, Baumer analyzed pitching, hitting and fielding metrics to help the front office make complex player-development decisions. Early on, he realized that understanding the data didn't count for much unless he could communicate it to non-statisticians.
Data science is an unusual offering for undergraduates, Baumer says. To make the subject more concrete, he organized a "Data Expo" and invited guest speakers such as Becky Sweger, director of data and technology for the Northampton-based National Priorities Project. She applies the tools of data science to sources such as the national budget and tax receipts tied to zip codes, illustrating how much of the taxes paid in your community went to pay for wars last year, and how many Head Start slots or teacher positions that money could have funded instead. "We are trying to understand the story data tells by cracking it open," Sweger said.
Baumer's students learn about networks (interconnected nodes), data mining (looking for patterns in masses of numbers) and data querying (asking masses of numbers to spit out answers to specific questions). Their toolkit includes sophisticated data management and analysis methods, along with programming and visualization technologies.
For her weather and words project, Sara Stoudt and her team used time-stamped tweets to measure how often words like "chilly," "cold" or "hot chocolate" appeared on Twitter in Los Angeles when the temperature was in the 60s, compared with Minneapolis when it was in the 20s. Their project showed how the information carried in data streams like these can reveal something about human perception. "In places that are generally colder we saw fewer tweets about it being cold, but when it gets cold in a place that is generally warm, that's more like a surprise event," Stoudt said. "L.A. was tweeting about cold very, very frequently, and Minneapolis was focused on other things."
As someone who wants “to do math in the real world,” Stoudt sees data science as an important new field. “We have so much data now that working with it is a completely different skill.”
It's a skill that may pay off in careers. The McKinsey Global Institute, the research arm of the consulting firm McKinsey & Company, projects that the U.S. will need data scientists to fill half a million jobs in five years. "Any time you see a job title that has the word 'analyst' in it, this is going to be somebody prepared to do the things we learned in this class," Baumer said. "There is a general sense that data is important to making better decisions."
This story appears in the Spring ’14 SAQ