Striking Gold By Digging Into The Data Lode
Suppose an energy company wants to build a plant to produce a biofuel using a new hybrid grass. Where would be the best locations in the United States for the new facility?
That was the quandary posed to economics graduate students Michael Fulmer, Steven Gregory and Jingjing Ye in the 2009 SAS Data Mining Shootout. Using extensive U.S. county crop yield data collected over several years, as well as information on variables within the counties, such as weather and soil characteristics, the team developed methodologies to pinpoint successful plant locations for the fictitious Energy Grass company.
First-place team: winners of the 2009 SAS Data Mining Shootout (from left) Steven Gregory, Jingjing Ye and Michael Fulmer with their faculty sponsor, Tom Fomby, chair of the Department of Economics in Dedman College.
Picking the right places for the biofuel plants might seem a bit like finding needles in haystacks. That’s where data mining comes into play. Data mining, which also is known as business analytics, is the process of extracting useful information from lodes of data by detecting patterns.
The team’s final report narrowed the possibilities to the three states and three counties that would be the most propitious for the biofuel plants. The mathematical models they developed acted as “magnets,” allowing them to pull out those favorable locations from the volumes of data analyzed. An effective model will result in a valid forecast when new data are plugged in.
“Anytime you start from scratch and build a model, it’s a challenge, but it’s fun,” says Gregory, who works in data mining for Mary Kay Inc. while pursuing a Master’s in economics at SMU.
For the second consecutive year, an SMU team won the prestigious national contest.
“We usually work independently, so this was a good opportunity to work as a group and share ideas,” says Fulmer, who is pursuing a Ph.D. in economics.
In an age when the facts attached to virtually every step in a business transaction are captured, “companies are being overrun with data and require techniques that enable them to make the information useful,” says Economics Department Chair Tom Fomby, who served as faculty sponsor for the two winning teams.
While data-mining tools often are associated with business applications – in everything from making product suggestions to retail customers to detecting credit card fraud – they’re also important in data-heavy fields like science, engineering and defense.
In one such application, SMU professors and their research collaborators analyzed 36 years’ worth of hospital data on appendicitis, influenza and gastric viral infections and uncovered a tracking pattern that suggests a relationship between a flu-like virus and appendicitis.
According to Edward Livingston, the physician who led the study, the findings could prompt the medical community to re-evaluate the need for emergency surgery in cases of nonperforated appendicitis.
The results of the SMU professors’ collaboration with researchers from UT Southwestern Medical Center in Dallas and the VA Medical Center in Gainesville, Florida, appeared in the article “Association of Viral Infection and Appendicitis.” The research was featured in USA Today, Bloomberg Businessweek and a number of national science and research news sites.
With demand across disciplines for analysts skilled in data-mining practices outpacing the supply, Fomby’s “Data Mining Techniques for Economists” course is filled to capacity with seniors and Master’s students, along with a few Ph.D. students.
“The first time it was offered in 2004, we had six students. Now we have 30,” Fomby says. “At most universities, data mining is offered through information technology or business. It’s a fairly rare offering for an economics department.”
– Patricia Ward