Mining Dracula with Data Science

If you’ve ever woken up from that nightmare where you’re giving a presentation on a book you’ve never read, you know just how stressful that can be. Unless, of course, you’re Tyler Diehl. Tyler, a junior Finance/Statistics double major, stood in front of a graduate class last month and presented his analysis of Bram Stoker’s Dracula without ever turning a single page. He coolly explained persistent themes throughout the book, contrasted the writing styles of its various characters, and even identified the saddest sentence in the whole novel. How did he do it, you might ask? Tyler had a secret weapon: data science.

You see, Tyler has been working this semester as part of the Research STAR¹ program in the Office of Information Technology’s Research and Data Science Services group. This group helps researchers take their work to the next level with technology, and Tyler was doing just that. He had been tasked with developing a workshop on using the programming language R for text mining, the process of extracting information from text using data science techniques. Working with his mentor, he settled on Dracula for two reasons: the unusual way the book is written, as an epistolary novel with several distinct internal authors, and its public-domain status, which makes the text readily available². He then wrote an analysis in R that processed the text and broke it down into a form he could analyze with the statistical methods he knew from his classes. From there, he used techniques such as sentiment analysis, topic modeling, and word clouds to extract information about the text. He then found himself standing in front of a dozen graduate students in the Monsters in Myth, Literature, and Video Games class, sharing insights about a Gothic classic without so much as cracking its spine.

Tyler Diehl presented his findings to the *Monsters in Myth, Literature, and Video Games* class.

Tyler has since started reading Dracula to better understand his findings and take his analysis to the next level. But as he pointed out to the class, he had never done any text mining before this semester; he simply had a working knowledge of R and a few hours a week to dedicate to it.

If you are interested in trying out text mining or have an idea for a research project related to data science, high-performance computing, artificial intelligence, or the internet of things, please reach out to the OIT Research and Data Science group at help@smu.edu. If you are a student looking to work on cool projects and want to join the STAR program, we’re hiring, so reach out to Dr. Eric Godat at egodat@smu.edu.

¹ Student Technology Assistant in Residence: https://stars.smu.edu
² Courtesy of Project Gutenberg: https://www.gutenberg.org

Published by

Eric Godat