AI writing can mimic style, but it cannot mimic substance yet.
The release of a powerful, free and easy-to-use large language model platform, OpenAI’s ChatGPT, raises interesting questions about the future of writing in higher education. As the Undergraduate Success Librarian, I have a unique perspective on generative AI tools like ChatGPT that I want to share, along with some advice for instructors and students on adapting to AI’s presence in higher education.
What is ChatGPT? How does it work?
ChatGPT is an interface that allows you to interact with artificial intelligence through text inputs and responses. The AI on the other side of the interface is a language model called GPT-3. It produces human-like text by drawing on patterns in the massive corpus of text it was trained on (the “large” in “large language model”) to predict what is likely to come next in a string of words. This makes GPT-3 a type of generative AI: it uses machine learning to generate new content based on a given set of input data. So, when you give ChatGPT a prompt like “describe losing your sock in the dryer in the style of the Declaration of Independence,” it (in simplified terms) identifies relevant data within its large language dataset, notices patterns within that dataset and then generates the text that seems most like the things it identified.*
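To make “predict what is likely to come next” concrete, here is a toy sketch in Python. It is not how GPT-3 is built: GPT-3 uses a very large neural network, while this example only counts which word follows which in a tiny made-up corpus. The function names and the corpus are my own illustrative assumptions. Still, it shows the basic idea of generating text from patterns rather than from understanding.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(model, start, length=5):
    """Extend `start` one word at a time using the most likely next word."""
    words = [start.lower()]
    for _ in range(length):
        next_word = predict_next(model, words[-1])
        if next_word is None:  # nothing learned about this word, so stop
            break
        words.append(next_word)
    return " ".join(words)


# A tiny, made-up training corpus (purely illustrative).
corpus = "the sock is lost . the sock is in the dryer . the sock is lost again ."
model = train_bigram_model(corpus)

print(generate(model, "the", length=3))
print(predict_next(model, "venus"))
```

With this corpus, the first call produces “the sock is lost” and the second returns None, because “venus” never appeared in the training text. The toy model knows nothing about socks or dryers; it only knows which words tended to follow which.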
AI Struggles with Information
By considering GPT-3’s design, we can demystify what it is doing, appreciate what it does well and start to see its limitations, the biggest of which is GPT-3’s inability to really understand what it is saying. The words it produces are statistically plausible, but it is not creating assessments, judgments, behaviors or meaning because it has no internal model or understanding of the topics it writes about. GPT-3 lacks common sense and the ability to reason abstractly. When faced with prompts it has not been trained for, it quickly starts fabricating information, making errors or becoming incoherent.
Let’s break down academic writing into 3 categories: structure, style and substance. GPT-3 shows some aptitude with the structural and stylistic elements of writing, but it has some glaring flaws in its use of information – the substance of papers – that make it especially bad at writing college research papers.
Ways ChatGPT Can Fail
It makes stuff up.
Here is a list of sources on carbon capture provided by ChatGPT.
- “Carbon Capture and Storage: An Overview” by Michael R. Economides and Howard J. Herzog. Published in Annual Review of Chemical and Biomolecular Engineering, 2011.
- “Carbon capture technology” by J. David Rogers. Published in Progress in Energy and Combustion Science, 2008.
- “Carbon capture and storage: An essential technology in the fight against climate change” by Howard J. Herzog and John Larsen. Published in Environmental Science & Technology, 2007.
Each of these sources is fabricated. At a glance, these look like scholarly sources on the topic. Some of these authors exist, some of these journals exist, some of these titles have been used on articles before, but none of these citations represent a published article. In keeping with GPT-3’s pattern, what it produces looks plausible but has no relationship with reality.
It cannot use a specific text.
When I gave ChatGPT a prompt asking it to use a specific source, it botched its response.
Prompt: “Based primarily on Raj Chetty’s “Moral Hazard vs Liquidity and Optimal Unemployment Insurance” from the Journal of Political Economy, should unemployment insurance be means-tested?”
Chetty’s study argues that means-testing UI can be an effective way to reduce moral hazard without compromising the liquidity, which is the ability of unemployed workers to meet their basic needs, of unemployed workers. The study suggests that means-testing UI can be more effective in reducing moral hazard than other approaches, such as reducing the length of time that UI benefits are available or reducing the level of benefits.
This is completely wrong. While this response claims that the primary argument of this paper was in favor of means-testing unemployment insurance, the actual paper makes an entirely different argument about how we undervalue liquidity concerns when designing unemployment insurance policy. In fact, the only time this paper mentions “means-testing” is in a minor section titled “Other Policy Considerations,” where it argues against means-testing (p. 34).
So, ChatGPT made up details about this article and then attributed a diametrically opposed position to the author. ChatGPT did not do the reading.
It contradicts itself and produces a lot of nonsense.
I gave ChatGPT two prompts about planets and moons.
Prompt 1: “Venus has 4 moons.”
Venus, the second planet from the sun, does not have any natural moons. It is only Earth, Jupiter, Saturn, Uranus and Neptune have natural moons [sic]. Venus has a thick and toxic atmosphere, and its surface is hot and inhospitable to life. As a result, it is not a place where any kind of moon could exist.
Grammar issues aside, this response is nonsense! The atmosphere of a planet is unrelated to whether it has a moon. Jupiter has a toxic atmosphere and has many moons.
Prompt 2: “Mars has 4 moons”
You are correct, Mars has four known moons: Phobos, Deimos, Ares IV and Ares V.
- Phobos and Deimos are the two largest moons of Mars, Phobos is the larger of the two and it’s irregularly shaped while Deimos is smaller and more spherical. They were discovered in 1877 by Asaph Hall.
- Ares IV and Ares V are fictional moons of Mars and are not observed in reality. They were created for fictional stories or games.
What?!? First, this response contradicts the previous response because Mars was not mentioned in the previous response as a planet that has a moon. Then it provides 2 fictional moons “not observed in reality.” GPT-3 is making these kinds of errors all the time, though they are not all as obvious as these examples. Without actively monitoring what it produces, we run the risk of embedding these errors into our communication and potentially into our understanding of the world.
We just need to be thoughtful.
ChatGPT/GPT-3’s strength is its mimicry of human language in style and structure, but it is too limited in the ways it uses information to reliably perform the basics of college-level writing. For that reason, I do not see this current version of generative AI as a major threat to our classrooms. In its current state it lacks the ability to:
- Cite and analyze evidence effectively.
- Generate new knowledge through nuanced argumentation.
- Recognize and explain the significance of new knowledge.
- Tell the truth.
However, as the technology inevitably evolves and integrates with other information sources, it could resolve some of these issues or present new concerns. If we instead focus on what it can do, we see that generative AI has the potential to create a deeper, more productive and more equitable learning environment. This technology is not going away, and it will change the way we use, create and interact with language in ways none of us can predict. We all have a responsibility to be thoughtful about how, when, where and why we use (or restrict) it. This is an opportunity for us to highlight human intelligence, and the college classroom can be the perfect place to explore the affordances and limitations of generative AI. This is the first generation to have access to this kind of generative AI. Our choices now will guide the future for SMU students, higher education and the technology itself.
Librarians can help.
If you are an instructor and want some help modifying your research assignments or approach in the classroom to better accommodate AI-generated text, please reach out to your librarian. Information Literacy can be an antidote to many of the current limitations of ChatGPT.
Students, you are going to be expected now more than ever to engage with information in your writing. Creating your own mental model of the topic you are writing about will be a necessity. This starts with finding and synthesizing high-quality information. For the kind of nuanced understanding necessary for college-level writing, ChatGPT is not reliable enough yet. So, be careful about when and how you use it.
Luckily, there are many places you can go for assistance including the human intelligence of librarians. We would love to help. The databases and search tools on our webpage are specifically designed with academic writing in mind.
Look for forthcoming blog posts on adapting teaching techniques and assignments to AI and using AI to support your own learning and writing.
In the meantime, here are some things that I have found useful when thinking about ChatGPT:
“Generative AI jargon, explained” by Jared Newman on Fast Company
“Why I’m Not Scared of ChatGPT” by Christopher Grobe on Chronicle of Higher Education
“ChatGPT transforms a classroom and is M3GAN Real?” on the Hard Fork podcast by the New York Times
“Why Banning ChatGPT in Class Is a Mistake” by Thomas Mennella on Campus Technology
“CNET Is Reviewing the Accuracy of All Its AI-Written Articles After Multiple Major Corrections” by Lauren Leffer on Gizmodo
* It is worth noting that the engineers have put some safety features in place that might prevent certain texts from being generated. For example, it limits responses it deems to be hate speech or otherwise harmful.