Through the Looking Glass: Searching for Scholarship with AI

Many of us are familiar with the problem of hallucination in generative AI tools. ChatGPT can produce citations to sources that seem very real yet are utter fabrications. This happens because the tool, in order to produce an answer, is not referencing actual sources but rather what it has learned from its training data. In other words, it produces the most statistically likely response. In the case of citations, what a source probably looks like (plausible authors, journals, etc.) is not the same as an actual source.

However, many generative AI tools are optimized for information search and can link responses to actual cited sources. This approach is called retrieval-augmented generation (RAG). If you are familiar with Perplexity, you will have noticed that it provides links to real web sources. The newest version of ChatGPT includes a web search feature, and Google Search's AI-generated overviews are powered by RAG. You can recognize that a tool is using RAG when it offers links and citations as part of its design.

When you ask a RAG AI tool a question, it first searches for relevant information in a knowledge base. This knowledge base could be a specific set of documents (such as scholarly articles) or the internet. The tool identifies relevant sources and provides a summary answer, with direct links or citations to the sources as well as suggestions for how to work with them.
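To make that retrieve-then-summarize pattern concrete, here is a minimal, hypothetical sketch in Python. The embed(), retrieve(), and answer() functions are illustrative stand-ins of my own, not the code behind Elicit or any other product; a real tool would use a trained embedding model and a large language model to write the final, cited summary.

```python
# A minimal, hypothetical sketch of the retrieve-then-summarize pattern behind RAG.
# Real tools use trained embedding models and large language models; embed() here
# is a toy stand-in so the overall structure stays visible.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag of lower-cased words. A real system would return
    # a dense vector from a neural embedding model.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Step 1: rank every document in the knowledge base against the query
    # and keep the top k as the sources for the answer.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda doc: similarity(q, embed(doc)), reverse=True)
    return ranked[:k]

def answer(query: str, knowledge_base: list[str]) -> str:
    # Step 2: hand the retrieved sources to a language model to write a summary
    # that cites them. Listing the sources here is a placeholder for that step.
    sources = retrieve(query, knowledge_base)
    cited = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return f"Summary based on the sources below (an LLM would write this part):\n{cited}"

knowledge_base = [
    "Aerobic exercise and reduced anxiety in adults: a randomized trial.",
    "Exercise frequency and depressive symptoms: a longitudinal cohort study.",
    "Soil composition effects on tomato yield.",
]
print(answer("How does regular physical exercise influence mental health outcomes in adults?", knowledge_base))
```

The point of the sketch is simply that the answer is assembled from documents retrieved at query time, which is why RAG tools can point you to real, checkable sources.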

The way RAG AI tools search is different from the way academic databases, or even pre-2023 Google, search. Traditional searching is essentially keyword matching, supplemented by features such as added synonyms and spelling correction (Google) or advanced filtering and subject searching (academic databases). RAG AI search is semantic, meaning that the tool uses an AI model to try to understand the intent behind the query and the relationships among the words.
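As an illustration of that difference, the short Python sketch below contrasts literal keyword matching with semantic matching using a publicly available embedding model. The sentence-transformers library and the all-MiniLM-L6-v2 model are assumptions chosen for the example, not what any particular search tool actually runs.

```python
# A hedged illustration of keyword matching versus semantic matching.
# Requires: pip install sentence-transformers (the model name below is one
# common public checkpoint, chosen only for demonstration).
from sentence_transformers import SentenceTransformer, util

query = "How does regular physical exercise influence mental health outcomes in adults?"
titles = [
    "Physical activity and depression: a meta-analysis",  # relevant, but never says "exercise"
    "Exercise equipment market trends in 2023",            # shares the keyword, off topic
]

# Keyword matching: does the title contain the literal word "exercise"?
for t in titles:
    print("keyword hit " if "exercise" in t.lower() else "keyword miss", "-", t)

# Semantic matching: compare meaning with an embedding model, so the
# meta-analysis can score higher even though it never uses the word "exercise".
model = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(model.encode(query), model.encode(titles))[0]
for t, s in zip(titles, scores):
    print(f"semantic score {float(s):.2f} - {t}")
```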

For example, let’s say I wanted sources to answer the question, “How does regular physical exercise influence mental health outcomes in adults?” Searching with keywords in Library Search returns nearly 400,000 results, some relevant and some not. An equivalent search in Elicit produces a short list of results, all of which are relevant. They are also all high-level or general research on the topic, including literature reviews and systematic analyses. My question was rather broad, and Elicit gave results accordingly. Compare this to a third search, in a psychology database using subject terms instead: it returns over 500 relevant results (thanks to those subject terms) that represent both general and specific research on the topic. In essence, there are ways to execute successful searches with both traditional and AI tools, with differences in the quantity and breadth of the results.

RAG AI’s potential for producing relevant search results is very exciting. But does it solve all the problems we have grown accustomed to with traditional search? For novice searchers, RAG AI will most likely return much more relevant results than they otherwise might have found, and it offers summaries of the topic and of individual sources that are much easier to read and understand than abstracts. For advanced searchers, these tools could turn up results missed by traditional searches, or simply get them to relevant results much faster.

That being said, a RAG AI tool is always trying to give you only the most relevant results for what you asked for, which means you are limited by what you asked for. There is power in exploration and “serendipitous” discovery, in finding the thing you didn’t know to ask for. Sometimes, wading through a few hundred relevant search results is exactly what’s needed. Sometimes, you want to control the results you are seeing rather than leave it to an AI black box (and a librarian can help with that!). Of particular note, the RAG AI tools for scholarly search currently on the market rely on Semantic Scholar for their knowledge base, which can only search open access articles or article abstracts and has uneven disciplinary coverage, particularly in the humanities and social sciences. The tools also vary quite a lot in the quality of their relevance judgments. In my own searching, I have found Elicit and Undermind to have better relevance than Semantic Scholar, Consensus, or Perplexity. The summaries and syntheses these tools offer may be useful for novice searchers or as a first pass, but they lack the depth and nuance gained from reading an abstract.

These RAG AI tools offer another, useful way of searching, but at least at this point, they cannot be our only research tool. Skills in reading scholarship, synthesizing sources, and determining relevance to the research question will remain essential; unfortunately, AI search tools might become so tantalizingly easy that we lose something along the way.


This post was prepared by Megan Heuer, Director of Educational Initiatives & Social Sciences Librarian at SMU Libraries.