A worldwide chatter started around ChatGPT when it launched a year ago. The technology resonated with millions, engaging ordinary users first-hand with AI for the first time. Even the not-so-tech-savvy community couldn’t stop gushing over its smarts, and why not? ChatGPT made AI accessible to the public, but more importantly, it showed the innumerable ways in which AI can integrate with daily life and enrich trade and commerce.
A New AI Framework to Fact-Check LLMs’ Responses
But large language models (LLMs) are not perfect, and researchers are exploring ways to steer these models around the challenges that lie in their path. At times, the models are unbelievably meticulous, nailing responses spot-on. Other times, they spit out answers that are fabricated and inaccurate.
Managing this volatility has been one of engineers’ biggest challenges so far. How can you build a model that is always right? More importantly, how can you get the model to say “I don’t know” when it doesn’t know an answer?
Data engineers have come up with a framework that takes a shot at solving this problem. Retrieval-augmented generation, also called RAG, is an AI framework that enables models to fact-check a response against an external data source before giving it back.
At the recent Edge Field Day event, where the attending delegates presented their views and experiences with AI in short, riveting talks, Ben Young, Head of Cloud Products and Field Day delegate, gave an Ignite Talk on LLMs. Young shared his journey of building VannyGPT, a chatbot web interface that uses RAG, and what he learned in the process.
Retrieving Fact Chunks from Large Documents
Young’s project is based on the Veeam knowledge base. The goal is to make the knowledge base articles more mineable, so that answers to questions can be found without reading the articles in their entirety.
“We’re used to consuming knowledge base articles. They’re long, and so I thought to myself, wouldn’t it be nice if we could chat to these things?” he said.
VannyGPT uses OpenAI LLMs with RAG. OpenAI models are trained on data from the Internet, which means their primary source of information is public data. That has several implications.
“They scraped the Internet and consumed all these things. The models were trained at a great cost to the environment by the looks of it, but they’re stuck in time,” he noted.
Without retraining, the models can’t access the new and updated information that comes out every day. As a result, the information they give back is often not the most accurate or up-to-date, and the noise on the Internet makes the answers fuzzy.
Getting around that would take solving two key problems: the models should provide the most recent data, and they should cite the source of the information for readers’ assurance.
RAG allows engineers to pass additional context to LLMs, making it possible to verify information against private datasets. Here’s how it works. When a user types in a prompt, a piece of code retrieves information from an external data source, which could be a document or a database, and sends it out to the model. The model interprets the data and gives back a summary with the source tagged in.
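That retrieval loop is simple to sketch in code. The following Python snippet is a minimal illustration of the pattern, not Young’s actual implementation; the `retrieve_relevant_chunks` helper, the model name, and the sample source tag are all assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def retrieve_relevant_chunks(question: str) -> list[dict]:
    """Hypothetical retrieval step: look up article chunks related
    to the question in an external store (documents or a database)."""
    # In a real system this would query a search index or vector store.
    return [
        {"source": "KB-example", "text": "Sample knowledge base excerpt..."},
    ]

def answer_with_rag(question: str) -> str:
    chunks = retrieve_relevant_chunks(question)
    # Pass the retrieved text to the model as additional context,
    # and ask it to cite the source of every fact it uses.
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "Cite the [source] tag for every claim."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_rag("How do I restore a backup?"))
```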
But there is a challenge. “The knowledge base articles look very consumable on the Internet, but there are a lot of embedded forms and messy data that we don’t necessarily want to put into our prompting because that’s an additional cost. The more tokens we are passing into OpenAI, the higher the cost is going to be. We don’t want to get distracted with info that it doesn’t need to know about,” pointed out Young.
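Token counts translate directly into cost, which is why the messy markup is worth stripping before prompting. As a rough illustration (assumed strings, not figures from the talk), OpenAI’s tiktoken library can show how much of a raw page is billable noise:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

raw_html = '<div class="kb-article"><form>...</form><p>Restart the service.</p></div>'
clean_md = "Restart the service."

# Every token passed to OpenAI is billed, so stripped-down Markdown
# is cheaper to prompt with than the raw page markup.
print(len(enc.encode(raw_html)))   # token count for the messy HTML
print(len(enc.encode(clean_md)))   # token count for the cleaned text
```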
Young has developed a piece of code that scrapes articles off the knowledge base website and translates the text into Markdown, stripping out the excess HTML tags and embedded styles that can act as distractions. The clean data is stored in a content store, with an API on top to make the information publicly accessible.
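Young’s exact pipeline wasn’t shown, but a minimal sketch of that scrape-and-clean step could look like this; the URL, the `<article>` selector, and the choice of the markdownify library are assumptions made for illustration:

```python
import requests
from bs4 import BeautifulSoup
from markdownify import markdownify

def scrape_to_markdown(url: str) -> str:
    """Fetch a knowledge base article and return clean Markdown."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop embedded forms, scripts, styles, and navigation that
    # would only add distracting tokens to the prompt.
    for tag in soup(["form", "script", "style", "nav"]):
        tag.decompose()

    # Assumed selector: the article body lives in an <article> tag.
    body = soup.find("article") or soup.body
    return markdownify(str(body))

# The cleaned Markdown would then be written to the content store
# and served back out through an API.
print(scrape_to_markdown("https://example.com/kb/article"))
```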
In this way, using RAG, Young was able to sidestep the confusion and hallucination typical of LLMs, and ground the results in private, authentic data. The information VannyGPT serves up is accurate and on-point, without the rigmarole of retraining at great cost.
Watch the demo of VannyGPT v2 in the Ignite Talk, Falling Down a LLM Rabbit Hole with Ben Young, from the Edge Field Day event to learn more.