case study

How RAG Powers Enterprise Generative AI Projects

Limitations with Large Language Models

Businesses are hesitant to embrace Large Language Models (LLMs) like ChatGPT and Bard due to critical limitations. Studies show LLMs can miss up to 30% of relevant information. They also fabricate content ("hallucinate") in 20% of cases, which raises reliability concerns. On top of that, the lack of source links makes verifying their accuracy impossible. This case study explores Retrieval Augmented Generation (RAG), a technology that addresses these shortcomings and empowers businesses to use generative AI in their operations. We'll demonstrate how RAG leverages existing information, minimizes fabrication, and provides clear sources for enhanced reliability and trustworthiness.

An Example of Model Bias with Pluto

Foundational models, AI systems that generate output from prompts, mimic human language and draw from vast datasets to learn patterns and structures. They work by generating new data that's similar yet not identical to the source material. As a result, their output can reflect biases inherent in the original content used to train the model. A good example of bias is asking ChatGPT about Pluto.

Asking ChatGPT this question gives us the following answer:

This response is factual. Now let’s ask more questions about Pluto itself:

What’s Happening Here?

All foundational models have human bias. That is, the employees at OpenAI have trained the results to be presented in a certain way. In this example, humans trained the model during its training stage to state that Pluto is not a planet.

But there was no supervision to explain if Pluto is gaseous or rock. Over the years, astronomers have determined that Pluto is more gaseous than originally expected and therefore, much lighter. The LLM is left to rationalize from the information it has from both before and after 2005, and it confidently states false information. Until enough people ask the question and thumb down the response, or OpenAI employees adjust the tuning, the model will keep getting this “fact” wrong.

How Do Generative AI Chatbots Work?

To better understand how to get the right results from LLMs, let’s look at how these chatbots work. If you’re using an LLM like ChatGPT or Google’s Gemini out of the box, the user poses a question, which is encoded from human language into machine language and then prompts the AI to generate a response. Unfortunately, these tools share the same drawbacks:

Hallucinated Data: Both models may produce plausible yet inaccurate information.
Amplification of Misinformation: They can unintentionally amplify biases or inaccuracies in their training data.
Over-Reliance: Users may mistakenly trust generated information, potentially leading to misguided decisions.
Lack of Contextual Understanding: They lack knowledge about user or company context, limiting personalization.
Inability to Fact-check: The source of generated information cannot be verified, as it's contextually generated.

Prompt Engineering & Fine-Tuning

Prompt engineering and fine-tuning refine language models like Jasper and Copy.AI, tailoring responses to align with brand identity and factual accuracy.

Prompting in Natural Language Processing (NLP) directs model behavior by providing specific cues, ensuring coherent and purposeful output aligned with user objectives.

Fine-tuning adapts pre-trained language models to specific tasks or domains, improving performance through targeted training on task-specific datasets.

These techniques aim to provide relevant information from a company's knowledge base, yet they fail to address the persistent issue of hallucinations inherent in large language models (LLMs). Because the source of the generated content cannot be determined, it is impossible to tell what’s factual information and what’s a hallucination. This lack of transparency makes it hard to assess the relevancy of responses and to deploy the technology reliably. In practice, users end up writing very detailed prompts to improve the LLM’s output.

The only viable approach to counteract the limitations of this process involves exerting more control over LLMs, leading to the innovation of retrieval-augmented generation.

Understanding Retrieval-Augmented Generation

Retrieval-augmented generation merges retrieval and generation in NLP, enhancing language models' proficiency in tasks like question answering and text summarization. Queries are encoded and matched against a knowledge base, facilitating contextually relevant responses.

What’s a Knowledge Base?

A knowledge base encompasses factual and relevant company information, including FAQs, brand guidelines, and internal procedures. A knowledge base is securely stored behind firewalls and updatable at any time. It associates relevant data with meta descriptions, offering instant answers while preserving privacy and security through user permissions. The retrieval process involves converting uploaded text into machine language through embeddings, facilitating comprehension and relationships between concepts using models like "text-embedding-ada-002" from OpenAI. Once you’ve created your knowledge base, how do you retrieve its information?
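
As a rough illustration of the idea above, a knowledge base entry can be modeled as text plus a meta description plus the roles allowed to read it. The sketch below is a minimal in-memory version; the class name, field names, and role names are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBaseEntry:
    """One document in a hypothetical knowledge base."""
    doc_id: str
    text: str
    description: str  # the "meta description" attached to the entry
    allowed_roles: set = field(default_factory=set)  # roles that may read it


def visible_entries(entries, user_roles):
    """Return only the entries that at least one of the user's roles may read."""
    roles = set(user_roles)
    return [entry for entry in entries if entry.allowed_roles & roles]


# Illustrative data only.
entries = [
    KnowledgeBaseEntry("faq-1", "Refunds are issued within 14 days.",
                       "Customer FAQ: refunds", {"support", "sales"}),
    KnowledgeBaseEntry("hr-7", "Salary bands are reviewed annually.",
                       "Internal HR procedure", {"hr"}),
]

support_view = [entry.doc_id for entry in visible_entries(entries, ["support"])]
print(support_view)
```

Filtering by permission before any retrieval happens is one simple way to keep restricted documents out of a generated answer.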

Knowledge Base Retrieval Process

Creating a knowledge base involves uploading documents, similar to using Google Drive. Information is then converted from text to machine language through embeddings. Tools such as LangChain's document loaders help contextualize diverse document types, including videos and PDFs, by splitting them into LLM-compatible chunks while preserving contextual meaning. Summaries are embedded into vectors using embedding models such as OpenAI's, and user permissions on each document preserve privacy.
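
To make the chunking step concrete, here is a minimal sketch that splits text into overlapping character windows, so that a sentence cut at one boundary still appears whole in a neighboring chunk. This is not LangChain's implementation; the function name and the default sizes are assumptions chosen for illustration, and production splitters usually break on sentence or paragraph boundaries instead of raw character counts.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not recommendations.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 450-character stand-in for an uploaded document.
sample = "".join(str(i % 10) for i in range(450))
chunks = split_into_chunks(sample)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be passed to an embedding model; that call is omitted here because it depends on the provider.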

Storing Knowledge Base Information with Vector Databases

After understanding how knowledge bases function and are accessed, it's crucial to grasp how information is stored for later indexing. Encoding documents into embeddings provides a mathematical context representation, including meta descriptions and vector locations. Vector embeddings, essential for semantic search, capture nuanced word context and associations, facilitating more natural interactions between humans and machines. By leveraging vector embeddings, semantic relationships are depicted geometrically, enabling nuanced understanding of concepts and their associations.
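
The storage-and-search idea can be sketched with a tiny in-memory store that keeps each vector next to its metadata and ranks entries by cosine similarity to a query vector. The class name, the three-dimensional vectors, and the file names are made up for illustration; real embeddings have hundreds or thousands of dimensions, and real vector databases use approximate indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: a list of (vector, metadata) rows."""

    def __init__(self):
        self._rows = []

    def add(self, vector, metadata):
        self._rows.append((list(vector), dict(metadata)))

    def search(self, query_vector, top_k=1):
        """Return the top_k (score, metadata) pairs, best match first."""
        scored = [(cosine_similarity(query_vector, vector), metadata)
                  for vector, metadata in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], {"source": "refund-policy.pdf"})
store.add([0.0, 0.2, 0.9], {"source": "brand-guidelines.pdf"})

best_score, best_meta = store.search([1.0, 0.0, 0.1], top_k=1)[0]
print(best_meta["source"])
```

Keeping the source in the metadata is what later allows an answer to link back to the document it came from.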

AI Transforms Semantic Relationships Into Geometric Relationships

In a famous example of word algebra, "man + queen - king = woman," mathematical manipulations reveal new word relationships within the semantic landscape. Observing the distances between words like man, king, queen, and woman elucidates the power of semantic understanding in AI models. This capability extends to various topics within ChatGPT, allowing users to explore associations and concepts. However, biases inherent in foundational models underscore the importance of transparency and understanding, particularly in enterprise settings where brand voices matter.
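
The word algebra can be reproduced with hand-made toy vectors. In the sketch below, each word gets two invented coordinates (roughly "royalty" and "femaleness"); these numbers are assumptions chosen so the arithmetic is easy to follow and are not taken from any trained model.

```python
import math

# Toy 2-D "embeddings": (royalty, femaleness). Values are invented.
vectors = {
    "king":  (0.9, 0.1),
    "queen": (0.9, 0.9),
    "man":   (0.1, 0.1),
    "woman": (0.1, 0.9),
}


def nearest_word(point):
    """Return the vocabulary word whose vector is closest to point."""
    return min(vectors, key=lambda word: math.dist(vectors[word], point))


# man + queen - king, computed coordinate by coordinate.
result = tuple(
    m + q - k
    for m, q, k in zip(vectors["man"], vectors["queen"], vectors["king"])
)

print(nearest_word(result))
```

With trained embeddings the result rarely lands exactly on a word, so the nearest neighbor is reported instead, as this sketch does.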

Layering Knowledge Base and Vector Databases

After understanding knowledge bases and encoding contextual items into machine language stored within vector databases, the next step is integrating this system with a Large Language Model (LLM) for natural language generation. This integration facilitates AI-driven communication, enabling cross-referencing and data accessibility through complex geometric relationships.

The ability to interact with the knowledge base as a conversational partner, leveraging approximately 40,000 English words, marks a significant advancement in natural language interactions. With private document encoding and secure communication protocols, companies can interact with LLMs while ensuring data privacy. Integration of LLMs into cloud ecosystems by major providers like Google, Amazon, and Microsoft makes this technology more accessible for enterprises, enabling them to harness its potential through APIs for secure knowledge base access and retrieval-augmented generation, promising factual, source-linked responses.
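
One way to obtain source-linked responses is to place the retrieved passages, each labeled with its source, into the prompt and instruct the model to cite them. The sketch below only builds that prompt; the function name, the passage format, and the instruction wording are assumptions, and the call to an LLM provider's API is deliberately left out.

```python
def build_augmented_prompt(question, passages):
    """Combine a question with retrieved passages into a single prompt.

    passages: list of dicts with 'source' and 'text' keys (hypothetical shape).
    """
    context_lines = [
        f"[{number}] ({passage['source']}) {passage['text']}"
        for number, passage in enumerate(passages, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers in your answer. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )


# Illustrative passages, as a retrieval step might return them.
retrieved = [
    {"source": "refund-policy.pdf", "text": "Refunds are issued within 14 days."},
    {"source": "faq.md", "text": "Refund requests go through the support portal."},
]

prompt = build_augmented_prompt("How long do refunds take?", retrieved)
print(prompt)
```

Because every passage carries a number and a source, a reader can trace each cited claim in the answer back to a specific document.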

Prisonology Decreased Legal Consultation Time by 90% with AI

Prisonology needed a venture partner to grow operations. NineTwoThree created an AI model that reduced consultation time by 90% using reasoning tactics in OpenAI and increased sales 2x in 4 months.

Learn more about Prisonology

Build With

NineTwoThree is a leading provider of AI application development services, having pioneered innovative solutions since 2016. With expertise in retrieval-augmented generation (RAG) and generative AI, we have successfully developed seven groundbreaking applications. Our deep understanding of these technologies enables us to deliver solutions that tackle real-world problems effectively.