How Retrieval Augmented Generation Powers Enterprise Generative AI Projects

Large Language Models (LLMs) have taken the spotlight since the release of ChatGPT. LLMs can perform a wide range of contextual tasks at almost the same level of competence as a human. From Google Translate using LLMs to improve the fluency of its translations to Grammarly providing more comprehensive and informative feedback on users’ writing, we have seen almost every tool we use daily come out with its own LLM implementation.

To the untrained AI practitioner, it could seem that launching an LLM is an easy task.

As with any groundbreaking technology, Generative AI has imperfections. One of the most prominent of these limitations is the challenge LLMs face in delivering accurate, sourced, and contextually appropriate content. This limitation makes it almost impossible for a company to stake its reputation on an LLM’s unpredictable output. How would you feel if a brand you trusted gave you false answers to questions about your account? This fear has led to the inception and growing relevance of retrieval-augmented generation (RAG). With RAG, companies have far more control over the material an LLM draws on and the answers it gives users.

At NineTwoThree we have been building Machine Learning and AI products since 2016, and we have been focusing on this new framework as it gains traction across the internet. Since the AI boom, we have been inundated with requests from both Fortune 500 companies and funded startups to build out this framework – without knowing it had a name. What’s even more exciting is that we finally have the tooling to respond to that demand, thanks to some incredible cloud product releases in May of 2023.

This case study will highlight our learnings from building $1 million worth of LLM-powered products for Fortune 500 companies and startups. First, we will dive into the challenges with LLMs.

Challenges with Large Language Models (LLMs):

Why companies cannot trust using LLMs:

  • LLMs do not have access to the relevant information.
  • LLMs do not reliably state facts and often hallucinate.
  • LLMs do not provide a source link.

One of the most significant challenges associated with LLMs is the inability to control what information they use to generate responses. Oftentimes the LLM will even hallucinate, generating factually incorrect or nonsensical text. Hallucination generally happens for the following reasons:

  • Incomplete or contradictory training data: LLMs are trained on massive datasets of text and code, but this data is not always complete or accurate. As a result, LLMs may learn to associate certain words or phrases with certain concepts, even if those associations are not accurate.
  • Lack of common sense: LLMs do not have common sense in the same way that humans do. This means that they may not be able to identify and reject text that is false or unrealistic.
  • Lack of context: LLMs often generate text based on the context of the prompt and the text that has been generated up to that point. However, they may not have access to all of the relevant context, which can lead them to generate text that is inaccurate or nonsensical.

But to understand why hallucinations exist, we need to look one level deeper at the foundational model: first clarifying what generative AI is, and then using those building blocks to introduce RAG as the framework for enterprise-grade applications.

What is a Foundational Model?

Foundational models are a form of generative artificial intelligence: they generate output from one or more inputs, called prompts, which typically take the form of human-language instructions. These AI models are trained on large datasets of existing data, and they learn to identify the patterns and structures within that data.

Once trained, the model can generate new data that is similar to the data it was trained on, but not identical. We can see how bias inserts itself into a foundational model with the example of Pluto. 

Explaining Bias in the Foundational Model

The solar system

A simple way to explain why models hallucinate is to understand how they are trained. Let’s start with a piece of knowledge that was once universally accepted but was later revised.

How many planets are in our solar system?

The foundational model was trained on thousands of textbooks from reputable sources that explained that Pluto is a planet. Since 2006, however, a plethora of blogs, articles, and books have explained that Pluto is no longer classified as a planet.

Asking ChatGPT this question yields the following answer:

Is Pluto a planet?

The response is accurate and factual. However, let’s ask a second question:

What kind of planet is Pluto?

What is happening here? How can the foundational model understand that Pluto is not a planet, yet still mix up facts from when we believed it was one?

All foundational models have human bias. That is, the employees at OpenAI have trained the model to present results in a certain way. In this example, humans taught the model during its training stage to state that Pluto is not a planet.

But there was no supervision to explain whether Pluto is gaseous or rocky. Over the years, astronomers have determined that Pluto is more gaseous than originally expected and therefore much lighter. The LLM is left to rationalize from the information it has from both before and after 2006, and it proudly states false information. Until enough people ask the question and thumb down the response - or OpenAI employees adjust the tuning - the model will always confuse this “fact.”

“It is important to provide the facts from a trusted source. But each company’s trusted source is different, and we cannot rely on the ‘facts’ from foundational models because foundational models are inherently biased.”

The Generative AI Chatbot User Flow

All GPT chats are built on foundational models. The pretraining is required to get the model to perform within its guardrails and remain safe for users.

Here’s how users interact with an LLM to produce generative text:

  1. A user asks a question
  2. The question is encoded from human language to machine language
  3. A prompt is produced
  4. The LLM receives the prompt and generates an answer
  5. The user receives the result
Chatbot User Flow
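
As a rough illustration of that flow, here is a minimal code sketch that calls a hosted LLM directly. It assumes the pre-1.0 openai Python SDK with an API key in the environment; the model name and question are placeholders, and the encoding and decoding steps happen inside the API rather than in our code.

    import openai  # assumes the pre-1.0 openai SDK and OPENAI_API_KEY set

    def ask_llm(question: str) -> str:
        # Steps 2-3: the question is wrapped into a prompt; the API handles
        # encoding the text into tokens the model can consume.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        )
        # Steps 4-5: the LLM generates an answer, which is decoded back to text.
        return response["choices"][0]["message"]["content"]

    print(ask_llm("How many planets are in our solar system?"))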

Another way to think about this interaction is the Venn diagram below. With this model, the generative AI solution can only use data that is either part of the LLM’s foundation or supplied by the user through the prompt.

Today, most users interact with an LLM through ChatGPT or Bard. Although users can realize significant efficiencies with these tools, there are many limitations that cannot be solved simply by giving more information through a prompt.

The foundation of all LLMs

Limitations of ChatGPT and Bard, aka Gen AI Chatbots

ChatGPT and Bard are both LLMs that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

ChatGPT was developed by OpenAI and released in November 2022. It is built on OpenAI’s GPT family of large language models, trained on a massive dataset of text and code. Bard was developed by Google AI and released in March 2023. It was originally built on LaMDA, a 137-billion-parameter model trained on a dataset of text and code specifically designed for conversations.

Although both tools have differing strengths, they are prone to the same limitations:

Hallucinated Data: LLMs might generate information that sounds plausible but is baseless or incorrect, known as "hallucinating" facts.

Amplification of Misinformation: LLMs can magnify inaccuracies or biases present in their training data, inadvertently spreading misinformation.

Over-Reliance: Users might mistakenly trust LLM-generated information, leading to misguided decisions based on incorrect data.

Lack of User Data or Specific Knowledge: ChatGPT and Bard do not know the context around the user or company's knowledge base. They cannot act uniquely based on beliefs or personalities. 

Cannot Be Fact-Checked: You simply cannot ask for the source the LLM pulled its information from, because it does not exist. The text was generated in context without reference to any underlying source.

In consumer-facing LLMs like ChatGPT and Bard, combating these reliability issues comes down to prompt engineering and fine-tuning.

Understanding Prompt Engineering and Fine-Tuning

To introduce knowledge into the process, products like Jasper and Copy.AI became experts at fine-tuning the LLM to create on-brand and factual responses to users' queries. Builders of these consumer AI tools add knowledge to the prompt and then segment that knowledge per user to ensure each company receives its own fine-tuned GPT.

Prompting is a method used in Natural Language Processing (NLP) to steer a language model's behavior by supplying it with specific cues or instructions. This technique allows a user to manipulate the model's output to align with their objectives.

Prompting works by steering where the model looks for an answer and guiding it to produce text that is coherent, relevant, and purposeful. The complexity of the prompts can range from a simple phrase to detailed instructions based on the task’s needs. The art of crafting an effective prompt for a specific use case is known as prompt engineering.
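
As a simple illustration of prompt engineering, the sketch below assembles a prompt from a hypothetical brand-voice instruction, a few trusted cues, and the user’s question. The wording and structure are our own assumptions, not a prescribed format; the point is that everything the model is allowed to use gets packed into the prompt itself.

    def build_prompt(question: str, brand_voice: str, facts: list[str]) -> str:
        # Hypothetical template: instructions, trusted cues, then the question.
        cues = "\n".join(f"- {fact}" for fact in facts)
        return (
            f"You are a support assistant. Answer in this voice: {brand_voice}.\n"
            f"Only use the facts below. If the answer is not covered, say so.\n"
            f"Facts:\n{cues}\n\n"
            f"Question: {question}\nAnswer:"
        )

    prompt = build_prompt(
        question="Is Pluto a planet?",
        brand_voice="friendly and precise",
        facts=["Pluto was reclassified as a dwarf planet in 2006."],
    )
    print(prompt)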

Fine-tuning is a popular technique in NLP that adapts a pre-trained language model to a specific task or domain. It leverages the knowledge from a larger pre-trained model and trains it further on a smaller, task-specific dataset, enabling the model to perform better on the targeted task.

The process involves two steps: selecting a pre-trained language model that has been trained on diverse text sources, and then further training this model on a domain-specific dataset. This additional training allows the model to adapt to the specific patterns and nuances of the target task, enhancing its performance on task-specific applications in terms of quality, cost, and latency.
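
For concreteness, a fine-tuning dataset for a chat model is typically a file of example conversations. The sketch below writes a tiny dataset in the chat-style JSONL format that OpenAI’s fine-tuning endpoints accept at the time of writing; the company name, questions, and answers are invented placeholders.

    import json

    # Hypothetical task-specific examples: (question, on-brand answer) pairs.
    examples = [
        ("Is Pluto a planet?", "No. Pluto was reclassified as a dwarf planet in 2006."),
        ("What do you sell?", "We sell example widgets. (Placeholder answer.)"),
    ]

    # Each line is one training conversation in the chat fine-tuning format.
    with open("train.jsonl", "w") as f:
        for question, answer in examples:
            record = {"messages": [
                {"role": "system", "content": "You are ExampleCo's support assistant."},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(record) + "\n")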

The basis of prompt engineering

While this solution does provide relevant information about a company's knowledge base, it still does not solve the hallucination problem.

On top of that, since the material is being generated, it is impossible to determine the source of a generated hallucination or fact. Not knowing where the knowledge came from makes testing the relevance of the answers nearly impossible. Again, enterprises and mid-market startups cannot deploy technology that does not behave consistently.

How to come up with a good prompt

In the end, all you are doing is making a really, really good prompt and generating a really, really good answer. The only way to overcome this limitation is to take more control of the LLM, which brings us to the innovation of retrieval-augmented generation.

What is Retrieval-Augmented Generation?

Retrieval-augmented generation is an NLP technique that combines two key components: retrieval and generation. This approach is often used to improve the performance of language models in tasks that require generating coherent and contextually relevant text, such as question answering, text summarization, and text completion.

Retrieval-augmented generation essentially augments the prompt: it pulls information from a trusted (and private) knowledge base and appends it to the original prompt before sending everything to the LLM.

Working with a private knowledge base
Question context into machine language

When a user asks a question, the text of the question is encoded and converted to machine language. Then the context of the question is queried against the knowledge base. 
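
Conceptually, the flow reduces to three steps: retrieve, augment, generate. The sketch below is intentionally high level; the two helper functions are stubs standing in for the knowledge-base search and LLM call described in the rest of this section, not a real implementation.

    def retrieve_from_knowledge_base(question: str, top_k: int = 3) -> list[str]:
        # Stub: a real implementation runs a vector similarity search over
        # the company's private knowledge base (covered later in this piece).
        return ["Pluto was reclassified as a dwarf planet in 2006."][:top_k]

    def call_llm(prompt: str) -> str:
        # Stub: a real implementation sends the prompt to a hosted LLM.
        return f"(LLM answer based on: {prompt[:60]}...)"

    def answer_with_rag(question: str) -> str:
        passages = retrieve_from_knowledge_base(question)            # 1. Retrieve
        context = "\n\n".join(passages)
        prompt = (f"Answer using only this context:\n{context}\n\n"  # 2. Augment
                  f"Question: {question}")
        return call_llm(prompt)                                      # 3. Generate

    print(answer_with_rag("Is Pluto a planet?"))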

Building a Knowledge Base for Retrieval-Augmented Generation

The knowledge base can be anything the company decides is factual and relevant information, such as FAQs, brand guidelines, internal procedures, company videos, blogs, or emails. Anything and everything considered intellectual property for the company can be included in the knowledge base.

Most importantly, it can be stored securely and behind a firewall so the information is not exposed to the internet and can be updated at any time. 

A knowledge base is:

  • Secure and private: it is hosted inside a company’s cloud infrastructure.
  • Updatable at any time by the owner.
  • Able to associate relevant information and meta descriptions with its content.

A knowledge base gives employees instant answers. It supports natural-language questions grounded in the company's intellectual property and context, without compromising privacy or security. We can even take privacy and security to a new level by applying user permissions.

Knowledge Base Retrieval Process

Creating a knowledge base is relatively straightforward. If you have ever uploaded documents to Google Drive you have essentially created a knowledge base. When information is uploaded to the knowledge base, it will have to be converted from text to machine language. To do this, we create an embedding. 

Think of an embedding as a numerical representation of a concept, expressed as a sequence of numbers. Once all of the embeddings are stored, the machine can understand the relationships between those concepts. At NineTwoThree we use an embedding model called “text-embedding-ada-002” from OpenAI.
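
As a minimal sketch of that step (pre-1.0 openai SDK style, API key assumed in the environment), creating an embedding looks like this. The input text is a placeholder; text-embedding-ada-002 returns a list of floats that becomes the document’s vector.

    import openai  # assumes the pre-1.0 openai SDK and OPENAI_API_KEY set

    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input="Our refund policy allows returns within 30 days.",  # placeholder text
    )
    vector = response["data"][0]["embedding"]  # list of floats representing the concept
    print(len(vector))  # dimensionality of the embedding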

How knowledge bases are encoded

Sometimes documents do not fit into a prompt because they are either too long or unstructured. LangChain’s document loaders make it possible to contextualize the incredibly diverse set of documents and data our clients might store. There are all sorts of loaders from LangChain that convert content such as videos, Git repos, PDFs, spreadsheets, FAQs, and Word documents.

Next, we use LangChain to split the content into bite-sized pieces that will fit into an LLM prompt while still maintaining the contextual meaning of the document. This step is something of an art form and takes experienced engineers to get right. As a reminder, we link the chunked data to the original artifact so we can always see the underlying source.

Lastly, we embed the text chunks into vectors. This simply turns human-readable language into machine language (numbers) to be placed into the vector database. We use OpenAI’s embedding model, but we remain completely agnostic about the LLM we ultimately communicate with (or whatever the business already has installed). To keep sensitive information private, we create user permissions for each document.
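
Putting those steps together, a simplified ingestion pipeline might look like the sketch below. It uses LangChain’s loaders, a text splitter, and a local FAISS vector store as one possible combination (circa-2023 APIs); the file path, chunk sizes, and the permissions metadata are placeholder assumptions rather than a prescribed setup.

    # Assumes: pip install langchain openai faiss-cpu pypdf, and OPENAI_API_KEY set.
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    # 1. Load: convert a source document into text (loaders exist for many formats).
    docs = PyPDFLoader("brand_guidelines.pdf").load()  # placeholder file

    # 2. Split: chunk the text so each piece fits a prompt but keeps its meaning.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)

    # 3. Tag: keep a link back to the source and (hypothetically) user permissions.
    for chunk in chunks:
        chunk.metadata["allowed_roles"] = ["marketing"]  # assumed permission scheme

    # 4. Embed and store: turn chunks into vectors and index them.
    store = FAISS.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-ada-002"))
    store.save_local("knowledge_base_index")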


Storing The Knowledge Base Information as Embeddings with Vector Databases

Now that we have explained how the knowledge base functions, and how you can access it, we need to explain how the information is stored for indexing later. Encoding the documents to an embedding provides a mathematical representation of the context of the information. It will include the meta description and vector location. But how can you index that so the LLM knows what to pull back when it’s prompted?


About Vector Embeddings

Vector embeddings, often referred to simply as vectors, have emerged as a powerful tool in enabling more effective semantic search and contextualization. In essence, a vector is a numerical representation of data in a related context. In the context of human language, these vectors capture the essence of words and the broader context in which they are used.

Consider the famous Muhammad Ali quote, "Float like a butterfly, sting like a bee." In machine language, this is not just a sentence, but a rich representation of concepts: lightness, agility, and the act of boxing. All this context is meticulously stored within a single vector, allowing for nuanced understanding and interpretation. But how do we understand associations with other concepts? That’s where semantic search comes in.

Vector embedding


About Semantic Search

Human communication is inherently nuanced and context-dependent. Take, for instance, a trip to the grocery store where you describe a product as resembling a strawberry: it’s red and shares some traits with a blueberry. What are you describing? A raspberry. You didn’t need exact keywords to get to the answer. Semantic search enables machines to understand these implicit associations, making interactions more natural and human-like.
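
A toy sketch of semantic search: embed a handful of item names, embed the query, and return the closest match by cosine similarity. The item list and the small helper are illustrative assumptions (pre-1.0 openai SDK, API key assumed); in production a vector database performs this search at scale.

    import numpy as np
    import openai  # pre-1.0 SDK style, OPENAI_API_KEY assumed

    def embed(text: str) -> np.ndarray:
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    documents = ["strawberry", "blueberry", "raspberry", "pickup truck"]
    doc_vectors = [embed(d) for d in documents]

    query = "a red fruit that looks like a strawberry but is built like a blueberry"
    q = embed(query)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = [cosine(q, v) for v in doc_vectors]
    print(documents[int(np.argmax(scores))])  # expected to surface "raspberry"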

Bridging Human and Machine Language

At the heart of this transformation is the encoding model, a critical component in converting human language into machine language. When communicating with a model like ChatGPT, our text-based input is converted into numerical representations, or vectors. When we receive a response, it is the decoder's job to translate these vectors back into text. This encoding step, handled by models such as text-embedding-ada-002 and shared across many applications, is key to the process.

The encoding model

Vector embeddings are not limited to textual data. They extend their reach to videos, transcripts, and other forms of information. By leveraging this technology, we can efficiently translate diverse data types into language understood by machines. Importantly, this can be achieved outside the confines of large language models.

While humans exist in a three-dimensional world, vector databases operate in high-dimensional spaces that can be challenging to conceptualize. To illustrate these principles, think of data in three dimensions. In reality, vector databases are multidimensional and encompass an unfathomable number of data points. Let’s see how semantic relationships work with the example below.

How vector databases visualize data

Semantic Relationships: A Geometric Approach

The most important thing to understand about vector databases is that vectors are stored in relation to similar concepts. The colors of the rainbow will have smaller distances between their vectors, while a list of car companies will sit at a greater distance from the color vectors. It becomes multi-dimensional because a blue Corvette might be similar to a blue Maserati, and therefore the context of the color is important in the relationship.

We are describing semantics: in a vector space, semantic relationships become geometric relationships that help us understand the similarities and differences between concepts.


AI Turns Semantic Relationships Into Geometric Relationships


One of the most famous examples involves word algebra: "man + queen - king = woman." This mathematical manipulation enables us to discover new words and relationships within the vast semantic landscape.

Geometric relationships

Notice that the distance between man and king is similar to the distance between queen and woman. Also, note that the distance between king and queen is similar to the distance between man and woman. If you did not know the word for one of the vectors on the parallelogram, then you could derive the word using word math. 
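
The same idea can be sketched with toy vectors. The three-dimensional numbers below are invented purely for illustration (real embeddings have hundreds or thousands of dimensions), but they show how "man + queen - king" lands closest to "woman" among the remaining words.

    import numpy as np

    # Invented 3-D toy vectors; the axes roughly encode
    # [royalty, femaleness, person-ness]. Real embeddings are far larger.
    vectors = {
        "king":     np.array([1.0, 0.0, 1.0]),
        "queen":    np.array([1.0, 1.0, 1.0]),
        "man":      np.array([0.0, 0.0, 1.0]),
        "woman":    np.array([0.0, 1.0, 1.0]),
        "prince":   np.array([0.8, 0.0, 1.0]),
        "princess": np.array([0.8, 1.0, 1.0]),
    }

    # "man + queen - king" should land near "woman".
    target = vectors["man"] + vectors["queen"] - vectors["king"]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = [w for w in vectors if w not in ("man", "queen", "king")]
    best = max(candidates, key=lambda w: cosine(vectors[w], target))
    print(best)  # expected: "woman"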

You can apply this model to different topics right inside ChatGPT. For instance, you could remove the idea of music from Taylor Swift and add in the concept of politics to see what type of person the model will find.

Who will the model find?

Understanding how the foundational model understands concepts is important, but here lies a massive flaw: if we go back to the Pluto example, there is surely still an association between the concept of textbooks and the idea of nine planets. However, through careful human supervision, the response has been forced to say that Pluto is not a planet.

So what is happening?

The foundational model has bias, and the owners of these models refuse to publicly report how the model is trained (what some are calling a lack of transparency) for this exact reason. As an enterprise company, it is vital to understand those biases, because brands have voices. If the large language model disagrees with a brand's point of view, it will be tough to use that model.

Layering Knowledge Base and Vector Databases

We have learned about knowledge bases together and how any contextual item can be encoded to machine language and then stored in a vector database. We have learned that this vector database is private and secure in a cloud server. We now need to communicate with the Large Language Model by securely sending our prompt to the model to generate natural language. 

How do we do that?

Layering knowledge and vector databases

The integration of a large language model with the encoded knowledge base is where the magic happens. By connecting these two powerful entities, we enable AI-driven communication that transcends traditional boundaries. Complex geometric relationships are established, opening doors to cross-referencing and data accessibility.

One of the most remarkable outcomes of this integration is the ability to communicate with the knowledge base as if it were a conversational partner. With an extensive vocabulary of approximately 40,000 English words, the system becomes a versatile communication tool. This marks a watershed moment in natural language interactions. 

And it works through the geometric relationships we talked about. 

When a private document is encoded, the concept of that document is converted to a series of numbers (floats) that are stored as a vector. We can use this vector, and call upon its location, to determine its relationship to concepts in the public Large Language Model.

This retrieved information is securely sent, alongside any other relevant content from the company’s knowledge base, to the LLM to generate an answer for the user. OpenAI promises that this information is not used for training purposes. As long as we trust Microsoft and Google to never sell our data, and hold them to their word, we remain private as we begin to communicate with the LLM.
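
Tying the pieces together, the query-time path looks roughly like the sketch below: load the private index built in the earlier ingestion sketch, retrieve the closest chunks for the user's question, and send only that retrieved text plus the question to the hosted LLM. The index name, model names, and prompt wording are assumptions carried over from the previous sketches, not a definitive implementation.

    # Assumes the FAISS index built earlier ("knowledge_base_index") and the
    # pre-1.0 openai SDK with OPENAI_API_KEY set (circa-2023 LangChain APIs).
    import openai
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    store = FAISS.load_local("knowledge_base_index",
                             OpenAIEmbeddings(model="text-embedding-ada-002"))

    def answer(question: str) -> str:
        # Retrieve the chunks whose vectors sit closest to the question's vector.
        hits = store.similarity_search(question, k=3)
        context = "\n\n".join(doc.page_content for doc in hits)
        sources = [doc.metadata.get("source", "unknown") for doc in hits]

        # Only the retrieved text and the question are sent to the hosted LLM.
        prompt = (f"Answer using only the context below.\n\n"
                  f"Context:\n{context}\n\nQuestion: {question}")
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"] + f"\n\nSources: {sources}"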

This technology is even easier to implement today because cloud service providers including Google, Amazon, and Microsoft are integrating LLMs into their cloud ecosystems. This strategic alignment brings this previously hard-to-access innovation within reach for enterprises seeking to harness its potential. Using APIs, they can easily create and index their private knowledge bases, ensuring that information is accessed securely and under control.

Now we have the full promise of what we thought we could get with ChatGPT and Bard. We have:

1. An LLM with relevant information

2. An LLM that only states facts & does not hallucinate

3. An LLM that provides the source link.

What RAG can offer


With the retrieval-augmented generation pipeline complete, we can deploy it in the real world.

Applying Retrieval-Augmented Generation in the Real World

To illustrate the practical applications of RAG technology, consider a scenario involving the complexities of handling invoices. 

Traditional invoices often follow no standard format and are convoluted in structure, making it challenging to locate specific information. By encoding invoices and incorporating them into the knowledge base we create, we form a structured and intuitive retrieval system with user permissions governing who can access specific data. Users can engage in dynamic conversations with their invoices and retrieve relevant information promptly.
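
As a sketch of how such user permissions might be enforced at retrieval time, the example below tags each invoice chunk with the account that owns it and keeps only the results the requesting account may see. The invoice text, metadata keys, and post-search filtering are assumptions for illustration; real deployments often push this filtering into the vector database itself.

    from langchain.docstore.document import Document
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    # Hypothetical invoice chunks, each tagged with the account that owns it.
    chunks = [
        Document(page_content="Invoice 1042: 3 consulting days, net 30 terms.",
                 metadata={"account_id": "acme", "source": "invoice_1042.pdf"}),
        Document(page_content="Invoice 2077: annual license renewal, due Dec 1.",
                 metadata={"account_id": "globex", "source": "invoice_2077.pdf"}),
    ]

    store = FAISS.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-ada-002"))

    # Simple permission check: search, then keep only chunks the requesting
    # account is allowed to see (assumed metadata-based scheme).
    hits = store.similarity_search("When is my license renewal due?", k=2)
    allowed = [h for h in hits if h.metadata.get("account_id") == "globex"]
    print([h.page_content for h in allowed])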

Examples of invoices


Encoding invoices


The result



Additional information



Next, we’ll showcase three other examples of RAG technology that we have built with our clients:

Gaining Legal Insights In Record Time

Prisonology

Legal firms have embraced this technology to streamline their processes. In the legal industry, information retrieval is critical: lawyers need to access all relevant court cases about a client quickly.

By transforming a vast array of court cases into a comprehensive knowledge base, this technology empowers lawyers with instant access to crucial information. It not only simplifies the research process but also provides recommendations and insights, thereby enhancing their ability to serve their clients effectively.

Read more in our portfolio about Prisonology.

Navigating Complex Political Narratives

Political clients

In the realm of public discourse and politics, RAG technology helps manage the complexity of multifaceted narratives. By grounding responses in an extensive knowledge base built from one side of the political spectrum, it ensures that communication remains rooted in factual information (and in the positions of that political party). This is particularly significant when dealing with sensitive topics where bias can taint the narrative.

With RAG, we can make sure a chatbot responding on behalf of a political candidate frames every answer around the stance that candidate takes on the issue at hand, and never wavers or causes mishaps that could derail the campaign.

Transforming the Insurance Industry

The insurance industry with AI

Insurance companies invest significant time and resources in determining the suitability of applicants. This often involves operating call centers, where a notable portion of their effort goes to waste because they frequently engage with individuals who are not the right fit for their insurance products.

What if every interaction, conversation, or dialogue with customers was meticulously cataloged and stored in a vector database? Within these dialogues lie valuable insights. For instance, the initial reason for a customer's contact may not align with their ultimate insurance needs – they might start discussing life insurance but actually require auto or health coverage.

This level of nuanced understanding is typically within the purview of human agents who can detect these cues and redirect the conversation accordingly.

If you rely solely on keyword matching, you might miss out on these crucial insights, as it doesn't discern the underlying intent. Yet, by leveraging semantic matching and structuring data in a manner congruent with your business domain and expertise, you can effectively integrate your product offerings into these dialogues. When a new prospect initiates contact with your company, you can readily present the insurance products tailored to their needs, based on past conversations and what has historically led to successful sales.

Putting It All Together: Google's Bard Generative AI Search

Notably, on August 2nd, 2023, Google introduced its Bard generative AI update. This development is pivotal in the context of the following discussion, as it brings together all the topics of this case study.

Revisiting Pluto's planetary status with this search serves as an illustrative example of where confusion still exists. To clarify, Pluto is not classified as a planet. When we ask Google whether Pluto is composed of rock or gas, the response aligns with what we have observed earlier. 

However, an intriguing addition has emerged as of August – a small arrow icon. It is noteworthy that this feature was introduced, briefly pulled, and then reinstated by Google within a week, and it remains available to us today.

What is Pluto really?

This arrow symbolizes a critical enhancement, providing users with direct access to pertinent information about Pluto. It not only offers relevant links for fact-checking but also allows users to dive deeper by simply clicking the arrow, thus facilitating further research. For those in the audience actively engaged in SEO, link building, and information dissemination via Google, this development carries significant implications. Herein lies the prominence of domain authority.

Given the existence of multiple answers and information sources, the determination of relevance has evolved, favoring authoritative content. This has been underscored by the incorporation of fact-checked articles. An element worth noting is the potential need for tokenization or some mechanism to credit content creators, a point deserving of future consideration.

Implementing Retrieval-Augmented Generation

In summary, we have addressed various challenges in obtaining accurate and verifiable information. We can ascertain facts, corroborate their existence, and extract information directly from source documents. 

The question remains, how can you implement this technique? In line with the notion that data is the modern-day equivalent of a valuable resource, consider your company, website, and stored information as your most precious assets in the realm of generative AI.

These advanced AI systems scour the internet for information, with the primary objective of identifying the domain authority capable of delivering the most accurate responses. The key takeaway? To successfully incorporate LLMs into organizations, each company will need to position itself as the subject matter expert.

With the RAG model, the knowledge base assumes paramount importance. It is the aspect we can control, which enables us to verify the accuracy of the information it holds, fix inaccuracies, and keep it up to date. As a company grows and has new stories to tell, the RAG system can be updated to communicate with customers about new features, news, or updates.

What is most interesting is that when we build these models at NineTwoThree we are LLM agnostic. 

Different LLMs can serve different purposes, and staying agnostic lets companies work around the biases of any single foundational model and tailor the system to their specific requirements. This innovation underpins our enthusiasm for advancing the field, as it signifies a watershed moment when large language models transition from mere novelties to highly practical tools.

Now, for the first time, it is feasible to engage in natural language conversations with machines and conclusively assert that the data in question belongs to us.

Build With NineTwoThree

NineTwoThree is a leading provider of AI application development services, and has been building AI applications since 2016. We have a deep understanding of RAG and generative AI, and we have a proven track record of success in building AI applications.

We have already built three applications using retrieval-augmented generation and generative AI, and we truly understand the technology and how to use it to solve real-world problems.

Contact us to learn more about our generative AI services today!
