Building a Privacy-First LLM

In the evolving landscape of AI technology, the integration of large language models (LLMs) into organizational workflows has accelerated. For security-focused entities, deploying LLMs presents unique challenges, particularly around data privacy and reliability. This case study offers an in-depth exploration of the process of building and integrating privacy-first LLMs. It sheds light on how organizations can effectively leverage AI while prioritizing security and compliance. By adopting privacy-focused systems, security-conscious entities can navigate the complexities of AI integration while safeguarding the integrity and confidentiality of their data.
Download it now.

Download "Building a Privacy-First LLM" to Learn

  • How LLMs are revolutionizing customer interactions and support
  • Key security challenges when implementing LLMs at scale
  • The impact of privacy concerns on data sharing and training
  • How Retrieval-Augmented Generation (RAG) improves LLM performance
  • Different options for securely deploying LLMs within your organization

Building a Privacy-First LLM

New advances in AI allow us to talk to computers in a language we understand. Thanks to large language models (LLMs), customers can get answers to specific questions without talking to a human. Marketers can conquer writer’s block with AI-generated drafts and templates based on a single prompt. Developers can debug an issue with an AI coworker who never gets tired.

LLMs like ChatGPT and Gemini are trained on billions of pieces of information. Their design allows them to understand natural language and patterns with remarkable accuracy. They’re incredible at helping marketers brainstorm ideas faster and engineers diagnose issues more easily. They’re also great at customer interaction and support, talking to humans the way humans talk. At least, mostly the way humans talk.

If you’re a security-focused organization trying to run an LLM at scale, you run into a few problems:

  • You have proprietary and secure data you can’t send outside your company’s doors.
  • You have a large (non-public) knowledge base your LLM hasn’t been trained on and can’t use out of the box.
  • You’ve heard about LLMs telling customers the wrong thing (like giving away cars for a dollar).
  • You’re worried about data leaks.

This article explores the best ways to integrate LLMs into your organization while maintaining a high level of security. We’ll detail how privacy-focused systems that use LLMs give security-conscious organizations the benefits of AI without the security and compliance risks of leaking sensitive data outside the company. Organizations that securely improve their applications with LLMs can unlock one of the greatest opportunities in AI.

Considerations for adding an LLM to your application

Blind spots and shortcomings

LLMs allow us to perform a new wave of tasks we couldn’t do before by building on top of a concept called semantic search. Traditional search struggles when a query doesn’t match stored keywords. Think of storing food data in a database. It’s easy to ask the database for all types of food with “sandwich” in the name. It gets a lot harder when you ask for “that thing they sell at Subway”. The database isn’t set up to understand what you’re asking for.

Semantic search is excellent at understanding the intent behind a query and surfacing the right information. We call this intent-based querying. If we can use semantic search to find the right information, we can use an LLM to take that information and give us an intelligent response. Combining the two is powerful. Semantic search finds the right places to look for “What’s the best way to overcome writer’s block?”, and an LLM can take the five or six articles we get back, summarize them, and explain them.
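
To make that concrete, here’s a toy sketch in Python. The vectors are hand-made stand-ins for what a real embedding model would produce; the point is that cosine similarity captures intent where keyword matching fails:

```python
import numpy as np

# Hand-made toy "embeddings" standing in for a real embedding model.
# Imagine the dimensions loosely encode: [food, fast-food chain, travel, writing]
docs = {
    "Turkey sandwich recipe":             np.array([0.9, 0.3, 0.0, 0.1]),
    "Subway menu and footlongs":          np.array([0.8, 0.9, 0.1, 0.0]),
    "Tips for overcoming writer's block": np.array([0.0, 0.0, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "That thing they sell at Subway" shares no keywords with "sandwich",
# but its embedding lands close to the sandwich-related documents.
query = np.array([0.85, 0.8, 0.05, 0.0])

for name, vec in sorted(docs.items(), key=lambda kv: -cosine(query, kv[1])):
    print(f"{cosine(query, vec):.3f}  {name}")
```

The Subway and sandwich documents score highest despite zero keyword overlap, which is exactly the behavior keyword search can’t give you.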

Companies like Grammarly and Google improved their core products dramatically in the past year by adding LLMs. You can ask for a draft outline for a presentation, and ChatGPT will instantly create one.

Given the clear advantages of an LLM, it can be tempting to start integrating and replacing. Why do I need customer service when I could have ChatGPT or Google’s Gemini start fielding my customers’ questions?

There are a few key reasons.

  1. LLMs have a knowledge cutoff. They’re limited to the data they’re trained on. For example, ChatGPT’s knowledge cutoff is April 2023. It simply doesn’t know any information past that date. An LLM can still be reliable within its knowledge cutoff, but it’s blind to everything after it. And while knowledge cutoff updates will happen, they’re not on a reliable or predictable schedule.
  2. LLMs can be finicky. At NineTwoThree, we’ve seen cases where an LLM wouldn’t complete the task we gave it unless we “offered” it $10,000. LLMs were trained on text written by humans, and humans are motivated by money, so this actually worked. But these edge cases shouldn’t impact your customers.
  3. LLMs hallucinate. When an LLM doesn’t know something, it guesses. This is a big deal, and a huge liability for your brand. Ask Air Canada or this car dealership. It’s one thing to say Pluto’s a planet, and quite another to describe a refund policy that doesn’t exist.
  4. Some LLMs don’t link a source. You just have to trust them. When they do give a source, they sometimes make it up.
  5. LLMs are trained on large, generic data sets. This is by design, since most models are general-purpose. Your customers have specific questions that an untrained model can’t answer. An LLM has no idea that JIT means something different at your company than it does in its training data, or that your customers prefer a casual tone when speaking with customer support.
  6. Customers trust your application, but external LLMs aren’t secure. Sensitive information like health history, financial information, and anything identifiable shouldn’t be sent to a public model. If your customers (or employees) work with sensitive data, they’ll probably enter it. And your customers will trust your LLM’s answers just because they “sound” correct.
  7. By default, LLMs use your data for training. When you use a private company’s LLM, unless you opt out, you’re consenting to your data being used in their training. These companies use a process called fine-tuning to take questions and messages from users and improve the model. That’s great for them, and bad for you if you’ve accidentally entered a customer’s social security number. This is the reason companies like Apple have banned employees from using public LLMs.

While LLMs are powerful, they’re not secure on their own if you want to use customer data. They’re also not reliable. Without access to information, they’ll make things up.

How to address the shortcomings of a base model LLM

Building Reliable Systems with LLMs

If we could combine the advantages of an LLM (understanding intent, communicating like a human) with a traditional database (allowing us to store and access specific, new information) we’d be able to improve LLM reliability.

In addition to storing information in a structured way, we’ll store it as a vector. A vector is just a large array of numbers. We take a document, like an onboarding guide, and compute its vector representation. This is called an embedding, and we store it in a database along with a reference to the original document.

The key idea is that documents with similar content will map to similar embeddings. If we generated an embedding on two news articles about the same event, the two embeddings would be closely aligned. We also don’t have to worry about new data being a different structure. As long as the embedding model can generate an embedding, we can compare similar content pieces to each other.

When we search, we can ask in plain English “What’s a great itinerary for a vacation to Alaska?” instead of writing an SQL query. We convert that question into an embedding and find the closest matches in our database. We perform vector search to find these similar embeddings and return their underlying documents. We’ll find documents with great information, like Reddit posts on traveling to Alaska, or a blog post from a solo Alaska traveler.
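
Here’s a minimal sketch of that lookup using FAISS, an open-source similarity-search library (random vectors stand in for real document embeddings; a managed vector database exposes the same embed-then-search operation):

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384  # a typical small embedding size
rng = np.random.default_rng(0)

# Stand-ins for real document embeddings (Reddit posts, blog articles, ...).
doc_embeddings = rng.standard_normal((10_000, dim)).astype("float32")
faiss.normalize_L2(doc_embeddings)  # normalized vectors: inner product == cosine

index = faiss.IndexFlatIP(dim)  # exact inner-product (cosine) index
index.add(doc_embeddings)

# Embed the plain-English question with the same model, then search.
query = rng.standard_normal((1, dim)).astype("float32")
faiss.normalize_L2(query)

scores, ids = index.search(query, 5)  # top-5 nearest documents
print(ids[0])  # row IDs map back to the original documents
```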

But vector databases are just a store for this data. 

How do we give an LLM access to this information?

Enter Retrieval-Augmented Generation, or RAG.

This technique leverages both LLMs and vector databases. Instead of an LLM being limited to its training data, it can access information in real-time.

A typical application architecture without RAG might look like this. 

RAG adds another step before we send results to the application – first sending the top-K vectors to an LLM. The LLM can draw upon its training data and the presented data, parse through and understand the top results, and craft a better answer than we could. 

This architecture allows the LLM to access real-time data from the vector database, even if that data wasn’t in its training set.
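
In code, the RAG step is mostly glue. A minimal sketch, where embed, vector_search, and call_llm are hypothetical stand-ins for your embedding model, vector database client, and LLM:

```python
def answer_with_rag(question: str, k: int = 5) -> str:
    # embed(), vector_search(), and call_llm() are hypothetical stand-ins
    # for your embedding model, vector database client, and LLM.
    query_vector = embed(question)

    # Retrieve the top-k most similar documents from the vector database.
    top_docs = vector_search(query_vector, k=k)
    context = "\n\n".join(doc.text for doc in top_docs)

    # Hand the LLM both the question and the retrieved context.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context doesn't contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

Notice the prompt explicitly tells the model to prefer the retrieved context over its training data; that instruction is how the “weighting” described below is typically implemented.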

Now, you have a smarter layer to locate the right data, which passes through an LLM filter. Instead of parsing the responses yourself, you pass them to an LLM, which is much better at understanding them. 

We can do a few new things now. We can ask the LLM to weigh information from the vector database differently than its training data. If a customer asks “What does JIT mean?”, we don’t have to worry about the LLM making a guess based on training data; it will pull from our knowledge base, and weigh that answer higher. 

We can also let the LLM continuously analyze our company’s data and learn from it, to understand how our company talks and operates. LLMs excel at this; the more direction we give them, the better they perform. As a company’s knowledge base evolves and changes, the LLM can learn with it.

So, we’ve solved the problem of reliability. Instead of forcing the LLM to rely on the data it was trained on, we can give it access to an external data store.

You’re probably thinking, “How would I guarantee my company’s data is secure?”

Private LLMs: solving the problem of security

There are a few ways to do this.

Option 1: Use existing solutions from large AI providers: Companies like OpenAI offer enterprise solutions. You host your application and vector database within your company’s network, then make encrypted requests to OpenAI’s services for the LLM. This path is far easier, but it has a few key pitfalls.
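
As a sketch of this pattern, assuming the OpenAI Python client (enterprise deployments layer network and data-retention controls on top of the same API shape):

```python
import os
from openai import OpenAI

# The application and vector database stay inside your network;
# only this encrypted HTTPS request leaves it.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4-turbo",  # flagship model at the time of writing
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
    ],
)
print(response.choices[0].message.content)
```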

Pros

  1. Smaller upfront cost. Instead of investing millions of dollars into GPUs, you can use OpenAI’s servers. Their API isn’t cheap, but it’s much cheaper than owning the hardware, at first.
  2. Easier integration. We can make API calls instead of building a datacenter. Your problems involve data compliance and setting up connections, rather than hosting an entire machine learning setup behind your company’s network.
  3. Less recurring maintenance. You don’t need to hire a team to manage your LLM’s infrastructure; you pay a single provider. Outages are their problem to solve, not yours.
  4. Solve tough problems. As of this article, GPT-4 Turbo is closed-source and the most capable LLM for high cognitive-load problem-solving. Open-source models have not (yet) caught up.

Cons

  1. Data privacy. Customers trust your application to keep their data secure. When you’re communicating with ChatGPT, your messages are stored in some form. Enterprise providers like OpenAI take substantial, exhaustive measures to ensure data is secure and their process is SOC 2 compliant. Still, even the largest companies (like Marriott, Uber, Meta, Google, and Experian) fall victim to data leaks.
  2. Censorship. Your access to the model can be shut down at a moment’s notice if your application says the wrong words. What’s more, it’s unpredictable; what’s safe today might be banned tomorrow.
  3. Application latency. Aside from the IT policies you might violate by sending data outside the company’s firewall, you might see degraded performance: making a call outside your company’s network introduces extra points of latency in your application.
  4. High recurring costs. Paying per request adds up, and these providers charge a premium.
  5. System reliability. Any reliance on an external system is another point of failure. Public cloud outages are commonplace. The same model from two different providers can completely change your application’s performance. What’s more, public LLMs can act up. That’s not a great experience for your users.

Option 2: Build your own privately-hosted LLM solution: Bring a high-quality, open-source LLM within your firewall and build the entire solution in-house. Meta and Google have robust, powerful open-source models you can use. Instead of making an external call to the LLM, you integrate it into your application’s architecture locally. Make no mistake, this is significantly more difficult and involves nontrivial talent and infrastructure costs. But it does come with some advantages.

Pros

  1. Data privacy. No worrying about sending your customers’ data, in any form, outside of your organization, since everything sits within a closed loop. You’ll still need to remain compliant with data encryption, access control, and continuous monitoring. But odds are, you’ve already figured that part out.
  2. Data security. You’re less prone to prompt injection attacks, where malicious actors attempt to access restricted parts of your application by manipulating the messages it receives. If someone outside the organization can’t even reach your application and the LLM it uses, you’re removing an avenue of attack.
  3. Reduced reliance on external services. Outages from an external service, like OpenAI (who had one recently) don’t affect this type of service. You’ll also avoid vendor lock-in, making it easier to course-correct if your technical needs change.
  4. Control. Set up your LLM and let it work. No need to worry about future censorship or external model behavior changes. 
  5. Customization. Every aspect of the system is under your control. Control and monitor everyone who accesses the model and the data. Tailor the model to your specific use case, starting with the training data. Use whatever data stack fits your organization’s needs. 

Cons

  1. Higher upfront cost. The first step is owning and managing the hardware, which means investing in data center capacity and GPUs. You’ll need this entire system to sit within your company’s network. This method can be expensive, but it’s the only way to keep your LLM completely enclosed. If you can afford to consider this option, you probably already have some of it figured out.
  2. More recurring maintenance and CapEx. Recurring costs are inevitable: talent to manage the data center, and separate talent to fine-tune your models.
  3. Longer timeline to production-ready. You’ll need time – time to acquire this setup, and time to train the LLM. As we previously highlighted, this is not an easy integration process, and will require multiple teams to make it happen. This is all before you’re in production and seeing the results.

There are pros and cons to each system. Mature companies with CapEx and expertise can benefit from building their own privately-hosted LLM. Early adopters might appreciate the flexibility of existing AI solutions providers like OpenAI.

Picking the setup isn’t all there is to it, though. Setting the system up takes time and a careful approach.

Creating a production system

Here’s what it might look like when creating a production system step-by-step.

Pick your base model

GPT, Llama, Claude - all base models trained on large datasets. If you’re picking an enterprise private LLM provider, you’ll connect to that model’s API via your cloud infrastructure. If you’re creating a privately-hosted solution, you’ll download the LLM locally within your network.
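
For the privately-hosted route, a minimal sketch using Hugging Face’s transformers library (the model name is just an example of an open-weight model; pick whatever your license and hardware allow):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example open-weight model; swap in whatever fits your license and GPUs.
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# From here on, every token stays inside your own network.
inputs = tokenizer("What does JIT mean at our company?", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```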

Fine-tune the model

Pick a subdomain or subtask (HR case handling, customer support) and gather all your documents. We’ll fine-tune the base model by giving it example documents and questions from your data. This allows the model to key in on the specific customer use case it’s trying to solve.

During this stage, it’s important to keep any sensitive user data secure. Access control and data encryption are critical steps. There are a few techniques for securing user data within the model.

Federated learning: Split model training across multiple local devices instead of one centralized server. Each device downloads the model and trains it on its own private data, which never leaves the device.

Differential privacy: Add noise during this training phase, obscuring any specific data from a user.
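
A simplified PyTorch sketch of the differential-privacy idea: clip each sample’s gradient to bound its influence, then add Gaussian noise before the optimizer step. (Production systems typically use a vetted library such as Opacus rather than hand-rolling this.)

```python
import torch

def dp_sgd_step(model, loss_fn, batch, optimizer,
                clip_norm: float = 1.0, noise_multiplier: float = 1.0):
    """One simplified DP-SGD step: per-sample clipping plus Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # per-sample gradients (microbatches of size 1)
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]

        # Clip this sample's gradient so no single record dominates.
        total_norm = torch.sqrt(sum(g.norm() ** 2 for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-6))
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Add noise scaled to the clipping bound, then average and step.
    for p, s in zip(params, summed):
        noise = torch.normal(0.0, noise_multiplier * clip_norm, size=s.shape)
        p.grad = (s + noise) / len(batch)
    optimizer.step()
```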

Test, evaluate, continue to fine-tune

Continue to improve the performance of your LLM. Use a technique like Reinforcement Learning from Human Feedback (RLHF), where real humans rate and rank the output of the LLM to improve its performance. Establish clear benchmarks for performance and measure it against them. 
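
Benchmarks don’t have to be elaborate to be useful. A minimal sketch: a frozen set of question/expected-answer pairs scored on every model revision (ask_model is a hypothetical wrapper around your fine-tuned LLM, and the pairs below are made-up examples):

```python
def ask_model(question: str) -> str:
    """Hypothetical wrapper around your fine-tuned LLM."""
    raise NotImplementedError

# A frozen benchmark set, curated from real support tickets or HR cases.
BENCHMARK = [
    ("What does JIT mean at our company?", "just-in-time"),
    ("How do new hires request equipment?", "onboarding portal"),
]

def evaluate() -> float:
    """Fraction of benchmark answers containing the expected phrase."""
    hits = sum(
        expected.lower() in ask_model(question).lower()
        for question, expected in BENCHMARK
    )
    return hits / len(BENCHMARK)

print(f"Benchmark accuracy: {evaluate():.0%}")
```

Track the score over time: if a new fine-tuning run drops it, you’ve caught a regression before your customers do.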

Generate vector embeddings of your company’s data

Take the same documents from your subdomain or subtask and gather them for embedding. Pick an embedding model, then generate the vector embeddings and store them in a vector database. Ensure these embeddings are generated and stored in a secure, compliant manner.
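
A compact sketch of this step using the sentence-transformers library for embeddings and Chroma as the vector store (both are illustrative choices; any embedding model and vector database pair the same way):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Example embedding model; a private deployment would host this in-network.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Onboarding guide: how new hires request equipment...",
    "Support playbook: handling refund requests...",
]
embeddings = embedder.encode(documents).tolist()

client = chromadb.Client()  # in-memory here; use a persistent store in production
collection = client.create_collection("company-knowledge")
collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],
    embeddings=embeddings,
    documents=documents,  # keeps the reference back to the source text
    metadatas=[{"source": "internal-wiki"}] * len(documents),
)
```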

Expose the LLM to production systems (via API, custom GUI, etc.) and integrate with existing architecture

Once you’re production-ready, integrate the LLM into your application architecture. When that is functioning, expose the LLM to your vector database. Once the two are connected, the LLM can improve its answers with your data.
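
Exposing the connected system can be as simple as one internal endpoint. A sketch using FastAPI, where answer_with_rag is the hypothetical retrieval-plus-LLM function from the earlier sketch:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    text: str

@app.post("/ask")
def ask(question: Question) -> dict:
    # answer_with_rag() embeds the question, retrieves context from the
    # vector database, and calls the LLM (see the earlier RAG sketch).
    return {"answer": answer_with_rag(question.text)}

# Serve it inside the corporate network only, e.g.:
#   uvicorn app:app --host 10.0.0.5 --port 8000
```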

Continue generating more embeddings as your system grows

As the retrieval-augmented LLM learns more about your organization, you can give it new tasks to handle.

If any of this sounds confusing, fear not: we do this for clients every day. Contact us and we’re happy to walk you through every step of the process.

We’re skipping over a lot of steps, but that’s the basic architecture.

The critical last component: agents

A full production system needs one last orchestrator to ensure the application functions correctly. 

It’s one thing to give one LLM access to one vector database for one use case. Often, there’s more nuance. An LLM might have access to all types of company data - spreadsheets in one place, knowledge-base articles in another. 

It’s critical to set up an “agent” to manage these decisions. If a customer asks a question, which knowledge base should we pull answers from? Is this a general question we can pull from one data store, or a specific question about a product in a separate database? 

This “agent” is a decision-making layer that sits between the frontend and the RAG system.

Agents can also escalate when we don’t have the information we need. They can check multiple databases for the answer to the customer’s question, even if the first (or second, or third) doesn’t have what we’re looking for. If the answer isn’t there, they can relay that to the customer.
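
A minimal sketch of that decision layer (the store names and keyword routing are purely illustrative; production agents usually ask an LLM to make the routing call):

```python
# Hypothetical search functions, each backed by its own vector database.
def product_db_search(question, k=5): ...   # product specs and pricing
def support_db_search(question, k=5): ...   # knowledge-base articles
def general_db_search(question, k=5): ...   # wikis, spreadsheets, the rest

STORES = [product_db_search, support_db_search, general_db_search]

def route_and_answer(question: str) -> str:
    # Naive keyword routing; a real agent would classify intent with an LLM.
    if any(word in question.lower() for word in ("price", "product", "spec")):
        order = STORES                   # product store first
    else:
        order = STORES[1:] + STORES[:1]  # support store first

    # Escalate through stores until one returns relevant documents.
    for search in order:
        docs = search(question)
        if docs:
            return answer_with_rag(question)  # hand off to the RAG sketch above

    return "I couldn't find that in our knowledge base. Routing you to a human."
```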

Why build with LLMs and RAG?

RAG is a great way to retrieve information from a vector database; knowing which database to look through is where an agent excels. The agent knows the internal workings of the application, the context behind the customer, and has specific instructions for what to do.

Your data helps build your defensive moat

Ten years ago, every enterprise company hoarded as much data as they could. They didn’t know what to do with it, or how to structure it.

Now, thanks to powerful new large language models, they do. It’s true that not every industry needs enhanced artificial intelligence like ChatGPT. But so far, it’s tough to find an industry or organization that can’t benefit from LLMs in some way.

Data is the new oil for the AI world. Once you own it and vectorize it, you should be able to monetize it.

Don’t overlook this fact. Your data, industry knowledge, and processes are your competitive edge. LLMs understand all three, and can use them. 

Secure, private, compliant access to data

Although anyone can access ChatGPT, only authorized users can work with a private LLM. With a private LLM there are no data leaks: everything is hosted within the firewall. There’s also no third-party involvement, period. This makes it much easier to ensure compliance in a regulated industry. An internal LLM is all-knowing about the company, not the internet.

You’re making your data more valuable

You don’t have to worry about OpenAI killing your feature or your entire product because they know your secrets. You and only you have access to your knowledge base, your data, and your processes. 

It’s more predictable

No scrambling over deprecation windows for public cloud services. An us-east-1 outage no longer affects your AI. Cost is dictated by how you manage your stack, not by how much a private LLM host raises its prices. What’s more, you can access it without an external internet connection, as long as you’re on the corporate network.

Better performance

Fine-tuning an LLM on your organization’s data will improve its performance. Instead of giving generic answers, your LLM tailors its performance to exactly what your customer is asking for. You can prompt engineer as much as you’d like and provide any context you need. And you’re not making any external API calls, so there’s as close to zero latency as you can ask for.

Increasing your organization’s velocity with generative AI doesn’t have to be a tradeoff of performance and security. There are safe, scalable ways to integrate LLMs into your company’s knowledge base. We’ll help you build them.

You’ll need an expert (like NineTwoThree)

Setting up a demo LLM might take you a day. 

Creating a production-ready system can take years. And you’ll need experts guiding you who know your blind spots.

You’ll need experts who can improve your current systems with a proven methodology. They’ll also be able to help you set up observability and regression detection. You’ll need to know how to detect malicious actors and security leaks, and when to add important guardrails.

NineTwoThree helps customers answer all of these questions and more.

If you’re concerned with building the system the right way, we can help.

NineTwoThree is a leading provider of AI application development services, and has been building AI applications since 2016. We have a deep understanding of RAG and generative AI, and we have a proven track record of success in building AI applications.

We have already built 7 applications using retrieval-augmented generation and generative AI, and truly understand the technology and how to use it to solve real-world problems.

Contact us to learn more about our generative AI services today.

If you like this, download the full resource here.