Why Semantic Search Will Change the Internet

While LLMs are trained on large amounts of data, they don’t have the ability to understand the context and intent of the words they are trained on. This is where semantic search enters the conversation. Semantic search has the ability to recognize the underlying meaning of search queries and return results based on the meaning, even if the search query doesn’t contain the exact keywords that are used in the results. Before we can dive into how semantic search makes LLMs better, we need to first look at semantic search itself.

Download "Why Semantic Search Will Change the Internet" to Learn

The role of large language models (LLMs) like ChatGPT in the generative AI revolution
How semantic search goes beyond keyword matching to understand search intent
The importance of word embeddings in improving search relevance
The differences between keyword search and semantic search
Practical applications of semantic search in business, from customer personalization to content delivery

The ChatGPT Craze: Understanding the Hype and Its Limitations

We’re currently living through a craze around ChatGPT and generative AI. Since OpenAI released ChatGPT in November 2022, the world has witnessed an unprecedented surge in innovation, driven by advances in large language models. To someone just entering the artificial intelligence space, it can seem that ChatGPT is the main invention of this era. Projects like Midjourney, DALL-E, and Jasper have heralded an era in which generative AI products are flourishing. They have reshaped the way we interact with technology and create content, enabling human-like text and image generation and serving as versatile tools for research and content creation.

It's crucial to recognize, however, that these products, while promising, are still in their nascent stages and face substantial limitations. Those limitations are primarily rooted in their reliance on ChatGPT-based architectures, which limits their transformative power. In this white paper, we delve into the true innovation behind ChatGPT and shed light on why these products, despite their initial buzz, may encounter sustainability challenges in the long run.

Large Language Models – The Backbone of Generative AI

A large language model (LLM) is a type of artificial intelligence (AI) that can generate and understand human language. LLMs are trained on massive datasets of text and code, which allows them to learn the language’s patterns and rules. Once trained, LLMs can be used for a variety of tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way.

ChatGPT and Google Bard are both chat products built on LLMs. ChatGPT is developed by OpenAI, while Bard is developed by Google. Each has a different underlying large language model: ChatGPT relies on GPT-3.5 or GPT-4, while Bard uses PaLM 2. Both models are trained on massive datasets of text and code, but the products have different strengths and weaknesses due to their underlying LLMs.

ChatGPT, built on a generative pre-trained transformer (GPT) model, specializes in generating creative text formats, like poems, code, scripts, musical pieces, emails, and letters. Google Bard, on the other hand, is built on PaLM 2 (Pathways Language Model) and excels at answering factual questions and providing comprehensive and informative responses. One way to tell the difference is to ask both tools to create a story: ChatGPT will write one, while Google Bard will tell you it can’t.

You can see more differences between the LLMs in the table below:

LLMs can be used in the following ways:

Generating text: LLMs can be used to generate all sorts of text, including news articles, blog posts, poems, code, and scripts.
Translating languages: LLMs can be used to translate text from one language to another.
Writing creative content: LLMs can be used to write different kinds of creative content, like poems, code, scripts, music, email, and letters.
Answering questions in an informative way: LLMs can be used to answer your questions in a comprehensive and informative way, even if they are open-ended, challenging, or strange.
Analyzing and Summarizing Large Amounts of Unstructured Data: It’s often difficult for the average person to quickly read large amounts of data. LLMs can quickly summarize large amounts of data, allowing humans to focus more on strategy than comprehension.

What is Semantic Search

Semantic search is a type of search that goes beyond simply matching keywords to results. It attempts to understand the meaning of a search query and return results that are relevant to that meaning, even if the query does not contain the exact keywords that are used in the results.

In addition to LLMs, semantic search is also made possible by:

Knowledge bases: Knowledge bases are databases of information about entities (such as people, places, and things) and the relationships between them. Knowledge bases can be used to understand the context of search queries and to return results that are more relevant to the user's specific needs.
Machine learning algorithms: Machine learning algorithms are used to train LLMs and knowledge bases, and to generate search results.

By understanding the meaning of search queries and the context in which they are made, semantic search can return results that are more relevant and useful to users. To see semantic search in action, we can look at the example below.

Understanding Semantic Search

To better grasp semantic search, let’s start with an example. If I asked you which of the words basketball, soccer, dinosaur, and potato are similar, which two would you say have the most in common? Going further, if I asked you to group everything related to sports from every word in the dictionary, would you include running, calories, carbohydrates, and calisthenics?

How can you possibly explain this to a computer?

The tool we use to solve this is called a multidimensional vector. The word “sport” can be thought of as a point in space, represented by a list of numbers, and so can “potato”, “basketball”, “soccer”, and so on.

If you were to derive the relationship between two points in space, it's relatively easy to draw an XY coordinate system and measure the distance between those two points. We can even go further and do this in three dimensions without much mind-bending.

The vectors used for language have hundreds or even thousands of dimensions, far more than we can picture, but the distance between two points is calculated in the same way.
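To make the jump from two dimensions to many a little more concrete, here is a minimal Python sketch. The function name euclidean_distance and the sample points are ours, chosen purely for illustration; the point is only that the same formula works for 2, 3, or 1,000 dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The same function handles any number of dimensions.
print(euclidean_distance([0, 0], [3, 4]))          # 2 dimensions
print(euclidean_distance([0, 0, 0], [1, 2, 2]))    # 3 dimensions
print(euclidean_distance([0] * 1000, [1] * 1000))  # 1,000 dimensions
```

Nothing about the calculation changes as dimensions are added; only our ability to draw it does.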

Semantic search attempts to mathematically represent each word in the English language to calculate similarities. But how does the computer know which words are more similar than others? Semantic Distance.

Applying Semantic Distance To Semantic Search

Now that we have vectors in a space, we can start measuring the distance between the points. Semantic distance is a measurement of how similar two words are based on their meaning. It helps computers understand language better by turning words into the lists of numbers we called vectors, and then measuring how far apart these vectors are in a big space where each dimension represents a different aspect of meaning.

Suppose you want to compare words like "cat," "dog," "fish," and "bird," and teach a computer how these words are related to each other based on their meaning. You turn each word into a vector, a list of numbers whose parts represent different aspects of the word's meaning.

For example, the vector for "cat" might have a high number for "small," "furry," and "meows," while the vector for "bird" might have a high number for "flies," "feathers," and "chirps."

Now, to figure out how these words are related to each other, you can use something called semantic distance. This means measuring how far apart the vectors for two words are from each other in this big space where each dimension represents a different meaning.
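As a rough sketch of this idea, the Python below gives each animal a small vector and measures the distance between them. The feature names and every number are hand-assigned assumptions for illustration; real embeddings are learned from text and their dimensions are not labeled this neatly.

```python
import math

# Hand-assigned toy values; real embeddings are learned, not written by hand.
FEATURES = ["furry", "four_legs", "flies", "feathers", "swims"]
WORDS = {
    "cat":  [0.9, 0.9, 0.0, 0.0, 0.1],
    "dog":  [0.9, 0.9, 0.0, 0.0, 0.3],
    "bird": [0.0, 0.0, 0.9, 0.9, 0.1],
    "fish": [0.0, 0.0, 0.0, 0.0, 0.9],
}

def semantic_distance(word_a, word_b):
    """Euclidean distance between the vectors of two words."""
    a, b = WORDS[word_a], WORDS[word_b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_word(word):
    """The other word whose vector is nearest to the given word."""
    others = [w for w in WORDS if w != word]
    return min(others, key=lambda other: semantic_distance(word, other))

for word in WORDS:
    print(word, "is closest to", closest_word(word))
```

With these made-up values, "cat" and "dog" land next to each other because they share the furry, four-legged dimensions, while "bird" sits far from both.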

One of the easiest ways to think about how words are related to each other is the famous language arithmetic King – Man + Woman = Queen. You can instantly grasp that the answer is queen, and that intuition becomes essential when converting language to math.

To drive the point further, this MIT paper uses this clever word math: “Paris – France + Poland = Warsaw”.
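Word arithmetic like this can be sketched in a few lines of Python. The two-dimensional vectors below (one dimension for royalty, one for gender) are hypothetical values picked so the math is easy to follow by hand; learned embeddings only approximate this behavior.

```python
import math

# Toy 2-D vectors: (royalty, gender). Hypothetical values for illustration only.
VECTORS = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def analogy(a, b, c):
    """Return the word nearest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # As is common practice, the three input words are excluded from the answer.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(VECTORS[w], target))

print(analogy("king", "man", "woman"))  # queen
```

Subtracting "man" removes the gender component of "king", adding "woman" puts the opposite one back, and the nearest remaining word is "queen".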

Once words obtain mathematical meaning, they can be clustered in space based on similar words and meanings. Cars, wheels, steering wheel, driving, Chevrolet, and Ford all sit in close proximity, while apples, bananas, strawberries, and raspberries sit close to each other, each group forming its own cluster. The “Nissan Cherry”, however, will have similarities to both the fruit cluster and the car cluster - such a cool-looking car.

Using semantic distance, we can figure out which words are most similar to each other and which are most different, and we can use this information to help the computer better understand language and perform tasks like identifying synonyms, antonyms, or related words. Using semantic search and semantic distance allows an even better way to retrieve information. Let’s find out how it will change the internet.

Why Semantic Search Will Change The Internet

Search on the Internet is heavily based on matching the keywords in a user's query to results in a database. Search for “glasses” and you could get a mix of drinking glasses and spectacles. We have learned to refine our searches by adding clarifiers. Words like “drink”, “wine” or “sun” will provide us with the results we are shopping for.

But keywords have their limitations.

If I were searching for “beach glasses” the results could either be full glasses for liquids to use on the beach or sunglasses depending on how the company's search algorithm behaves.

Thus, keywords are controlled by the underlying algorithm.

Semantic search is a modern approach to information retrieval that leverages word embeddings to find semantically relevant documents or information from an unstructured knowledge base or database.

Word embeddings are dense vector representations of words that capture their semantic meaning in a multi-dimensional space. These embeddings are generated using algorithms like Word2Vec, GloVe, or FastText, which learn from large text databases.

In semantic search, the main idea is to identify documents that are semantically similar to the query. This is done by converting the query and the documents into their respective vector representations and then comparing their similarity in the embedding space, usually using a measure like cosine similarity. We will dive into the differences between semantic search and keyword search below.
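The steps just described (embed the query, embed the documents, compare with cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional word vectors are invented, and averaging word vectors is just one simple way to embed a phrase. A production system would use vectors learned by a model such as Word2Vec, GloVe, or FastText.

```python
import math

# Hypothetical 3-D word vectors: (eyewear, drinkware, sun and outdoors).
WORD_VECTORS = {
    "shades":     [0.9, 0.0, 0.7],
    "seaside":    [0.0, 0.1, 0.9],
    "polarized":  [0.8, 0.0, 0.6],
    "sunglasses": [1.0, 0.0, 0.8],
    "stemless":   [0.0, 0.8, 0.0],
    "wine":       [0.0, 1.0, 0.1],
    "glasses":    [0.5, 0.5, 0.0],
    "reading":    [0.7, 0.0, 0.0],
    "spectacles": [1.0, 0.0, 0.0],
}

def embed(text):
    """Average the vectors of the known words in the text (unknown words are skipped)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query, documents):
    """Return (document, score) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

DOCUMENTS = ["polarized sunglasses", "stemless wine glasses", "reading spectacles"]
for doc, score in semantic_search("shades for the seaside", DOCUMENTS):
    print(f"{score:.2f}  {doc}")
```

The query shares no keyword with "polarized sunglasses", yet that document ranks first because its vector points in nearly the same direction as the query's.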

Keyword Search

Keyword search is a traditional search method that uses specific words or phrases to search for relevant documents. It typically relies on the exact matching of terms, which means the search engine looks for documents containing the specific words or phrases entered by the user.

The keyword search can be limited by the variations in how words are used (synonyms, different word forms, etc.) and may not capture the semantic meaning of the query. Also, it might retrieve irrelevant results if the keyword has multiple meanings, or is ambiguous.

Keyword search often supports the use of Boolean operators (AND, OR, NOT) to combine or exclude keywords and help refine search results. Remember when you had to actually type AND into Google to combine keywords?
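For comparison, exact-match keyword search with Boolean operators might look like the Python sketch below. The function name keyword_search, its parameters, and the sample product titles are assumptions made for this example, not any search engine's real interface.

```python
def keyword_search(docs, must=(), any_of=(), must_not=()):
    """Exact-match search: must = AND terms, any_of = OR terms, must_not = NOT terms."""
    results = []
    for doc in docs:
        tokens = set(doc.lower().split())
        if not all(term in tokens for term in must):
            continue
        if any_of and not any(term in tokens for term in any_of):
            continue
        if any(term in tokens for term in must_not):
            continue
        results.append(doc)
    return results

DOCS = [
    "wine glasses set of four",
    "reading glasses with case",
    "polarized sunglasses for the beach",
]

print(keyword_search(DOCS, must=["glasses"]))                     # glasses
print(keyword_search(DOCS, must=["glasses", "wine"]))             # glasses AND wine
print(keyword_search(DOCS, must=["glasses"], must_not=["wine"]))  # glasses NOT wine
print(keyword_search(DOCS, any_of=["wine", "beach"]))             # wine OR beach
```

Notice that a search for "glasses" never returns the sunglasses, because "sunglasses" is a different token. That is the exact-matching limitation described above.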

Semantic Search

Semantic search leverages word embeddings, which are dense vector representations of words that capture their semantic meaning. Word embeddings are generated using algorithms like Word2Vec, GloVe, or FastText, which learn from large text corpora.

Instead of relying on exact matching, semantic search identifies documents that are semantically similar to the query. This is done by comparing the vector representation of the query with the vector representations of the documents in the corpus.

Semantic search is more tolerant to variations in word usage, synonyms, and even spelling mistakes, as the embeddings can capture semantic similarities despite these differences.

The approach is more robust in retrieving relevant results, as it considers the context and meaning of the words rather than just their presence or absence.

So by comparison, while keyword search is a traditional method relying on the exact matching of terms, semantic search uses word embeddings to capture the semantic meaning and find relevant documents based on similarity in meaning. Semantic search is generally more robust and better at handling variations in word usage, but it may require more computational resources to generate and compare embeddings. Semantic search can go beyond just finding the best information in a search engine or chatbot – it can be applied to your business.

How Semantic Search Can Be Used In Enterprise

Imagine you have 10,000,000 customers who all purchased a product from you. Heck, for us mere mortals, imagine you had 10,000. Those customers can be placed into a vector space based on each purchase - but how is each customer related to the others?

It is safe to assume that most companies are using keyword matching to relate to their customers.

For instance, say Johnny bought a brand new grill and added on a cover, wheels, and a nice new propane hookup. When he searched for “Propane Grill” he was able to find what he was looking for. Based on the purchase of the grill, the seller was able to suggest other items that previous purchasers bought within 30 days of the purchase.

A clever trick, but there could easily be a matching algorithm between the original keyword search of “Propane Grill” and “Cover” to now present Johnny with the proper cover for that grill. Easy Peasy.

But what if Johnny searches for a spatula next? Surely he would get the 1,000s of spatulas for indoor kitchens first because they are way more popular.

With semantic search, this is not an issue. While Johnny was checking out the product page of his brand new grill - all of the words that are semantically similar to “Propane Grill” were within proximity to his search - and therefore available to present to him during the checkout process. Semantic search could also return results for other items that are related to propane grills, such as grilling covers, grill tools, and grilling recipes.

Semantic search solves many of the intent problems Google has today. While marketers love the ability to reach the top of a page with a clever SEO hack by knowing the intent of the search - it doesn’t provide the best user experience.

Search “best dog breed” for example, and you’ll get lost in blogs containing a side of keyword salad. If you search "best dog breed," a semantic search engine would be able to understand that you are looking for information about the best dog breed for your specific needs. It could then return results for dog breeds that are a good fit for your lifestyle, activity level, and family situation. Below we’ll explore using semantic search on a checkout page.

How To Use Semantic Search On A Checkout Page

Users are very habitual and the keyword search-based internet has notoriously used user demographics to cluster users together. If a user enters the site and behaves like a cluster of similar users, then the algorithm will attempt to predict the purchase based on that cluster's previous purchases. However, those demographics are based on keywords and patterns - not intent.

Language models can condense this information into a single point, creating a semantic data model. The user is then represented as an embedding within a network of similar embeddings, connected by their similarities in vector form.

This means that as soon as Suzy tells the customer service representative that she’s going to Costa Rica and needs to return her recent purchase of Kitty Litter to Amazon, a new string of data can be added to her embedding that includes Costa Rica. Now when she searches Amazon for “coffee” the first hit can be “the best coffee to use in Costa Rica” based on a user review left 5 years ago - oh yeah, because user reviews can also be embedded into the same data model.
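One simple way to picture "adding Costa Rica to her embedding" is to blend the user's vector with the vector of the new information. The sketch below is only an illustration under our own assumptions: the three dimensions, all the numbers, the product names, and the 50/50 blend weight are invented, and a real system could update embeddings quite differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def blend(user_vec, context_vec, weight=0.5):
    """Move the user embedding part of the way toward a new piece of context."""
    return [(1 - weight) * u + weight * c for u, c in zip(user_vec, context_vec)]

def rank(user_vec, products):
    """Product names ordered by similarity to the user embedding, best first."""
    return sorted(products, key=lambda name: cosine_similarity(user_vec, products[name]), reverse=True)

# Hypothetical 3-D vectors: (pets, Costa Rica travel, coffee).
suzy_before = [0.9, 0.0, 0.1]
costa_rica_note = [0.0, 1.0, 0.3]
coffee_products = {
    "house blend coffee": [0.0, 0.0, 1.0],
    "coffee reviewed for Costa Rica trips": [0.0, 0.7, 0.7],
}

suzy_after = blend(suzy_before, costa_rica_note)
print("before:", rank(suzy_before, coffee_products)[0])
print("after: ", rank(suzy_after, coffee_products)[0])
```

Before the customer service conversation, the generic coffee ranks first for Suzy; after her embedding absorbs the Costa Rica context, the travel-related result moves to the top.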

The user who left the review also reviewed a few hats that could be shown to Suzy as an add-on to her glasses purchase. Pretty cool right? This can go even deeper in selling high-ticket items.

Using Semantic Search as an Enterprise Company Selling High Ticket Items

The checkout page is simple to understand because we have all purchased something online, but are you ready to see why the 100 CEOs walked away with an action item to hire agencies like NineTwoThree to start building semantic searches?

Let’s use Equifax as an example.

They sell credit card reviews, loan reviews, car insurance reviews, and other B2B services across the United States. They also have B2C services - but for now, let’s explore B2B, where companies are looking to see whether a loan is likely to default.

Currently, their database contains a massive number of companies with files such as “loan qualification paperwork” or “credit check results.” While there are surely plenty of columns to rank all the businesses by credit score or loan qualification status, there are copious amounts of data that are not being used to grade each company (for the nerds out there, almost all methods use linear or logistic regression to predict the default score).

The process of building semantic comparisons between companies' loan applications would start with the data scientists parsing out all of the reasons the agents gave those companies for why their risk of default was acceptable or not.

Then the scientists would summarize the semantics of the reasoning and embed the results into the data model to compare that company against all the other companies in the model.

Lastly, if a business were to fail - all the textual reasoning given to the business could be related to all the similar businesses that have the same semantic language in their application to warn of impending default. Woah.

Just running this model over time would surface patterns more often than the linear regression models that credit default scoring employs today.

With semantic search, Equifax would be able to:

Identify the factors that are most important in predicting credit risk.
Develop more accurate credit scoring models.
Identify early warning signs of credit default.

It could also be used to parse out the reasons that agents provide for accepting or declining a loan application. This information could then be used to train a semantic search model to identify the most important factors in predicting credit risk. The model could then be used to develop more accurate credit scoring models and to identify early warning signs of credit default. As Equifax continues to use semantic search, the model will continuously get smarter with more outcomes.

Reward Training Neural Networks To Enhance Previous Users

The most exciting aspect of semantic search is that the system gets “smarter” with each new result - and the system does not have to be rewritten when a new embedding is added.

If Whole Foods hard coded the top five results for “Cheese” to buy in West Elm Virginia, then the introduction of the now popular CheeseWiz will have to be inserted into the list so that users searching “cheese” can find their newfound love. Even if the system was dynamically ranking the best cheese, each new cheese product would force the system to rerank the products per keyword.

Not with semantic search. When you are comparing the semantic distance between items any new product can be added to the product database and semantically compared to all the other products in the database. The vectors created cause relationships for the search to discover. There is no “retraining” or “re-ranking” necessary because the distance of the vector determines its search result.

Unlike traditional systems that hard code specific products or rankings based on keyword matches, semantic search creates semantic relationships between products that are constantly evolving and adapting to new data.
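Here is a minimal sketch of what "just add the new product" could look like. The ProductIndex class, the product vectors, and the three dimensions they imply are all hypothetical; real vector databases add approximate-nearest-neighbor indexing on top of the same idea.

```python
import math

class ProductIndex:
    """A tiny in-memory vector index; names and vectors are illustrative."""

    def __init__(self):
        self.vectors = {}

    def add(self, name, vector):
        # Adding a product stores one vector; existing entries are untouched.
        self.vectors[name] = vector

    def search(self, query_vec, top_k=3):
        # Results are ordered purely by distance to the query vector.
        ranked = sorted(self.vectors, key=lambda n: math.dist(self.vectors[n], query_vec))
        return ranked[:top_k]

index = ProductIndex()
index.add("cheddar", [1.0, 0.1, 0.0])
index.add("brie", [0.9, 0.6, 0.0])
index.add("strawberry jam", [0.0, 0.8, 1.0])

cheese_query = [1.0, 0.3, 0.0]  # assumed embedding of the search "cheese"
before = index.search(cheese_query)

index.add("CheeseWiz", [0.8, 0.9, 0.1])  # the new product arrives
after = index.search(cheese_query)

print(before)
print(after)
```

No ranking table is edited when CheeseWiz is added. It appears in the results simply because its vector is closer to the query than the jam's.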

But what is even more magical? Once a triggering event happens - like the cheese is purchased, the vector can shorten its distance by a factor thereby informing the next user of the CheeseWiz that is being purchased more frequently. Then the next purchase does the same thing, and again and again, until the machine determines that everyone that searches for Cheese wants CheeseWiz.
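The "shorten its distance by a factor" idea can be sketched like this. The update rule, the factor of 0.7, and the vectors are assumptions made for illustration; they are one plausible reading of the paragraph above, not a description of any particular product.

```python
import math

def record_purchase(product_vec, query_vec, factor=0.7):
    """Shrink the distance between product and query to `factor` of its current value."""
    return [q + factor * (p - q) for p, q in zip(product_vec, query_vec)]

cheese_query = [1.0, 0.3, 0.0]  # assumed embedding of the search "cheese"
cheddar = [1.0, 0.1, 0.0]
cheesewiz = [0.8, 0.9, 0.1]

purchases = 0
while math.dist(cheesewiz, cheese_query) >= math.dist(cheddar, cheese_query):
    # Each purchase made after searching "cheese" pulls CheeseWiz toward the query.
    cheesewiz = record_purchase(cheesewiz, cheese_query)
    purchases += 1

print("CheeseWiz outranks cheddar after", purchases, "purchases")
```

Each triggering event moves the product a little closer to the query, so frequently purchased items drift toward the top of the results without anyone editing a ranking by hand.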

With semantic search, the introduction of a new product is seamlessly integrated into the existing database, allowing the system to create new relationships and rankings based on the semantic distance between products.

Moreover, the system's ability to learn from feedback enables it to improve with every user interaction. Every purchase or search is a triggering event that informs the system about user preferences and increases the semantic similarity between related products. This process becomes a reward system that enhances the system's ability to provide accurate recommendations and search results.

The triggering event becomes the reward signal to the model, thus enhancing the similarities. Learning from a reward signal like this is known as reinforcement learning, and a related approach, reinforcement learning from human feedback, is part of how ChatGPT continues to improve as we all become the test bunnies for AI.

Why are CEOs Scrambling to Find Data Scientists?

The world revolves around search. Currently, we match keywords to show the user what they intend to search for. But in the future, we will be matching intent. Companies willing to switch their business models from keyword-matching linear regression databases to semantic search will quickly see higher revenues.

Once users start using semantic search, they will have a more personalized and accurate search experience. This means that they will be more likely to find what they are looking for and make purchases. As a result, businesses that adopt this technology can benefit from increased customer satisfaction, higher conversion rates, and ultimately, higher revenues.

The shift from keyword-matching linear regression databases to semantic search represents a significant evolution in the field of search and recommendation systems. By focusing on matching intent instead of just keywords, businesses can create more intuitive and personalized experiences for their customers. This shift requires a new way of thinking about data, as well as a willingness to embrace new technologies.

As more and more businesses begin to adopt semantic search and unsupervised learning, the field of search and recommendation systems will continue to evolve.

This means that businesses that are quick to embrace these technologies will have a significant advantage over those that are slow to adapt. In the end, the businesses that are able to create the most personalized and accurate experiences for their customers will be the ones that succeed in the highly competitive world of e-commerce.
