Search's Future & Sequoia CEO Meet - Part 1

Published on

May 29, 2023

Updated on

June 24, 2024

Discover the powerful impact of Generative AI on search as you dive into the world of vector embeddings and their role in reshaping search engines.

Hello, my name is Andrew Amann and I am one of the founders of Caveminds.ai. I run a Product Engineering and Design Studio called NineTwoThree where we build web, mobile and AI apps. Our team builds ML / AI models for enterprise companies, and through osmosis and conversations we understand which applications are going to make a larger difference than ChatGPT. Subscribe to get regular insights.

You might be familiar with the first wave of AI - you use it daily when you search the internet. But what if I told you the sequel is even better? Think of it like a movie franchise: the first one's a classic, but the second one's got more action, drama, and maybe even some comic relief (just like this blog 😉).

So buckle up and get ready to explore this new frontier, as we dive into the world of Generative AI and discover how it's reshaping search in ways that'll make you say, "Wow, I didn't see that coming!" 🤯🚀

Google Search, Bing Search, Yahoo and even AskJeeves are the first wave of AI, which resulted in the ranking of information. “Best Hair Dryers”, “Top Restaurants in Chicago”, “How to remove a nickel from my ear” and “top project management tools” are simply fed into the search machine by a human, and thousands of pages come back, ranked in order from best to worst.

But who (or what 🤖) chooses what appears 1st, or 2nd or 32nd? You see, that is the trillion dollar question, and Google locks the answer down like Coca-Cola locks its sweet-sparkling-nectar formula in a vault. 🥤

Google restructured the incentives of politicians, journalists, entertainers and every other industry trying to compete for eyeballs. 👁️👁️ But as goes with everything on the internet - Marketers. Ruin. Everything. Every business has hired marketers to discover ways to reverse engineer Google (and AskJeeves, I guess), creating the infamous SEO category that places their beloved content at the top of the list.

The race to place in the top spot on Google results in millions of dollars for certain search terms. And it is all about to change…

But before we dive into the second wave 🌊 of AI, let’s get a grasp on something a bit more scientific so that we can understand a very important concept that is being tossed around.

Intent

What is intent? Well, it's the semantic relevance of what was typed to what the searcher expected back. It is the action, goal, or response that the searcher wants to see when they ask a question or assign a task. Google is very good at intent because it uses vector embeddings (we'll see what they are in a few lines), just like every other rank-based system has since the 00’s.

Facebook’s Newsfeed, Spotify, Netflix, Google Search and every other system intending to show relevant content will eventually have enough money and data to create a vector database that enhances relevancy, which keeps users on their platform longer.

Netflix's only true competition is sleep, because after you have watched your 5th episode of “The O.C.” then suddenly… “But wait, here are 13 other shows from the 90’s that you might also like to watch at 3 a.m.”

But how exactly does the science work (and do I need to learn this to understand generative AI)? Before you start grumbling about why you should pay attention to this, let me explain why it is important - it's called vector embeddings.

So let's quickly nerd out 🤓

Imagine you have a bunch of words like "dog", "cat", and "bird". These words are just text, but if you want to teach a computer to understand what they mean, you need to turn them into something the computer can work with, like numbers.

That's where vector embeddings come in. A vector is just a list of numbers, and an embedding is a way of taking a word and turning it into a vector.

But how do you make sure that the vector for "dog" is different from the vector for "cat" or "bird"? That's where the magic happens! The computer uses a lot of data to figure out the best way to create these vectors so that they represent each word's meaning.

Once you have these vectors, you can use them for all kinds of things, like finding similar words or even understanding sentences. It's kind of like giving the computer a way to understand words in the same way that we do!
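To make "finding similar words" a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three-number vectors are made up by hand (real embeddings are learned from data and have hundreds or thousands of dimensions), and cosine similarity is just one common way to compare two vectors.

```python
import math

# Hand-made toy vectors, purely illustrative. Real embeddings are learned
# from lots of text. Hypothetical dimensions: [furry, barks, flies]
EMBEDDINGS = {
    "dog":  [0.9, 0.9, 0.0],
    "cat":  [0.9, 0.1, 0.0],
    "bird": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

dog_cat = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["cat"])
dog_bird = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["bird"])

print(f"dog vs cat:  {dog_cat:.2f}")
print(f"dog vs bird: {dog_bird:.2f}")
```

With these made-up numbers, "dog" lands closer to "cat" than to "bird", which is the kind of relationship a trained model is meant to pick up on its own.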

The famous example: King - Man + Woman = Queen
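Here is a rough sketch of how that arithmetic could play out, assuming a tiny hand-built vocabulary. The vectors and their three hypothetical dimensions are invented so the math works out cleanly; a real model learns them from data and only lands near the answer, not exactly on it.

```python
import math

# Toy vectors with hypothetical dimensions: [royalty, male, female]
WORDS = {
    "king":     [1.0, 1.0, 0.0],
    "queen":    [1.0, 0.0, 1.0],
    "man":      [0.0, 1.0, 0.0],
    "woman":    [0.0, 0.0, 1.0],
    "prince":   [0.8, 1.0, 0.0],
    "princess": [0.8, 0.0, 1.0],
}

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(WORDS[a], WORDS[b], WORDS[c])]
    # Leave out the three input words, as analogy demos usually do.
    candidates = (w for w in WORDS if w not in (a, b, c))
    return min(candidates, key=lambda w: math.dist(WORDS[w], target))

result = analogy("king", "man", "woman")
print(result)
```

The sketch measures closeness with straight-line (Euclidean) distance to keep things short; cosine similarity is another common choice.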

When you consider search with vector embeddings, know that they are why we can ask a question that does not contain the keyword, like “What is the happiest dog breed?”, and get back, controversially, “Labrador Retriever.” Come on, Google, have you ever met a Husky?

As a Large Language Model, ChatGPT is designed specifically to generate text that sounds like it was written by a human. To do this, it needs to understand the relationships between different words and phrases, and it needs to be able to use that understanding to generate coherent and natural-sounding responses.

Vector embeddings help ChatGPT achieve this by providing a way to represent words as numbers that capture their meaning. When ChatGPT generates text, it can use these vector embeddings to find words and phrases that are related (most of the time) to the context of the conversation and use them to create a response that makes sense.

Alright, alright, my kings and “king - man + woman” aka queens. Now that we've laid the foundation, why is this so powerful for Google Search?

Because the biggest achievement in the last year is not that ChatGPT can summarize your meeting notes into fancy poems in Shakespeare style, it is that ANY text, video, image or piece of content can be vectorized and then semantically compared to other vectors in space.

Basically, an entire blog can be summarized into a vector that carries its semantic meaning, and the model can compare that vector to other blogs with a similar meaning.

Yes - that is the giant breakthrough that Dharmesh raved about for 15 minutes on MFM. It is also what the most popular conversation was about at the secret meeting of 100 CEOs at Sequoia Capital's HQ.

Are you catching on yet? Why does this matter?

Imagine you are the largest insurance provider in the United States and your entire business model depends on understanding who in your customer database is going to upgrade their insurance. Who might be buying a new house? Who might be close to dying? And who might cancel in the next 30 days? I mean, answering these questions is one of the highest-paying mathematical professions on the planet - and for good reason.

But now, every conversation that the customer has can be summarized by the LLM down to the actual meaning of that content and embedded into their customer profile as a vector. This can also include all of the phone calls that were “recorded for training purposes” and emails that were sent “Confidentially between the client and insurance agent.”

Every. Piece. Of. Content. Can. Be. Ingested. And. Summarized.

Whoa. Now what do you do with that?

Think of the keyword-based insurance platform’s database that has 100 columns and 1000 customers. Each column contains a datapoint about that customer: age, weight, spouse, address, email, car model, has a pet, etc. Now add thousands more columns that include “tone of voice on phone” or “life plans” or “meaning of recent blog read on website.” That is what vector embeddings can do.

What starts to unfold is a customer profile with a lot more specificity. Then, when that person upgrades their insurance plan, you can find hundreds or more customers who have shown similar behaviors or patterns and infer that they too might be “thinking” about upgrading - they just don’t know how. That alone is a billion dollar change to just 1 company, and we know because they called us to see what is possible.
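As a back-of-the-napkin illustration of that "find the lookalikes" step, here is a small Python sketch. The customer names, the three-number profile vectors and the 0.9 threshold are all hypothetical; a real system would use far longer vectors and a proper vector database instead of a dictionary.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical profile vectors, e.g. built from calls, emails and pages read.
PROFILES = {
    "alice": [0.9, 0.8, 0.1],  # just upgraded her plan
    "bob":   [0.8, 0.9, 0.2],
    "carol": [0.1, 0.2, 0.9],
    "dave":  [0.7, 0.6, 0.1],
}

def lookalikes(upgraded, profiles, threshold=0.9):
    """Customers whose profile points the same way as the one who upgraded."""
    seed = profiles[upgraded]
    scored = [
        (cosine_similarity(seed, vec), name)
        for name, vec in profiles.items()
        if name != upgraded
    ]
    # Most similar first, keeping only those above the (assumed) threshold.
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

similar = lookalikes("alice", PROFILES)
print(similar)
```

The threshold is the knob to play with: raise it and you get fewer but closer matches, lower it and the net gets wider.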

And it gets better with Search. 🔍

From a text-based input, the searcher will request an answer to their question or ask for a task to be completed, and the results are going to be conversational - not rank based. Therefore, the entire search industry could adapt to conversations at the search level - not the blog level. And the results will be typed up by the bot, one word at a time, based on the content that most closely matches the searcher's question in the research it completed across the internet.

If you asked the chatbot “How do I cook the best lasagna?” the LLM will find the ONE blog that has the most authority and highest number of reviews on the entire web to return its answer (or it may combine hundreds of recipes to create its own concoction). But regardless, it's still a winner-takes-all scenario. While that seems unfair to all the other lasagna recipes, and Garfield, it does allow the best experience for the customer - because after all - good lasagna is hard to find.

But what about opinionated topics like “top 10 vacuum cleaners?” Surely there are different vacuums for different people. Budget matters, and so do questions like “do you have a pet?” What eBay has done, just like Amazon and every other rank-based search system, is apply 99 filters, but an LLM ain't one.

So the user clicks away at “price” and “handheld” and “robotic” until they are left with the perfect vacuum for their home.

But in a conversation the feeling is different. The chatbot should ask a few questions about the user first. “Hello, Jim, I see you want to buy a vacuum. Tell me, do you have any pets?” “No? Ok, what is the maximum price you intend to pay for a vacuum?” “Ok great, do you want one with a battery or a plug?” “Great, here are three you might like.” And then it sends you directly to the manufacturer's website to purchase.

Whoa - that seems better for the end user - but horrible for all those SEO marketers that try and win the top spot for “best vacuum.”

How are affiliate links going to work? How are ads going to work on blogs and listicles, and long-form content, and guides or blogsicles, ahh!

There Are Really Only Two Outcomes Of ChatBot Search

The first is that the chatbot starts to create references to the content it pulls from. This demonstrates that the content has authority, and the writer can gain some sort of reward for their effort in the form of a backlink or page visit. However, authority still remains hard to dethrone - meaning that the winner-take-all concept moves from the search engine to the one blog that owns the top spot.

This is what happens with zero-click searches right now - and it will presumably be used in Bard to save processing costs on the LLM. If Google has already verified the results from a trusted source - why spend processing power to dig deeper?

In the previous rank based system, at least the user had the ability to choose the link to select. This allowed both sides of the argument to rank and depending on the intent of the user, it was ultimately the user who decided what blog to read.

But with chatbots, the conversation only has one result (for now).

The other solution is that brands start to appear INSIDE the chatbot. That is right, your Geico Gecko will take over from Bard and start asking you questions about your car insurance policy. While far-fetched for now, let me remind you that Marketers. Ruin. Everything.

To successfully do this, enterprise brands are going to have to create their own LLM that has all of the knowledge from their brand embedded into the chatbot to provide accurate information. Just as importantly, it will also have to include a filter to ensure that the chatbot does not spew misinformation on their behalf.

We had a large beer manufacturer ask us if they could use ChatGPT but without referring to the negative side of alcohol. You know, just in case their beloved users were curious…

What brands need to build (with the help of agencies, of course) is what I call a positive and negative database: positive information to feed the bot, and a “no-no” list to provide to the filter to ensure the results are “on-brand”.
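For the curious, a positive and negative database could be sketched in a few lines of Python. This is only a toy under loud assumptions: the facts, the "no-no" terms and the fallback message are all invented, and the filter is a plain exact-word check, which is far cruder than what a real brand would ship (it would miss "hangovers", for one).

```python
import re

# Hypothetical "positive" database: approved facts the bot may draw from.
POSITIVE_FACTS = [
    "Our lager is brewed with four ingredients.",
    "Our brewery tours run every Saturday.",
]

# Hypothetical "no-no" list handed to the filter.
NO_NO_TERMS = {"hangover", "lawsuit", "recall"}

FALLBACK = "I can't help with that one, but ask me about our products!"

def is_on_brand(reply, banned_terms):
    """True when none of the banned words appear in the reply (exact words only)."""
    words = set(re.findall(r"[a-z']+", reply.lower()))
    return words.isdisjoint(banned_terms)

def brand_filter(reply, banned_terms=NO_NO_TERMS, fallback=FALLBACK):
    """Pass an on-brand reply through, otherwise swap in the fallback."""
    return reply if is_on_brand(reply, banned_terms) else fallback

print(brand_filter(POSITIVE_FACTS[0]))
print(brand_filter("Drink enough and you will get a hangover."))
```

A sturdier version might compare the reply's embedding against embeddings of the off-limits topics instead of matching words, but that is a design choice for the brand and its agency.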

This Is Important

If you read nothing else, or have gotten this far with your jaw on the floor, understand this. ChatGPT apps will become extinct. Jasper is slowly moving towards extinction after a $125M investment at a $1.5B valuation. That was just 6 months ago → Extinction.

What does Copy.AI do that ChatGPT cannot right now? You see where this is going?

If you are building an AI product you need to build for the future revenues of your clients. Anything built in a month will not produce profit in a month - Lindy effect! But you are here to learn what you SHOULD be building.

And what should you be building? You should be learning how prompting can manipulate the LLM. You should be learning how to control the inputs and outputs of the LLM. Alexa is doing this right now - and they said it will take YEARS and billions of dollars to accomplish.

But they will be 100% sure that Alexa will not tell you to jump out your window or how to build a nuclear bomb - and that is important to a company that can be sued.

So what is enterprise doing with LLMs?

They are learning. They are prompting. They are manipulating. And they are watching others build. And so should you. By understanding how to control the LLM - you can understand what is really important for a company - the outputs.

Fin Important Section

Back to the story…

Whether the brand gets pulled into the chatbot and handed off when prompted, OR the chatbot sends the user to the brand's website - chatbots are the new search engines. And we all need to get onboard with being more human with search engines. (Even eBay and their 8 million filters.)

But if the first option holds true, content providers will have to become the authority in their industry to provide content to the search engine. No longer is it going to be important to know a fact, you will have to list out all of the facts to carry on the conversation - much like humans do. The more your website content is being pulled by the AI, the more “trust” the chatbot will have in your content around that topic.

And this is why vector embeddings are so important. It's the Blinkist to Audible, the CliffsNotes to high school, the TikTok to YouTube. Everything that's written can be summarized into one idea. Then, when the idea matches the result of the search term… voilà! The connection is made like sticks and fire.

But I have a wild theory to add to the future of search engines.

What if the content providers were paid pennies for their facts? How cool would it be if every time the AI chatbot pulled your content to be used inside its search engine, you were rewarded a few pennies for your efforts? Is this profitable?

Actually yes! It costs more than a few pennies to search all of the vector embeddings that have semantic similarities to your intent. Therefore, the less the machine has to work, the less it will cost - and therefore if it starts to gain trust pods with certain sites - it can skip scrolling the rest of the internet.
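To picture why a trusted shortlist saves work, here is a hedged Python sketch of the idea. The site names, vectors and the 0.9 threshold are invented, and counting comparisons is only a stand-in for real processing cost; this is my illustration of the theory, not how any search engine is known to operate.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query, index):
    """Return (name, score, comparisons) for the closest vector in index."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    score, name = max(scored)
    return name, score, len(scored)

def search(query, trusted, everything_else, threshold=0.9):
    """Check trusted sites first and only scan the rest when they fall short."""
    name, score, cost = best_match(query, trusted)
    if score >= threshold:
        return name, cost  # good enough, skip the rest of the internet
    other_name, other_score, other_cost = best_match(query, everything_else)
    if other_score > score:
        name = other_name
    return name, cost + other_cost

# Hypothetical indexes: one trusted site and "the rest of the internet".
TRUSTED = {"trusted-lasagna-blog": [0.9, 0.1]}
REST = {
    "vacuum-reviews": [0.1, 0.9],
    "random-forum": [0.5, 0.5],
    "news-site": [0.6, 0.3],
}

lasagna_hit = search([1.0, 0.1], TRUSTED, REST)
vacuum_hit = search([0.0, 1.0], TRUSTED, REST)
print(lasagna_hit)
print(vacuum_hit)
```

When the trusted site already answers the question, the search stops after a single comparison; when it does not, the engine pays for the full scan. Scale that difference up to the whole web and the pennies start to make sense.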

Don’t you dare even think about blockchain - don't mention it! This is a simple reward system similar to backlinks (they don’t use blockchain, fyi). Marketers have learned how to monetize backlinks, and while a backlink doesn’t convert directly to one Abraham Lincoln, it does hold value. And that is what matters.

If AI ChatBots begin to reward the trusted authorities to help a conversation get resolved - then the content provider will be rewarded for doing so. This means that the content has to be complete and not singular.

The content will work as a team with the processing of the vector database. The more content the LLM has to navigate through, the longer it will take to respond and the more it will cost. Therefore, these shortcuts of trusted articles could be just the handoff LLMs are looking for to include everyone in their party.

Important Takeaway

If you write for SEO rankings, start becoming an authority on the full conversation of your vertical. Intent means that your entire site will have a reason for existing and it better be to help someone find information, purchase a product, or learn something new. This means you yourself have to be the authority from experience too.

Understand that the transformational change is not that ChatGPT can predict words, but that entire papers, articles, blogs and videos can be summarized and compared semantically to other summarized content.

Understand what “Intent” is. LLMs using Intent to find the words that answer searchers' queries is a shift from keyword intent. Every paragraph now matters towards the information you are trying to convey. “Fluff”, “jargon” and other sentences added to push your word count to 500 words or more will just confuse the AI and reduce your authority. In the end, search intent will always matter - but instead of matching a keyword you need to match the conversation.

Secondly, learn how to manipulate the LLM and control the output. Big business is being cautious because they are ensuring the experience for THEIR customers matches their values - and doesn't get them sued. But this is where the money is. The better a small business, big business or Martin Shkreli understands how to control the outputs, the more in demand their product or service will be.

While we are all playing with the cool tools, remember that search engines drive revenue to your business one way or another - and we have to start preparing for the day when keywords no longer matter as much as conversations do. This second wave of AI is just beginning to take shape.

Are you ready?