A common phrase when using ChatGPT is that it feels like “magic”. But has anyone ever explained that magic to you?
The creators of ChatGPT used both supervised learning and reinforcement learning to fine-tune the model, but only when human feedback was added did we begin to witness the rabbit-out-of-the-hat magic we feel when we use it.
Let’s take a journey together through the layering system of ChatGPT and how OpenAI discovered that an “autocorrect” could, sort of, start to look like Artificial General Intelligence.
There are roughly 10 billion coherent blogs on the internet and just over 5 million books in English, creating an enormous amount of text for someone, or something, to read. Since the birth of AI, scientists have wrestled with the extreme complexity of translating language from, say, English to French. Not only do you need to know which words are swappable, but also what the actual context of the entire sentence is, and then recreate it in the other language.
While it is widely understood that ChatGPT is trained on massive amounts of sentences, it is often labeled a clever “autocorrect” machine in an attempt to simplify the magic it hides. While the core of the mockery is true, ChatGPT is, at heart, an ‘autocorrect machine’, there is more to the genius behind its creation. Let’s explore the layering of how ChatGPT came to be.
The first challenge that any LLM faces is capturing the context of a sentence.
Many scientists created specific algorithms to solve specific sentence structures, expecting to later compile those algorithms into a massive network. Any model's capability is evaluated by a mathematical expression that scores its accuracy on the challenge: if you know the English sentence and the perfect French translation, how close does the machine get to that exact translation?
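To make “how close does the machine get?” concrete, here is a toy scoring function. The word-overlap metric and the French sentences are my own illustration; real translation systems are scored with more careful metrics such as BLEU, but the idea is the same:

```python
def overlap_score(candidate, reference):
    """Toy closeness measure: the fraction of words in the candidate
    translation that also appear in the reference translation."""
    cand_words = candidate.split()
    ref_words = set(reference.split())
    matches = sum(1 for w in cand_words if w in ref_words)
    return matches / len(cand_words)

reference = "le chat est sur le tapis"
print(overlap_score("le chat est sur le tapis", reference))   # 1.0, exact match
print(overlap_score("un chat dort sur un tapis", reference))  # 0.5, partial credit
```

A perfect translation scores 1.0; a rough one gets partial credit. Training a model amounts to nudging its parameters so a score like this keeps going up.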
But solving for every grammatical sentence structure in both French and English is a massive mathematical undertaking. What if there was a better way?
This is where Recurrent Neural Networks come into play…
A Recurrent Neural Network (RNN) is a type of computer program that is really good at understanding and analyzing sequences of things. Sequences can be anything that comes in order, like words in a sentence, notes in a song, or frames in a video.
RNNs have been around since the 1980s. Some examples of RNNs can be found in Google Translate to translate text from one language to another. Voice assistants like Siri use RNNs to process speech signals and understand human language. RNNs are also used for image captioning, where they generate textual descriptions of images by processing visual input and generating a coherent sentence.
But there was a problem: when an RNN was trained, it sometimes had trouble remembering things from much earlier in the sentence, an issue known as the vanishing gradient problem. This made it hard for the program to learn how to talk like a human.
What researchers Sepp Hochreiter and Jürgen Schmidhuber had discovered, years before ChatGPT, was a way to help the RNN remember things better called Long Short-Term Memory (LSTM). This made such programs much better at learning how to talk like a human. Even as late as 2018, progress like this was deemed unlikely by some industry experts, such as Alex Irpan in his essay “Deep Reinforcement Learning Doesn't Work Yet”.
At its basic level, LSTM works by having a special structure that allows it to decide what information to remember and what to forget. It's like if you had a notebook where you could write down important information and then cross out things that weren't important anymore.
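The notebook analogy can be sketched in code. Below is a minimal, made-up single LSTM step with random weights (real libraries like PyTorch provide optimized versions; this is only to show the gates at work):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, what to write down,
    and what to show to the next layer."""
    z = W @ np.concatenate([x, h_prev]) + b       # compute all four gates at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates
    g = np.tanh(g)                                # candidate notes to write
    c = f * c_prev + i * g     # cross out old notes, write new ones
    h = o * np.tanh(c)         # what the cell exposes to the next layer
    return h, c

# Toy dimensions: 3-number inputs, 2-number memory, random made-up weights.
rng = np.random.default_rng(0)
hidden, inputs = 2, 3
W = rng.standard_normal((4 * hidden, inputs + hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):  # feed a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
```

The cell state `c` is the notebook: the forget gate crosses things out, the input gate writes new things in.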
The embedding layer is like a map that helps the computer understand the meaning of words. It takes words in a sentence and turns them into numbers that the computer can understand better.
For example, the word "cat" might be turned into numbers [0.5, 0.2, 0.8], and the word "dog" might be turned into numbers [0.3, 0.7, 0.1]. These numbers represent the meaning of the words in a way that the computer can work with.
The embedding layer is really useful because it helps the computer learn how words are related to each other, which is important for understanding language.
To train a language model like an LLM, text data needs to be converted into a numerical format that the model can understand. One method for doing this is called vectorization, which involves representing each word in the text as a vector of numbers.
When vectorizing text data for LLMs, a vocabulary of all the unique words in the text data is created. Each word in the vocabulary is then assigned a unique index, and a lookup table is created to map each word to its corresponding index.
Once the lookup table is established, each sentence or sequence of words can be converted into a sequence of numbers that the LLM can process. One common method for doing this is called one-hot encoding, which involves creating a vector of all zeros with a length equal to the vocabulary size and setting the index corresponding to each word in the sequence to 1.
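The vocabulary, lookup table, and one-hot encoding steps above can be sketched in a few lines (the example sentences are my own):

```python
def build_vocab(sentences):
    """Assign each unique word an index, creating the lookup table."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def one_hot(sentence, vocab):
    """Encode each word as a vector of zeros with a single 1 at its index."""
    vectors = []
    for word in sentence.split():
        v = [0] * len(vocab)
        v[vocab[word]] = 1
        vectors.append(v)
    return vectors

vocab = build_vocab(["the cat sat", "the dog ran"])
print(vocab)                       # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'ran': 4}
print(one_hot("the dog sat", vocab))
```

Note that each one-hot vector is as long as the whole vocabulary, which is why real models immediately project these sparse vectors down into the dense embeddings described earlier.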
Once the text data has been converted into a numerical format, it can be fed into the LLM for training or inference. During this process, the LLM learns to recognize patterns and relationships between the different vectors, enabling it to generate coherent and contextually relevant responses to new inputs.
The layer that was added to start semantically comparing vectors and finding similarities in content was the attention mechanism.
The attention mechanism is a component added to RNNs and LSTMs that allows the model to selectively focus on certain parts of the input sequence when generating output. It works by assigning weights to different parts of the input sequence, allowing the model to attend to the most relevant information.
This mechanism is particularly useful for natural language processing tasks, where certain words or phrases in a sentence may be more important than others for generating a coherent response. By selectively attending to these key components, the model is able to generate more meaningful and contextually relevant output.
The attention mechanism has been widely adopted in natural language processing tasks, including machine translation, question answering, and summarization, and has contributed to significant improvements in performance.
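The weighting idea behind attention can be shown with a minimal scaled dot-product sketch (random toy vectors, not a real model; production attention also involves learned projection matrices and multiple heads):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values):
    """Scaled dot-product attention: weight each value by how well its
    key matches the query, then blend them into one context vector."""
    scores = keys @ query / np.sqrt(query.shape[0])  # relevance of each position
    weights = softmax(scores)                        # weights sum to 1
    return weights @ values, weights

# 4 input positions, 3-dimensional toy vectors.
rng = np.random.default_rng(0)
keys = values = rng.standard_normal((4, 3))
query = rng.standard_normal(3)
context, weights = attention(query, keys, values)
print(weights.round(2))
```

The positions with the highest weights are the words the model is “paying attention to”; the blended `context` vector is the summary it carries forward.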
When an input is given to ChatGPT, the model processes it through its embedding layer, which maps the words in the input to dense vector representations. In earlier language models, these vectors were then fed into RNN and LSTM layers, which maintained a memory of previous inputs while extracting relevant information, such as the meaning of words and the structure of the sentence.
Modern GPT models replace that recurrence with the Transformer architecture, but the next step is the same idea: the vectors are passed through the attention mechanism, which selectively focuses on important parts of the input and generates a context vector that summarizes the most relevant information.
The vast body of training text that ChatGPT pulls from includes a wide range of sources and domains, including books, articles, websites, and other written works. During the training process, the model learns to recognize patterns and relationships between words and phrases in this text data, allowing it to generate coherent and contextually relevant responses to new inputs.
Overall, these layers, embeddings plus attention stacked many times over, are what enable the model to keep track of earlier parts of the input and selectively attend to the rest of the sequence. This, combined with the vast body of text that ChatGPT was trained on, allows the model to generate complex and nuanced responses to a wide range of prompts and questions.
After all the above layers compute a sentence, the result is a racist, unhelpful, sometimes violent proto-ChatGPT. Back to the drawing board, or rather, the introduction of humans.
Ilya Sutskever famously laid out that by supplying a relatively small set of humans to give a thumbs up or thumbs down to the outputs from the model, the reinforcement would eventually reach acceptable levels across many aspects in parallel, making the machine self-improving.
The developers of the system would first create a supervised fine-tuned base model from the information provided by their labelers. These roughly 12-15k data points would mark the correct answers for the prompts asked and provide guidance the model could generalize to similar prompts using its massive knowledge base.
However, once the model achieves relatively acceptable responses, the customers can become the labelers and a reward system can be introduced to the model.
Note: the Supervised Fine-Tuned (SFT) model can only be trained once, hence distinct releases like GPT-3.5, text-davinci-003, etc.
The computer must learn what is good vs bad - or correct vs incorrect.
Feeding rewards and actions into an RNN is like giving a cookie to a dog when it does a trick. In this case, the RNN is like a really smart dog that's learning how to make decisions.
When the RNN makes a decision, it gets feedback in the form of a reward, which is like getting a cookie. The RNN learns to associate certain actions with higher rewards, just like a dog learns that doing certain tricks gets it more cookies.
The RNN can also use information about past rewards and actions to make better decisions in the future. It's as if the dog remembered which tricks got it the most cookies and started doing those more often.
Overall, feeding rewards and actions into an RNN helps it learn how to make better decisions, just like giving a dog treats helps it learn new tricks.
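The dog-and-cookie loop above is, in miniature, a reinforcement learning problem. Here is a toy epsilon-greedy sketch (the tricks, payout probabilities, and learning rate are all invented for illustration; real systems like ChatGPT use far more sophisticated algorithms such as PPO):

```python
import random

# Three tricks, each with a hidden average cookie payout the learner
# must discover by trying them and tracking which ones earn rewards.
true_reward = {"sit": 0.2, "roll": 0.5, "fetch": 0.9}  # hidden from the learner
values = {trick: 0.0 for trick in true_reward}         # running estimates

random.seed(0)
for step in range(2000):
    if random.random() < 0.1:                      # explore: try a random trick
        trick = random.choice(list(values))
    else:                                          # exploit: best trick so far
        trick = max(values, key=values.get)
    reward = 1 if random.random() < true_reward[trick] else 0  # cookie or no cookie
    values[trick] += 0.1 * (reward - values[trick])            # nudge the estimate

print(values)
```

Over many steps, the estimate for the highest-paying trick climbs above the others, so the learner does it more and more often, exactly the cookie dynamic described above.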
In conclusion, the magic behind ChatGPT lies in its layering system, which grew out of recurrent neural networks (RNNs) and long short-term memory (LSTM) and their decades-long push to understand human language.
The embedding layer converts words into numbers to help the computer understand their meaning, while the attention mechanism allows the model to compare vectors and find similarities in content. The combination of supervised and reinforcement learning fine-tunes the model, but it is human feedback that has truly made ChatGPT an extraordinary tool that feels like magic. ChatGPT's development shows the vast potential of artificial intelligence and how it can be used to create new, innovative solutions to complex problems.