What are Models?


Tip

The article provides a brief overview of GPT models, including their variants, how they work, and how they can be fine-tuned. It also mentions similar LLM AI models and compares models based on their number of parameters.

👆Summary generated by semantic plugin SummarizeSkill.Summarize

A model refers to a specific instance or version of an LLM AI, such as GPT-3 or Codex, that has been trained and fine-tuned on a large corpus of text (or code, in the case of Codex) and that can be accessed and used through an API or a platform. OpenAI and Azure OpenAI offer a variety of models that can be customized and controlled through parameters or options, and that can be applied to and integrated into a wide range of domains and tasks.
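
A hosted model like these is typically called through a REST API or a client SDK. Below is a minimal sketch of such a call, assuming the legacy `openai` Python package (pre-1.0 `Completion` API) and an `OPENAI_API_KEY` environment variable; the model name, prompt, and parameter values are placeholders to swap for your own.

```python
# Minimal sketch: calling a hosted GPT model through the OpenAI API.
# Assumes the legacy openai Python package (< 1.0) and OPENAI_API_KEY set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",   # any available model variant, e.g. text-curie-001
    prompt="Summarize what an LLM model is in one sentence.",
    max_tokens=60,              # options like these control how the model responds
    temperature=0.7,
)

print(response.choices[0].text.strip())
```

Parameters such as `max_tokens` and `temperature` are examples of the options mentioned above that let you customize and control the model's output.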

About available OpenAI and Azure OpenAI GPT models

OpenAI and Azure OpenAI currently offer four Generative Pre-trained Transformer (GPT) model variants: Ada, Babbage, Curie, and Davinci. They differ in the number of parameters, the amount of data they were trained on, and the types of tasks they can perform.

Ada is the smallest and simplest model, with 350 million parameters and 40GB of text data. It can handle basic natural language understanding and generation tasks, such as classification, sentiment analysis, summarization, and simple conversation.

Babbage is a larger model, with 3 billion parameters and 300GB of text data. It can handle more complex natural language tasks, such as reasoning, logic, arithmetic, and word analogy.

Curie is a very large model, with 13 billion parameters and 800GB of text data. It can handle advanced natural language tasks, such as text-to-speech, speech-to-text, translation, paraphrasing, and question answering.

Davinci is the largest and most powerful model, with 175 billion parameters and 45TB of text data. It can handle almost any natural language task, as well as some multimodal tasks, such as image captioning, style transfer, and visual reasoning. It can also generate coherent and creative texts on any topic, with a high level of fluency, consistency, and diversity.

| Model | Parameters | Tasks |
|---|---|---|
| text-ada-001 | 350 million | Basic NLU** and NLG** |
| text-babbage-001 | 3 billion | Complex NLU and NLG |
| text-curie-001 | 13 billion | Advanced NLU and NLG |
| text-davinci-003 | 175 billion | Almost any NLU, NLG, and multimodal task |

**Natural Language Understanding (NLU) / Natural Language Generation (NLG)

How does a GPT model work?

A GPT model is a type of neural network that uses the transformer architecture, in a decoder-only configuration, to learn from large amounts of text data. The input text is split into tokens and converted into a sequence of vectors, called embeddings, that represent the meaning and context of each token. The model generates output text by predicting the next token in the sequence, based on those embeddings and the tokens that came before. A technique called attention lets the model focus on the most relevant parts of the input and capture long-range dependencies and relationships between words.

The model is trained on a large corpus of text, using the text itself as both the input and the expected output, and minimizing the difference between the predicted and the actual tokens. The model can then be fine-tuned, or adapted to specific tasks or domains, using smaller and more specialized datasets.
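
To make the next-token loop concrete, here is a toy sketch in Python; the `next_token_probs` function is a made-up stand-in for a trained transformer, so this only illustrates the autoregressive decoding idea, not a real GPT model.

```python
# Toy sketch of autoregressive (next-token) generation.
# The "model" here is a stand-in function, not a real transformer.
import random

VOCAB = ["the", "model", "predicts", "next", "token", "<end>"]

def next_token_probs(context: list[str]) -> list[float]:
    # Stand-in for a trained GPT model: returns a probability for each vocabulary token.
    # A real model would compute these from embeddings and attention layers.
    weights = [len(t) + (1 if t not in context else 0) for t in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        # Sample the next token from the predicted distribution.
        token = random.choices(VOCAB, weights=probs, k=1)[0]
        if token == "<end>":
            break
        tokens.append(token)  # feed the prediction back in, one token at a time
    return tokens

print(" ".join(generate(["the", "model"])))
```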

What is a baseline comparison rubric for LLM AIs?

| LLM AI Model | Parameters | Year |
|---|---|---|
| BERT | 340 million | 2018 |
| GPT-2 | 1.5 billion | 2019 |
| Meena | 2.6 billion | 2020 |
| GPT-3 | 175 billion | 2020 |
| LaMDA | 137 billion | 2022 |
| BLOOM | 176 billion | 2022 |

LLM AI models are generally compared by their number of parameters, where bigger is usually better. The number of parameters is a measure of the size and complexity of the model: the more parameters a model has, the more data it can process, learn from, and generate. However, more parameters also require more computational and memory resources, and increase the risk of overfitting the training data. Parameters are learned or updated during training by an optimization algorithm that tries to minimize the error, or loss, between the predicted and the actual outputs. By adjusting its parameters, the model improves its performance and accuracy on the given task or domain.
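
As a small illustration of how parameters are updated during training, the sketch below fits a single-parameter toy model with plain gradient descent; real LLM training applies the same idea across billions of parameters with far more elaborate optimizers.

```python
# Minimal sketch of parameter learning by gradient descent on a toy model.
# One parameter w is adjusted to minimize the squared error between predictions and targets.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs; true relationship is y = 2x

w = 0.0             # the model's single parameter, initialized arbitrarily
learning_rate = 0.05

for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # update the parameter to reduce the loss

print(f"learned w = {w:.3f}")  # approaches 2.0, the value that minimizes the loss
```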

Easily test different models using Semantic Kernel tools

If you want to easily test how different models perform, you can use the Semantic Kernel VS Code Extension to quickly run a prompt on AI models from OpenAI, Azure OpenAI, and even Hugging Face.

[Image: Switching models in the Semantic Kernel VS Code Extension]
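
If you would rather compare models from code than through the extension, one hypothetical approach is to reuse the client setup from the earlier sketch and send the same prompt to each model name from the table above (again assuming the legacy `openai` Python package and an `OPENAI_API_KEY` environment variable).

```python
# Hypothetical sketch: run one prompt against several models and compare the responses.
# Uses the same legacy openai (< 1.0) client setup as the earlier example.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "Explain what a neural network is in one sentence."
for model in ["text-ada-001", "text-babbage-001", "text-curie-001", "text-davinci-003"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=60,
        temperature=0.0,  # low temperature keeps outputs stable for side-by-side comparison
    )
    print(f"--- {model} ---")
    print(response.choices[0].text.strip())
```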

Take the next step