Guide to working with Large Language Models

This guide provides comprehensive resources and lessons learned for building Large Language Model (LLM) applications. It covers the following topics:

Introduction

Overview of LLMs

Large Language Models (LLMs) are deep learning models trained on large text corpora to generate text. They are auto-regressive models: given the preceding words, they are trained to predict the next word (or, more precisely, a probability distribution over likely next words). By processing large amounts of text during training, LLMs learn the structure and syntax of human language.
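To make next-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library, with the small GPT-2 checkpoint standing in for a larger LLM; the prompt string is purely illustrative.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a small auto-regressive LLM and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Large language models are trained to predict the next"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocabulary_size).
    logits = model(**inputs).logits

# The model's probability distribution over the token that follows the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most probable next tokens.
top5 = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top5.values, top5.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: p={prob.item():.3f}")
```

Sampling repeatedly from this distribution, appending each sampled token, and re-running the model is what turns next-word prediction into open-ended text generation.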

[Figure: LLM model]

The development of LLMs has been a gradual process, especially over the past decade. The first LLMs were relatively small and could only perform simple language tasks. However, with advances in deep neural networks, larger and more powerful LLMs were created. The figure below shows examples of LLMs introduced over the past few years, along with their sizes.

[Figure: LLM trends — examples of LLMs introduced in recent years, with their sizes]

The 2020 release of GPT-3 (Generative Pre-trained Transformer 3) marked a significant milestone in the development of LLMs. GPT-3 is in fact a family of models of different sizes. It demonstrated the ability to generate coherent and convincing text that is difficult to distinguish from text written by humans.

Like other foundation models, LLMs such as GPT-3 are trained in a self-supervised (unsupervised) manner and can then be adapted to perform different tasks and applications. This contrasts with traditional machine learning, where a separate model is trained for each task using labeled data. For example, LLMs can generate text for chatbots, language translation, and content creation. They can also analyze and summarize large amounts of text, such as news articles or social media posts. In addition, LLMs can write programming code from natural-language requests.

Adapting an LLM to a different task or application can be done in two ways: in-context learning through prompt engineering, or fine-tuning. Regardless of which approach is used, developers and data scientists should learn about and adopt the techniques involved; a sketch of in-context learning follows below.
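As an example of the first approach, here is a minimal sketch of in-context (few-shot) learning: the task is conveyed entirely through examples embedded in the prompt, with no gradient updates to the model. The sentiment-classification task and the complete_fn callable are illustrative assumptions; complete_fn stands in for whichever LLM text-completion API is available.

```python
from typing import Callable

# A few-shot prompt: two worked examples "teach" the task in context.
FEW_SHOT_PROMPT = """\
Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: {review}
Sentiment:"""

def classify_sentiment(review: str, complete_fn: Callable[[str], str]) -> str:
    """Fill the few-shot template and let the LLM complete the label.

    complete_fn is a hypothetical stand-in for an LLM completion call:
    it takes a prompt string and returns the generated text.
    """
    prompt = FEW_SHOT_PROMPT.format(review=review)
    return complete_fn(prompt).strip()
```

Fine-tuning, by contrast, updates the model's weights on task-specific data; it typically requires more effort and compute than prompt engineering but can yield better results on narrow tasks.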