Hello @Andy Matrix
Thanks for reaching out to us. To answer your question generally from the Microsoft side: Large Language Models (LLMs) are machine learning models that use Natural Language Processing (NLP) to both process text inputs and generate text outputs. They are a form of neural network with a very large number of parameters learned from training data, which lets them answer questions, translate text, generate application code, and perform other useful tasks.
The base foundation models are pre-trained models that are used as a starting point for training a custom language model. These models are trained on a large corpus of text data and can be fine-tuned to perform specific tasks.
Early examples of foundation models were pre-trained large language models such as Google's BERT and the early GPT foundation models, notably OpenAI's "GPT-n" series. Such broad models can in turn be adapted into task- and/or domain-specific models using targeted datasets of various kinds, such as medical codes.
Beyond text, several visual and multimodal foundation models have been produced, including DALL-E, Flamingo, Florence and NOOR. Visual foundation models (VFMs) have been combined with text-based LLMs to develop sophisticated task-specific models. There is also Segment Anything by Meta AI for general image segmentation, and GATO by Google DeepMind for reinforcement learning agents.
Once the base foundation model is trained, it can be fine-tuned for specific tasks by providing it with labeled data. The model is then trained on this labeled data to learn how to perform the specific task. Fine-tuning a model on a specific task can significantly improve its performance on that task.
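As a rough illustration (not specific to any Microsoft service), a fine-tuning run with the open-source Hugging Face transformers and datasets libraries might look like the sketch below. The model name "bert-base-uncased", the file "train.csv", and the hyperparameters are placeholder assumptions, not part of your setup:

```python
# Minimal fine-tuning sketch: further train a pre-trained base model on labeled data.
# Assumes the Hugging Face transformers and datasets libraries are installed.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # placeholder pre-trained base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Labeled task data: a hypothetical CSV with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv"})

def tokenize(batch):
    # Convert raw text into token IDs the model understands.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

# Further train (fine-tune) the pre-trained weights on the labeled examples.
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```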
The Wikipedia page's explanation is more detailed, for your reference:
A large language model (LLM) is a language model characterized by emergent properties enabled by its large size. They are built with artificial neural networks, (pre-)trained using self-supervised learning and semi-supervised learning, and typically contain tens of millions to billions of weights. They are trained using specialized AI accelerator hardware to process vast amounts of text data in parallel, mostly scraped from the Internet. As language models, they work by taking an input text and repeatedly predicting the next token or word. The 2017 invention of the transformer architecture drove a series of breakthroughs in LLM development. Older, specialized supervised models for specific linguistic tasks have been made largely obsolete by the emergent abilities of LLMs,[4] which are thought to acquire embodied knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and biases present in the corpora. Notable LLMs include GPT-4, LLaMA, PaLM, BLOOM, Ernie 3.0 Titan, and Claude.
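To make the "repeatedly predicting the next token" idea concrete, here is a small illustrative sketch. It assumes the open-source Hugging Face transformers library and uses GPT-2 as a small stand-in model; it is not how any particular production LLM is served:

```python
# Illustrative next-token prediction loop: the model scores every vocabulary
# token, the highest-scoring one is appended, and the process repeats.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Large language models work by", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                                 # generate 20 tokens, one at a time
        logits = model(input_ids).logits                # a score for every vocabulary token
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy: pick the top token
        input_ids = torch.cat([input_ids, next_id], dim=-1)      # append it and repeat

print(tokenizer.decode(input_ids[0]))
```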
For more details about how to train an LLM and further background, please see the page:
https://en.wikipedia.org/wiki/Large_language_model#Lead-up_to_transformer-based_models
I hope this helps! Let me know if you have any more questions.
Regards,
Yutong
-Please kindly accept the answer and vote 'Yes' if you find it helpful, to support the community. Thanks a lot.