Deploy generative AI agents and models

This article describes Mosaic AI Model Serving support for deploying generative AI agents and models for your generative AI applications.

What are generative AI agents?

Generative AI agents are compound AI systems that depend on large language models and user input to determine which steps to take to perform a task. See Create and log AI agents.

What are generative AI models?

Generative AI models create new content from inputs like text, images, and code. These models are trained on large datasets and use deep learning to identify patterns and structures in existing data, and then generate new content based on what they’ve learned.

Foundation models are a type of generative AI model. These models are pre-trained with the intention of being fine-tuned for more specific language understanding and generation tasks.

Deploy a generative AI agent

Databricks supports two methods for deploying a generative AI agent:

During development, use the deploy() method in the Mosaic AI Agent Framework. This method automatically creates:

  • A CPU endpoint for deployment and testing.
  • A URL to the Agent Evaluation review app where stakeholders can interact with the agent to test output and record feedback.
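The development path above can be sketched as follows. This is a minimal, hedged sketch that assumes the agent has already been logged to a Unity Catalog model (the three-level model name and version below are illustrative, not from this article); the actual `deploy()` call requires the Agent Framework package in a Databricks environment, so it is shown in a comment.

```python
# Sketch of the development-time deployment path, assuming an agent logged
# to Unity Catalog. The model name and version here are hypothetical.
from typing import Any

UC_MODEL_NAME = "main.default.quickstart_agent"  # illustrative UC model name
MODEL_VERSION = 1

def deploy_kwargs(model_name: str, version: int) -> dict[str, Any]:
    """Arguments we would pass to the Agent Framework's deploy() method."""
    return {"model_name": model_name, "model_version": version}

# In a Databricks notebook with the Mosaic AI Agent Framework installed,
# the call would look like:
#
#   from databricks import agents
#   deployment = agents.deploy(**deploy_kwargs(UC_MODEL_NAME, MODEL_VERSION))
#
# The returned deployment describes the CPU endpoint and the review app URL
# mentioned in the list above.
```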

For production applications, use Mosaic AI Model Serving to create your own CPU endpoint to deploy your agent.

For more details on these options, see Deploy an agent for generative AI applications.

Deploy a generative AI model

Mosaic AI Model Serving supports serving and querying generative AI models using the following capabilities:

  • Foundation Model APIs. This functionality makes state-of-the-art open models and fine-tuned model variants available to your model serving endpoint. These models are curated foundation model architectures that support optimized inference. Base models, like DBRX Instruct, Llama-2-70B-chat, BGE-Large, and Mistral-7B, are available for immediate use with pay-per-token pricing. Workloads that require performance guarantees, like fine-tuned model variants, can be deployed with provisioned throughput.
  • External models. These are generative AI models that are hosted outside of Databricks. Endpoints that serve external models can be centrally governed, and customers can establish rate limits and access controls for them. Examples include foundation models like OpenAI’s GPT-4, Anthropic’s Claude, and others.
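As a concrete illustration of querying one of these endpoints over REST, the sketch below builds the URL and chat-style request body for a pay-per-token Foundation Model APIs endpoint. The workspace URL is a placeholder, and the payload follows the OpenAI-compatible chat format these endpoints accept; sending the request (with a bearer token) is left to any HTTP client.

```python
# Hedged sketch: build a chat request for a pay-per-token foundation model
# endpoint. The workspace URL is illustrative; DBRX Instruct is one of the
# base models named above.
import json

WORKSPACE_URL = "https://example.cloud.databricks.com"  # placeholder workspace
ENDPOINT_NAME = "databricks-dbrx-instruct"

def chat_request(prompt: str, max_tokens: int = 128) -> tuple[str, bytes]:
    """Return the invocations URL and JSON body for a chat completion call."""
    url = f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations"
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return url, body

url, body = chat_request("What is Mosaic AI Model Serving?")
# Send with any HTTP client, adding an `Authorization: Bearer <token>` header.
```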

For a tutorial on querying foundation models on Databricks, see Get started querying LLMs on Databricks.

Create a generative AI model serving endpoint

See Create generative AI model serving endpoints.
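To give a sense of what endpoint creation involves, the sketch below assembles a request body for the Databricks Serving Endpoints REST API (`POST /api/2.0/serving-endpoints`). The endpoint name, model name, and workload size are illustrative placeholders, not values from this article; see the linked page for the full set of options.

```python
# Hedged sketch of a serving endpoint creation payload. All names below are
# hypothetical; the entity_name refers to a Unity Catalog-registered model.
import json

endpoint_config = {
    "name": "my-genai-endpoint",  # illustrative endpoint name
    "config": {
        "served_entities": [
            {
                "entity_name": "main.default.my_model",  # UC model to serve
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

# Serialize for POST /api/2.0/serving-endpoints (auth header not shown).
payload = json.dumps(endpoint_config)
```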

Query your deployed generative AI model