This article describes how to add observability to your generative AI applications with MLflow Tracing on Databricks.
What is MLflow Tracing?
MLflow Tracing provides end-to-end observability for GenAI applications from development to deployment. Tracing is fully integrated with the Databricks GenAI toolset, capturing detailed insights across the entire development and production lifecycle.
The following are the key use cases for tracing in GenAI applications:
Streamlined debugging: Tracing provides visibility into each step of your GenAI application, making it easier to diagnose and resolve issues.
Offline evaluation: Tracing generates valuable data for agent evaluation, allowing you to measure and improve the quality of agents over time.
Production monitoring: Tracing provides visibility into agent behavior and detailed execution steps, enabling you to monitor and optimize agent performance in production.
Audit logs: MLflow Tracing generates comprehensive audit logs of agent actions and decisions. This is vital for ensuring compliance and supporting debugging when unexpected issues arise.
Requirements
MLflow Tracing is available on MLflow versions 2.13.0 and above. Databricks recommends installing the latest version of MLflow to access the latest features and improvements.
Bash
%pip install mlflow>=2.13.0 -qqqU
%restart_python
Automatic tracing
MLflow autologging lets you quickly instrument your agent by adding a single line to your code, mlflow.<library>.autolog().
MLflow supports autologging for most popular agent authoring libraries. For more information about each authoring library, see MLflow autologging documentation:
| Library | Autologging version support | Autologging command |
|---|---|---|
| LangChain | 0.1.0 ~ Latest | `mlflow.langchain.autolog()` |
| LangGraph | 0.1.1 ~ Latest | `mlflow.langgraph.autolog()` |
| OpenAI | 1.0.0 ~ Latest | `mlflow.openai.autolog()` |
| LlamaIndex | 0.10.44 ~ Latest | `mlflow.llamaindex.autolog()` |
| DSPy | 2.5.17 ~ Latest | `mlflow.dspy.autolog()` |
| Amazon Bedrock | 1.33.0 ~ Latest (boto3) | `mlflow.bedrock.autolog()` |
| Anthropic | 0.30.0 ~ Latest | `mlflow.anthropic.autolog()` |
| AutoGen | 0.2.36 ~ 0.2.40 | `mlflow.autogen.autolog()` |
| Google Gemini | 1.0.0 ~ Latest | `mlflow.gemini.autolog()` |
| CrewAI | 0.80.0 ~ Latest | `mlflow.crewai.autolog()` |
| LiteLLM | 1.52.9 ~ Latest | `mlflow.litellm.autolog()` |
| Groq | 0.13.0 ~ Latest | `mlflow.groq.autolog()` |
| Mistral | 1.0.0 ~ Latest | `mlflow.mistral.autolog()` |
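For example, the following minimal sketch enables autologging for the OpenAI client. It assumes the openai package is installed and an OpenAI API key is configured; the model name is illustrative.
Python
import mlflow
from openai import OpenAI

# Enable tracing for all OpenAI client calls made in this process.
mlflow.openai.autolog()

client = OpenAI()

# This call is captured automatically as a trace, including the request,
# response, and latency.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is MLflow Tracing?"}],
)
print(response.choices[0].message.content)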
Disable autologging
Autologging tracing is enabled by default in Databricks Runtime 15.4 ML and above for the following libraries:
LangChain
LangGraph
OpenAI
LlamaIndex
To disable autologging tracing for these libraries, run the following command in a notebook:
Python
`mlflow.<library>.autolog(log_traces=False)`
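For example, to turn off trace autologging for LangChain only, while leaving the other integrations unchanged:
Python
import mlflow

# Disable trace autologging for the LangChain integration; other libraries
# keep their default behavior.
mlflow.langchain.autolog(log_traces=False)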
Add traces manually
While autologging provides a convenient way to instrument agents, you may want to instrument your agent more granularly or add additional traces that autologging doesn't capture. In these cases, use MLflow Tracing APIs to manually add traces.
MLflow Tracing APIs are low-code APIs for adding traces without worrying about managing the tree structure of the trace. MLflow determines the appropriate parent-child span relationships automatically using the Python stack.
Combine autologging and manual tracing
Manual tracing APIs can be used with autologging. MLflow combines the spans created by autologging and manual tracing to create a complete trace of your agent execution. For an example of combining autologging and manual tracing, see Instrumenting a tool calling agent with MLflow Tracing.
Trace functions using the @mlflow.trace decorator
The simplest way to manually instrument your code is to decorate a function with the @mlflow.trace decorator. The MLflow trace decorator creates a "span" with the scope of the decorated function, which represents a unit of execution in a trace and is displayed as a single row in the trace visualization. The span captures the input and output of the function, latency, and any exceptions raised from the function.
For example, the following code creates a span named my_function that captures input arguments x and y and the output.
Python
import mlflow
@mlflow.trace
def add(x: int, y: int) -> int:
    return x + y
You can also customize the span name, span type, and add custom attributes to the span:
Python
from mlflow.entities import SpanType
@mlflow.trace(
    # By default, the function name is used as the span name. You can override it with the `name` parameter.
    name="my_add_function",
    # Specify the span type using the `span_type` parameter.
    span_type=SpanType.TOOL,
    # Add custom attributes to the span using the `attributes` parameter. By default, MLflow only captures input and output.
    attributes={"key": "value"},
)
def add(x: int, y: int) -> int:
    return x + y
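Because MLflow infers parent-child relationships from the call stack, decorated functions that call each other produce nested spans in a single trace. The following minimal sketch uses hypothetical retrieve_context and answer functions to illustrate this.
Python
import mlflow

@mlflow.trace
def retrieve_context(query: str) -> str:
    return f"some context for: {query}"

@mlflow.trace
def answer(query: str) -> str:
    # Because retrieve_context() is called inside answer(), MLflow records its
    # span as a child of the answer() span in the same trace.
    context = retrieve_context(query)
    return f"answer based on: {context}"

answer("What is MLflow Tracing?")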
Trace arbitrary code blocks using context manager
To create a span for an arbitrary block of code, not just a function, use mlflow.start_span() as a context manager that wraps the code block. The span starts when the context is entered and ends when the context is exited. The span inputs and outputs should be provided manually using setter methods of the span object yielded by the context manager. For more information, see MLflow documentation - context handler.
Python
import mlflow

# Example inputs for illustration
x, y = 1, 2

with mlflow.start_span(name="my_span") as span:
    span.set_inputs({"x": x, "y": y})
    result = x + y
    span.set_outputs(result)
    span.set_attribute("key", "value")
Tracing example: Combine autologging and manual traces
The following example combines OpenAI autologging and manual tracing to fully instrument a tool-calling agent.
Python
import json
from openai import OpenAI
import mlflow
from mlflow.entities import SpanType
client = OpenAI()
# Enable OpenAI autologging to capture LLM API calls
# (Not necessary if you are using Databricks Runtime 15.4 ML and above, where OpenAI autologging is enabled by default)
mlflow.openai.autolog()
# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    if city == "Tokyo":
        return "sunny"
    elif city == "Paris":
        return "rainy"
    return "unknown"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]
_tool_functions = {"get_weather": get_weather}
# Define a simple tool-calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str):
    messages = [{"role": "user", "content": question}]

    # Invoke the model with the given question and available tools
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    ai_msg = response.choices[0].message
    messages.append(ai_msg)

    # If the model requests tool calls, invoke the function(s) with the specified arguments
    if tool_calls := ai_msg.tool_calls:
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            if tool_func := _tool_functions.get(function_name):
                args = json.loads(tool_call.function.arguments)
                tool_result = tool_func(**args)
            else:
                raise RuntimeError("An invalid tool is returned from the assistant!")

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                }
            )

        # Send the tool results to the model and get a new response
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )

    return response.choices[0].message.content
# Run the tool calling agent
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)
Annotate traces with tags
MLflow trace tags are key-value pairs that allow you to add custom metadata to traces, such as a conversation ID, a user ID, or a Git commit hash. Tags are displayed in the MLflow UI, where you can use them to filter and search traces.
Tags can be set on an ongoing or completed trace using MLflow APIs or the MLflow UI. The following example demonstrates adding a tag to an ongoing trace using the mlflow.update_current_trace() API.
Python
@mlflow.trace
def my_func(x):
    mlflow.update_current_trace(tags={"fruit": "apple"})
    return x + 1
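To tag a trace after it completes, look up its request ID and set the tag through the MLflow client. The following is a minimal sketch; it assumes mlflow.get_last_active_trace() and MlflowClient.set_trace_tag() are available in your MLflow version, so check the MLflow API reference for your release.
Python
import mlflow
from mlflow import MlflowClient

# Run the traced function above so that a trace is logged.
my_func(1)

# Retrieve the most recent trace and tag it after completion.
# (Assumes these APIs are available in your MLflow version.)
trace = mlflow.get_last_active_trace()
MlflowClient().set_trace_tag(trace.info.request_id, "fruit", "orange")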
Review traces
To review traces after running the agent, use one of the following options:
In-line visualization: In Databricks notebooks, traces are rendered inline in the cell output.
MLflow experiment: In Databricks, go to Experiments > Select an experiment > Traces to view and search through all the traces for an experiment.
MLflow run: When the agent runs under an active MLflow Run, traces appear on the Run page of the MLflow UI.
Agent Evaluation UI: In Mosaic AI Agent Evaluation, you can review traces for each agent execution by clicking See detailed trace view in the evaluation result.
Trace Search API: To programmatically retrieve traces, use the Trace Search API.
Evaluate agents using traces
Trace data serves as a valuable resource for evaluating your agents. By capturing detailed information about the execution of your models, MLflow Tracing is instrumental in offline evaluation. You can use the trace data to evaluate your agent's performance against a golden dataset, identify issues, and improve your agent's performance.
Python
import mlflow

# Get the recent 50 successful traces from the experiment
traces = mlflow.search_traces(
    max_results=50,
    filter_string="status = 'OK'",
)

traces.drop_duplicates("request", inplace=True)  # Drop duplicate requests.
traces["trace"] = traces["trace"].apply(lambda x: x.to_json())  # Convert the trace to JSON format.

# Evaluate the agent with the trace data
mlflow.evaluate(data=traces, model_type="databricks-agent")
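To inspect per-row results, capture the return value of mlflow.evaluate(). The following sketch continues from the previous snippet; the eval_results table name follows the Agent Evaluation documentation, so confirm it for your version.
Python
# Capture the evaluation results to review individual rows in the notebook.
results = mlflow.evaluate(data=traces, model_type="databricks-agent")
display(results.tables["eval_results"])  # `display` is available in Databricks notebooks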
Monitor agents in production
After an agent is deployed to Mosaic AI Model Serving, you can use inference tables to monitor the agent. The inference tables contain detailed logs of requests, responses, agent traces, and agent feedback from the review app. This information lets you debug issues, monitor performance, and create a golden dataset for offline evaluation.
Traces are written asynchronously to minimize performance impact. However, tracing still adds latency to endpoint responses, particularly when the trace size for each inference request is large. Databricks recommends testing your endpoint to understand tracing latency impacts before deploying to production.
The following table provides rough estimates for latency impact by trace size: