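MLflow Tracing provides complete observability for production GenAI applications deployed outside of Databricks. It captures execution details and sends them to your Databricks workspace, where you can view them in the MLflow UI.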
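How production tracing works:
- Your app generates traces: each API call creates trace data
- Traces are logged to the Databricks MLflow tracking server using your workspace credentials
- You view them in the MLflow UI: analyze trace data in your Databricks workspace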
This page covers tracing apps deployed outside of Databricks. If your app is deployed with Databricks Model Serving, see Tracing with Databricks Model Serving.
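Prerequisites
Note
Production tracing requires MLflow 3. MLflow 2.x is not supported for production tracing.
Install the required package. The following table describes your options: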
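Topic | mlflow-tracing | mlflow[databricks] |
---|---|---|
Recommended use case | Production deployments | Development and experimentation |
Benefits | Minimal dependencies for lean, fast deployments; performance optimized for high-volume tracing; focused on client-side tracing for production monitoring | Full MLflow experimentation feature set (UI, LLM-as-a-judge, development tools, and more); includes all development tools and utilities |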
## Install mlflow-tracing for production deployment tracing
%pip install --upgrade mlflow-tracing
## Install mlflow for experimentation and development
%pip install --upgrade "mlflow[databricks]>=3.1"
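Production tracing requires MLflow 3, so a version check at startup can catch an outdated package. The sketch below uses only the standard library. It assumes the distributions are named `mlflow-tracing` and `mlflow`, as in the pip commands above. `meets_mlflow3` and `installed_mlflow_version` are hypothetical helpers, not part of MLflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def meets_mlflow3(version_string: str) -> bool:
    """Return True if a version string such as '3.1.0' has a major version of 3 or higher."""
    major = version_string.split(".", 1)[0]
    # Non-numeric majors (unexpected formats) are treated as unsupported.
    return major.isdigit() and int(major) >= 3


def installed_mlflow_version() -> Optional[str]:
    """Return the installed version of mlflow-tracing or mlflow, or None if neither is installed."""
    # Assumption: distribution names match the pip install commands above.
    for dist in ("mlflow-tracing", "mlflow"):
        try:
            return version(dist)
        except PackageNotFoundError:
            continue
    return None
```

For example, you could call `installed_mlflow_version()` during application startup and refuse to start if it returns `None` or if `meets_mlflow3` returns `False` for the result.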
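Basic tracing setup
Configure your application deployment to connect to your Databricks workspace so that Databricks can collect traces.
Set the following environment variables: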
# Required: Set the Databricks workspace host and authentication token
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-databricks-token"
# Required: Set MLflow Tracking URI to "databricks" to log to Databricks
export MLFLOW_TRACKING_URI=databricks
# Required: Configure the experiment name for organizing traces (must be a workspace path)
export MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"
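Without a check, a missing or empty variable may only show up later as traces that never reach the workspace. The sketch below is a fail-fast startup check for the four variables above.

It makes two assumptions:
- The workspace host is an `https://` URL.
- The tracking URI is exactly `databricks`, as shown above.

`tracing_config_problems` and `require_tracing_config` are hypothetical helpers.

```python
import os
from typing import Mapping

# The four variables marked as required in the setup above.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_EXPERIMENT_NAME",
)


def tracing_config_problems(env: Mapping[str, str]) -> list[str]:
    """Return a list of configuration problems; an empty list means the config looks usable."""
    # Empty strings count as unset.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    host = env.get("DATABRICKS_HOST", "")
    if host and not host.startswith("https://"):  # assumption: HTTPS workspace URLs
        problems.append("DATABRICKS_HOST should be an https:// workspace URL")
    uri = env.get("MLFLOW_TRACKING_URI", "")
    if uri and uri != "databricks":  # assumption: only the exact value used above
        problems.append("MLFLOW_TRACKING_URI should be 'databricks'")
    experiment = env.get("MLFLOW_EXPERIMENT_NAME", "")
    if experiment and not experiment.startswith("/"):
        problems.append("MLFLOW_EXPERIMENT_NAME must be a workspace path such as /Shared/...")
    return problems


def require_tracing_config() -> None:
    """Raise at startup if the tracing configuration in os.environ is incomplete."""
    problems = tracing_config_problems(os.environ)
    if problems:
        raise RuntimeError("Invalid tracing configuration: " + "; ".join(problems))
```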
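Deployment examples
After setting the environment variables, pass them to your application. The following examples show how to pass the connection details in different deployment frameworks.
Docker
For Docker deployments, pass the environment variables through the container configuration: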
# Dockerfile
FROM python:3.9-slim
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application code
COPY . /app
WORKDIR /app
# Set default environment variables (can be overridden at runtime)
ENV DATABRICKS_HOST=""
ENV DATABRICKS_TOKEN=""
ENV MLFLOW_TRACKING_URI=databricks
ENV MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"
CMD ["python", "app.py"]
Run the container with the environment variables:
docker run -d \
-e DATABRICKS_HOST="https://your-workspace.cloud.databricks.com" \
-e DATABRICKS_TOKEN="your-databricks-token" \
-e MLFLOW_TRACKING_URI=databricks \
-e MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app" \
-e APP_VERSION="1.0.0" \
your-app:latest
Kubernetes
For Kubernetes deployments, pass the environment variables using ConfigMaps and Secrets:
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-config
data:
  DATABRICKS_HOST: 'https://your-workspace.cloud.databricks.com'
  MLFLOW_TRACKING_URI: databricks
  MLFLOW_EXPERIMENT_NAME: '/Shared/production-genai-app'
---
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: databricks-secrets
type: Opaque
stringData:
  DATABRICKS_TOKEN: 'your-databricks-token'
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-app
spec:
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: genai-app
  template:
    metadata:
      labels:
        app: genai-app
    spec:
      containers:
        - name: app
          image: your-app:latest
          envFrom:
            - configMapRef:
                name: databricks-config
            - secretRef:
                name: databricks-secrets
          env:
            - name: APP_VERSION
              value: '1.0.0'
Verify trace collection
After deploying your app, verify that traces are being collected correctly:
import os
import mlflow
from mlflow.client import MlflowClient
# Ensure MLflow is configured for Databricks
mlflow.set_tracking_uri("databricks")
experiment_name = os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app")
try:
    # List experiments to verify connectivity to the MLflow server
    client = MlflowClient()
    experiments = client.search_experiments()
    print(f"Connected to MLflow. Found {len(experiments)} experiments.")
    # Check whether traces are being logged to the configured experiment
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        print(f"Experiment not found: {experiment_name}")
    else:
        traces = mlflow.search_traces(
            experiment_ids=[experiment.experiment_id],
            max_results=5,
        )
        print(f"Found {len(traces)} recent traces.")
except Exception as e:
    print(f"Error connecting to MLflow: {e}")
    print("Check your authentication and connectivity.")
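Add context to traces
Once basic tracing is working, add context to your traces for better debugging and insights. MLflow provides the following standardized tags and attributes for capturing important contextual information:
- Request tracking: link traces to specific API calls for end-to-end debugging
- User sessions: group related interactions to understand user journeys
- Environment data: record which deployment, version, or region generated each trace
- User feedback: collect quality ratings and link them to specific interactions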
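Track request, session, and user context
Production applications need to track several pieces of context at once:
- client request IDs for debugging
- session IDs for multi-turn conversations
- user IDs for personalization and analytics
- environment metadata for operational insights

The following complete example shows how to track all of them in a FastAPI application: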
import os
import mlflow
from fastapi import FastAPI, Request
from pydantic import BaseModel
# Initialize FastAPI app
app = FastAPI()
class ChatRequest(BaseModel):
    message: str
@app.post("/chat")  # FastAPI decorator must be outermost so the registered route is the traced function
@mlflow.trace  # @mlflow.trace wraps the handler before FastAPI registers it
def handle_chat(request: Request, chat_request: ChatRequest):
    # Retrieve all context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")
    # Update the current trace with all context and environment metadata.
    # The @mlflow.trace decorator ensures an active trace is available.
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        tags={
            # Session context: groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context: associates traces with specific users
            "mlflow.trace.user": user_id,
            # Environment metadata: tracks deployment context
            "environment": "production",
            "app_version": os.getenv("APP_VERSION", "1.0.0"),
            "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
            "region": os.getenv("REGION", "us-east-1"),
        },
    )
    # --- Your application logic for processing the chat message ---
    # For example, calling a language model with context
    # response_text = my_llm_call(
    #     message=chat_request.message,
    #     session_id=session_id,
    #     user_id=user_id
    # )
    response_text = f"Processed message: '{chat_request.message}'"
    # --- End of application logic ---
    # Return response
    return {
        "response": response_text
    }
# To run this example (requires uvicorn and fastapi):
# uvicorn your_file_name:app --reload
#
# Example curl request with all context headers:
# curl -X POST "http://127.0.0.1:8000/chat" \
# -H "Content-Type: application/json" \
# -H "X-Request-ID: req-abc-123-xyz-789" \
# -H "X-Session-ID: session-def-456-uvw-012" \
# -H "X-User-ID: user-jane-doe-12345" \
# -d '{"message": "What is my account balance?"}'
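This combined approach provides several benefits:
- Client request ID: enables end-to-end debugging by correlating traces with specific client requests across your entire system
- Session ID (tag: mlflow.trace.session): groups traces from multi-turn conversations so you can analyze the full conversation flow
- User ID (tag: mlflow.trace.user): associates traces with specific users for personalization, cohort analysis, and user-specific debugging
- Environment metadata: records the deployment context (environment, version, region) for operational insights and debugging across deployments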
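The chat handler passes header values straight into the tags. A request without `X-Session-ID` or `X-User-ID` would therefore send `None` as a tag value.

The sketch below shows one way to factor out context extraction. It assumes you would rather omit a tag than log an empty value. `build_trace_context` is a hypothetical helper, not part of MLflow. It accepts any mapping, so it works with FastAPI's case-insensitive `request.headers` and with a plain (case-sensitive) dict in tests.

```python
from typing import Dict, Mapping, Optional, Tuple


def build_trace_context(
    headers: Mapping[str, str], env: Mapping[str, str]
) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (client_request_id, tags) for mlflow.update_current_trace, skipping absent values."""
    candidate_tags = {
        "mlflow.trace.session": headers.get("X-Session-ID"),
        "mlflow.trace.user": headers.get("X-User-ID"),
        "environment": "production",
        "app_version": env.get("APP_VERSION", "1.0.0"),
        "deployment_id": env.get("DEPLOYMENT_ID", "unknown"),
        "region": env.get("REGION", "us-east-1"),
    }
    # Drop tags whose value is missing or empty instead of logging None.
    tags = {key: value for key, value in candidate_tags.items() if value}
    return headers.get("X-Request-ID"), tags

# Hypothetical use inside the traced handler:
#     client_request_id, tags = build_trace_context(request.headers, os.environ)
#     mlflow.update_current_trace(client_request_id=client_request_id, tags=tags)
```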
For more details on adding context to traces, see the documentation on tracking users and sessions and on tracking environments and context.
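Collect user feedback
Capturing user feedback on specific interactions is essential for understanding quality and improving GenAI applications. This example builds on the client request ID tracking shown in the previous section. It shows how to use that ID to link feedback to a specific trace.
Here is an example of feedback collection in FastAPI: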
import os
from typing import Optional
import mlflow
from fastapi import FastAPI, HTTPException, Query, Request
from mlflow.client import MlflowClient
from mlflow.entities import AssessmentSource
from pydantic import BaseModel
# Initialize FastAPI app
app = FastAPI()
# Use the same experiment that the chat app logs its traces to
EXPERIMENT_NAME = os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app")
class FeedbackRequest(BaseModel):
    is_correct: bool  # True for correct, False for incorrect
    comment: Optional[str] = None
@app.post("/chat_feedback")
def handle_chat_feedback(
    request: Request,
    feedback: FeedbackRequest,
    client_request_id: str = Query(..., description="The client request ID from the original chat request"),
):
    """
    Collect user feedback for a specific chat interaction identified by client_request_id.
    """
    client = MlflowClient()
    # Get the experiment by name (using a Databricks workspace path)
    experiment = client.get_experiment_by_name(EXPERIMENT_NAME)
    if experiment is None:
        raise HTTPException(status_code=500, detail=f"Experiment not found: {EXPERIMENT_NAME}")
    # Search for the trace with the matching client_request_id
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"attributes.client_request_id = '{client_request_id}'",
        max_results=1,
    )
    if not traces:
        # Returning a (body, status) tuple does not set the status code in FastAPI; raise instead
        raise HTTPException(
            status_code=404,
            detail=f"Unable to find data for client request ID: {client_request_id}",
        )
    trace_id = traces[0].info.trace_id
    user_id = request.headers.get("X-User-ID")
    # Log feedback using MLflow's log_feedback API
    mlflow.log_feedback(
        trace_id=trace_id,
        name="response_is_correct",
        value=feedback.is_correct,
        source=AssessmentSource(
            source_type="HUMAN",
            source_id=user_id,
        ),
        rationale=feedback.comment,
    )
    return {
        "status": "success",
        "message": "Feedback recorded successfully",
        "trace_id": trace_id,
        "client_request_id": client_request_id,
        "feedback_by": user_id,
    }
# Example usage:
# After a chat interaction returns a response, the client can submit feedback:
#
# curl -X POST "http://127.0.0.1:8000/chat_feedback?client_request_id=req-abc-123-xyz-789" \
# -H "Content-Type: application/json" \
# -H "X-User-ID: user-jane-doe-12345" \
# -d '{
# "is_correct": true,
# "comment": "The response was accurate and helpful"
# }'
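This feedback collection approach lets you:
- Link feedback to specific interactions: use the client request ID to find the exact trace and attach feedback to it
- Store structured feedback: the log_feedback API creates proper assessment objects that appear in the MLflow UI
- Analyze quality patterns: query traces together with their associated feedback to identify which types of interactions receive positive or negative ratings
You can later query traces in the MLflow UI, or analyze patterns programmatically to improve your application.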
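In the feedback handler, `client_request_id` comes from the query string and is interpolated directly into `filter_string`. A value containing a single quote would therefore break the filter expression.

The sketch below adds an allowlist check. It assumes your request IDs use only letters, digits, `.`, `_`, and `-` (as in the `req-abc-123-xyz-789` example) and are at most 128 characters long. `client_request_filter` and those limits are hypothetical, so adjust them to your own ID format.

```python
import re

# Assumption: IDs look like "req-abc-123-xyz-789"; adjust the pattern to your format.
_REQUEST_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,128}")


def client_request_filter(client_request_id: str) -> str:
    """Build the search_traces filter for a client request ID, rejecting unexpected characters."""
    if not _REQUEST_ID_PATTERN.fullmatch(client_request_id):
        raise ValueError(f"Invalid client request ID: {client_request_id!r}")
    return f"attributes.client_request_id = '{client_request_id}'"

# Hypothetical use in the handler before calling client.search_traces:
#     try:
#         filter_string = client_request_filter(client_request_id)
#     except ValueError as exc:
#         raise HTTPException(status_code=400, detail=str(exc))
```

Rejecting unexpected input keeps the sketch independent of MLflow's quoting rules for filter strings.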
Query traces by context
Use the contextual information to analyze production behavior:
import os
from mlflow.client import MlflowClient
client = MlflowClient()
# Use the same experiment that the app logs its traces to
experiment = client.get_experiment_by_name(
    os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app")
)
# Query traces by user
user_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.user` = 'user-jane-doe-12345'",
    max_results=100,
)
# Query traces by session
session_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.session` = 'session-123'",
    max_results=100,
)
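Once you have retrieved traces, simple aggregations over their tags can show how users and sessions relate. This sketch assumes each returned `Trace` exposes its tags as a dict through `trace.info.tags`. `count_by_tag` and `sessions_per_user` are hypothetical helpers that accept any iterable of tag mappings.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Set


def count_by_tag(trace_tags: Iterable[Mapping[str, str]], key: str) -> Counter:
    """Count traces per value of `key`; traces without the tag are counted under None."""
    return Counter(tags.get(key) for tags in trace_tags)


def sessions_per_user(trace_tags: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count distinct sessions per user, using the standard user and session tags."""
    sessions: Dict[str, Set[str]] = defaultdict(set)
    for tags in trace_tags:
        user = tags.get("mlflow.trace.user")
        session = tags.get("mlflow.trace.session")
        if user and session:  # skip traces missing either tag
            sessions[user].add(session)
    return {user: len(ids) for user, ids in sessions.items()}

# Hypothetical usage with the user_traces result from the query above:
#     per_session = count_by_tag((t.info.tags for t in user_traces), "mlflow.trace.session")
#     print(per_session.most_common(5))
```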
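Next steps
Continue your journey with these recommended actions and tutorials.
- Run scorers in production: set up automated quality evaluation of production traffic
Reference guides
See the detailed documentation for the concepts and features mentioned in this guide.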