AI Functions use the built-in Fabric-hosted large language model (LLM) endpoint to transform and enrich your data without separate endpoint setup. This article explains the billing meter, consumption rates, and usage monitoring options for that built-in endpoint.
Important
This article applies to AI Functions that use the built-in Fabric LLM endpoint. You can configure a custom Azure OpenAI, Microsoft Foundry, or OpenAI-compatible endpoint for pandas and PySpark AI Functions. When you do, billing is governed by that endpoint and your configuration. For setup details, see Customize AI Functions with pandas and Customize AI Functions with PySpark.
Billing meter
AI function calls through the built-in Fabric LLM endpoint are billed to your Fabric capacity under the Copilot and AI meter. In the Microsoft Fabric Capacity Metrics app, usage appears as the AI Functions operation.
| Usage | Billing meter or operation |
|---|---|
| AI function model calls | Copilot and AI meter, reported as AI Functions. |
| Spark compute that runs a notebook or Spark job | Spark billing meter. |
| Dataflow Gen2 compute that runs transformations | Dataflow Gen2 usage. |
| Warehouse or SQL analytics endpoint query compute | Data Warehouse or SQL analytics endpoint usage. |
View costs and spending
Use the Capacity Metrics app to monitor AI Functions spending and capacity impact:
- Open the Microsoft Fabric Capacity Metrics app.
- Filter to the capacity, workspace, and time range that ran your AI Functions workload.
- In operation-level views, look for AI Functions under the Copilot and AI meter.
- Compare the AI Functions operation with Spark, Dataflow Gen2, or warehouse operations to separate model-call consumption from the compute that orchestrated the workload.
Monitor runtime usage
During development, use runtime usage statistics to estimate and validate consumption before you scale a pipeline.
In pandas and PySpark notebooks, access ai.stats on AI function results to view execution and token usage details, including:
- num_successful, num_exceptions, num_unevaluated, and num_harmful.
- cached_tokens, input_tokens, output_tokens, and reasoning_tokens.
- client_type, input_types, and model.
# This code uses AI. Always review output for mistakes.
df["summary"] = df["text"].ai.summarize()
display(df["summary"].ai.stats)
display(df.ai.stats)
In pandas notebooks, set progress_bar_mode="stats" to show real-time token and capacity unit estimates while the function runs:
import synapse.ml.aifunc as aifunc
aifunc.default_conf.progress_bar_mode = "stats"
The progress bar shows live and projected cached input, input, output, and capacity unit estimates, then shows final values when the operation completes. For more information, see Progress bar modes and Customize AI Functions with PySpark.
Consumption rates
Unless you configure a different model, Python AI Functions for pandas and PySpark default to gpt-5-mini with reasoning_effort set to low. Consumption is based on token usage. Input, cached input, and output tokens can have different rates.
Language models
| Model | Deployment name | Context window (tokens) | Input (per 1,000 tokens) | Cached input (per 1,000 tokens) | Output (per 1,000 tokens) | Retirement date |
|---|---|---|---|---|---|---|
| gpt-5.1-2025-11-13 | gpt-5.1 | 400,000 (max output: 128,000) | 42.02 CU seconds | 4.20 CU seconds | 336.13 CU seconds | |
| gpt-5-mini-2025-08-07 | gpt-5-mini | 400,000 (max output: 128,000) | 8.40 CU seconds | 0.84 CU seconds | 67.23 CU seconds | |
| gpt-4.1-mini-2025-04-14 | gpt-4.1-mini | 128,000 (max output: 32,768) | 13.45 CU seconds | 3.36 CU seconds | 53.78 CU seconds | June 30, 2026 |
| gpt-5-2025-08-07 | gpt-5 | 400,000 (max output: 128,000) | 42.02 CU seconds | 4.20 CU seconds | 336.13 CU seconds | June 11, 2026 |
| gpt-4.1-2025-04-14 | gpt-4.1 | 128,000 (max output: 32,768) | 67.23 CU seconds | 16.81 CU seconds | 268.91 CU seconds | June 11, 2026 |
Embedding models
| Model | Deployment name | Context window (tokens) | Input (per 1,000 tokens) |
|---|---|---|---|
| Ada | text-embedding-ada-002 | 8,192 | 3.36 CU seconds |
Consumption rates are subject to change. For the full consumption rate list and rate-change policy, see Consumption rate in Foundry Tools in Fabric.
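Because consumption is priced per 1,000 tokens, a rough estimate is simple arithmetic. The following is a minimal sketch, not part of the AI Functions library: the function name `estimate_cu_seconds` and the `RATES_PER_1K` dictionary are hypothetical, and the rates are a snapshot copied from the language model table in this article. It assumes that the input count you pass in excludes cached tokens, which this article doesn't confirm for the values reported by ai.stats.

```python
# CU seconds per 1,000 tokens, copied from this article's language model table.
# Rates are subject to change, so treat these numbers as a snapshot.
RATES_PER_1K = {
    "gpt-5.1":      {"input": 42.02, "cached_input": 4.20,  "output": 336.13},
    "gpt-5-mini":   {"input": 8.40,  "cached_input": 0.84,  "output": 67.23},
    "gpt-4.1-mini": {"input": 13.45, "cached_input": 3.36,  "output": 53.78},
    "gpt-5":        {"input": 42.02, "cached_input": 4.20,  "output": 336.13},
    "gpt-4.1":      {"input": 67.23, "cached_input": 16.81, "output": 268.91},
}


def estimate_cu_seconds(deployment, input_tokens, cached_input_tokens=0, output_tokens=0):
    """Estimate CU seconds for one workload (hypothetical helper).

    Assumption: input_tokens counts only uncached input tokens, and
    cached_input_tokens is counted separately.
    """
    if deployment not in RATES_PER_1K:
        raise ValueError(f"No rate recorded for deployment {deployment!r}")
    rates = RATES_PER_1K[deployment]
    total = (
        input_tokens * rates["input"]
        + cached_input_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    # Rates are quoted per 1,000 tokens.
    return total / 1000


cu_seconds = estimate_cu_seconds("gpt-5-mini", 1_000_000, 200_000, 100_000)
print(f"{cu_seconds:.2f} CU seconds, {cu_seconds / 3600:.2f} CU hours")
```

With these rates, 1,000,000 uncached input tokens, 200,000 cached input tokens, and 100,000 output tokens on gpt-5-mini come to about 15,291 CU seconds (8,400 + 168 + 6,723), or roughly 4.25 CU hours. Compare an estimate like this with the AI Functions operation in the Capacity Metrics app before relying on it.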
Model migration guidance
The older GPT-4.1 model series is being retired. If you have pinned Python AI Functions pipelines to gpt-4.1, migrate them to gpt-5.1. If you pinned pipelines to gpt-4.1-mini, migrate them to gpt-5-mini.
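For both migration pairs, the rate table shows a lower input rate but a higher output rate on the newer model, so the cost effect of migrating depends on how output-heavy a workload is. The sketch below works out the break-even output-to-input token ratio from the rates published in this article. The `RATES` dictionary and the function name are hypothetical, cached input is ignored, and the result is an estimate only: it doesn't account for how reasoning tokens are billed, which this article doesn't specify.

```python
# (input, output) CU seconds per 1,000 tokens, copied from this article's
# rate table. Rates are subject to change, so treat these as a snapshot.
RATES = {
    "gpt-4.1":      (67.23, 268.91),
    "gpt-5.1":      (42.02, 336.13),
    "gpt-4.1-mini": (13.45, 53.78),
    "gpt-5-mini":   (8.40, 67.23),
}


def break_even_output_ratio(old, new):
    """Return the output/input token ratio at which both models cost the same.

    Derived from: old_in*I + old_out*O == new_in*I + new_out*O.
    Assumes the two output rates differ (true for the pairs above).
    Cached input tokens are ignored for simplicity.
    """
    old_in, old_out = RATES[old]
    new_in, new_out = RATES[new]
    return (old_in - new_in) / (new_out - old_out)


for old, new in [("gpt-4.1", "gpt-5.1"), ("gpt-4.1-mini", "gpt-5-mini")]:
    ratio = break_even_output_ratio(old, new)
    print(f"{old} -> {new}: break-even at {ratio:.3f} output tokens per input token")
```

Under these assumptions, both pairs break even at roughly 0.375 output tokens per input token. Workloads that produce fewer output tokens than that per input token would consume less on the newer model, and more output-heavy workloads would consume more.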
For more sophisticated transformations, you can configure gpt-5.1 or tune reasoning_effort to use more compute for higher-quality results. For setup details, see Customize AI Functions with pandas and Customize AI Functions with PySpark.