I'm trying to implement streaming with token-usage tracking for the Azure OpenAI "on your data" feature, but I'm hitting a validation error when using the `stream_options` parameter, which works fine with regular chat completions.
Environment Details
- Service: Azure OpenAI Service
- API Version: 2024-06-01 (also tested with 2024-10-21)
- Model: GPT-4o (also tested with GPT-4.1)
- Feature: Azure OpenAI "on your data" with Azure AI Search
- SDK: openai-python library (latest version)
- Authentication: System-assigned managed identity
What I'm Trying to Achieve
Enable streaming responses with token-usage tracking when using the Azure OpenAI "on your data" feature, similar to how `stream_options={"include_usage": True}` works with regular chat completions.
Code That Causes the Error
```python
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Client setup
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01"
)

# This call fails with a validation error
stream = client.chat.completions.create(
    model=deployment_name,
    messages=[{"role": "user", "content": "Test message"}],
    stream=True,
    stream_options={"include_usage": True},  # This line causes the error
    # data_sources is not a named parameter of create() in openai-python,
    # so it is passed through extra_body
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": search_index,
                    "authentication": {"type": "system_assigned_managed_identity"}
                }
            }
        ]
    }
)
```
Error Message
```json
{
  "error_message": "Validation error at #/stream_options: Extra inputs are not permitted",
  "content_filter_result": {}
}
```
What Works
- Regular streaming without token usage (removing the `stream_options` parameter):

```python
# This works fine
stream = client.chat.completions.create(
    model=deployment_name,
    messages=[{"role": "user", "content": "Test message"}],
    stream=True,
    # stream_options removed
    extra_body={"data_sources": [...]}
)
```
- Token usage with regular chat completions (without `data_sources`):

```python
# This also works fine
stream = client.chat.completions.create(
    model=deployment_name,
    messages=[{"role": "user", "content": "Test message"}],
    stream=True,
    stream_options={"include_usage": True}
    # No data_sources parameter
)
```
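For completeness, when `include_usage` is honored, the usage figures arrive on a trailing chunk whose `choices` list is empty. Below is a minimal sketch of consuming such a stream; the `collect_stream` helper is my own, written against the chunk shape that openai-python returns:

```python
from typing import Iterable, Optional, Tuple

def collect_stream(stream: Iterable) -> Tuple[str, Optional[object]]:
    """Accumulate streamed content and capture the trailing usage chunk.

    With stream_options={"include_usage": True}, the final chunk has an
    empty `choices` list and a non-None `usage` attribute carrying
    prompt_tokens, completion_tokens, and total_tokens.
    """
    parts = []
    usage = None
    for chunk in stream:
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content:
                parts.append(delta.content)
        # The usage chunk has choices == [] and usage set
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage
```

Usage: `text, usage = collect_stream(stream)`, then read `usage.prompt_tokens` / `usage.completion_tokens` / `usage.total_tokens` once the stream is exhausted.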
Questions
- Is this a known limitation? Does Azure OpenAI "on your data" officially not support the `stream_options` parameter?
- Roadmap for support? Are there plans to add `stream_options` support to the "on your data" feature?
- Official workaround? What is the recommended approach for tracking token usage when streaming with "on your data"?
- API documentation clarity? Should this limitation be explicitly documented in the "on your data" API reference?
Current Workaround
I'm currently using tiktoken for token estimation, as suggested in the documentation:

```python
import tiktoken

# Use the encoding that matches the model: GPT-4o uses o200k_base
# (cl100k_base for GPT-4/GPT-3.5); the "gpt2" encoding gives
# inaccurate counts for current models.
tokenizer = tiktoken.get_encoding("o200k_base")

def estimate_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

# Estimate input and output tokens manually
input_tokens = estimate_tokens(user_input)
output_tokens = estimate_tokens(response_text)
```
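Since no usage chunk is available with `data_sources`, the estimate has to be applied to the accumulated stream output. Here is a sketch of a hypothetical `estimate_stream_usage` helper that takes any text-to-token-count callable (e.g. the tiktoken-based `estimate_tokens` above); note it ignores the few tokens of per-message formatting overhead and any tokens the retrieval step adds to the prompt, so it undercounts somewhat:

```python
from typing import Callable, Iterable

def estimate_stream_usage(
    user_input: str,
    stream: Iterable,
    estimate: Callable[[str], int],
) -> dict:
    """Accumulate a chat-completions stream and estimate token usage.

    `estimate` maps text to a token count (e.g. a tiktoken encoder);
    the returned dict mimics the shape of a real usage object.
    """
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    response_text = "".join(parts)
    prompt_tokens = estimate(user_input)
    completion_tokens = estimate(response_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "response_text": response_text,
    }
```

Usage: `usage = estimate_stream_usage(user_input, stream, estimate_tokens)`, then log `usage["total_tokens"]` alongside the real figures from non-"on your data" calls.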
Impact
This limitation affects:
- Cost tracking for applications using "on your data"
- Usage monitoring and billing accuracy
- Rate limiting implementations
- Performance optimization based on token usage
Request
Could the Microsoft team please:
- Confirm whether this is intended behavior or a bug
- Provide a timeline for potential `stream_options` support with "on your data"
- Update the documentation to clearly state this limitation
- Suggest best practices for token tracking in this scenario
Thank you for any guidance or clarification the team can provide!