Share via

How do I evaluate reasoning models (o3-mini) using Azure AI Foundry evaluation or Prompt Flow?

Kavishka Gamage 0 Reputation points
2025-03-20T09:03:30.6633333+00:00

I have tried to evaluate the o3-mini model using an existing dataset, via Azure Machine Learning Workspace PromptFlow, as well as in Azure AI Foundry Evaluation, Azure OpenAI Evaluation, and PromptFlow options. However, it failed due to the parameter differences between GPT models and reasoning models.

What are alternative ways to evaluate the o3-mini model to benchmark it against an existing dataset?

 Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
Foundry Tools
Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services is a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform


1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 42,286 Reputation points MVP Volunteer Moderator
    2025-03-20T11:04:15.4166667+00:00

    Hello Kavishka !

    Thank you for posting on Microsoft Learn.

    Unlike standard GPT models, reasoning models like o3-mini use:

    • max_completion_tokens instead of max_tokens
    • temperature (for randomness control)
    • top_p (nucleus sampling)

    If you are using Prompt Flow within Azure AI Studio, your YAML or JSON payload should include the parameter:

    parameters:
      model: "o3-mini"
      max_completion_tokens: 512
      temperature: 0.7
      top_p: 0.95
      prompt: "Evaluate the following reasoning dataset..."
    

    For Azure AI Foundry Evaluation, if you are running evaluations via API or SDK, update your payload:

    {
      "model": "o3-mini",
      "max_completion_tokens": 512,
      "temperature": 0.7,
      "top_p": 0.95,
      "input": "Evaluate the reasoning ability of this dataset..."
    }
    
    

    Was this answer helpful?

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.