Request example for a multi grader with a custom grader in reinforcement fine-tuning in AI Foundry

Question

Oliver Su (Artech Consulting LLC) 60 Reputation points Microsoft Employee

Hi there, my custom grader works fine on its own, but when I combine it with other built-in graders, the job always fails. The website tutorial has no template for a multi grader that includes a custom grader. Could you give an example, please?

This is what I have:

{
  "name": "sample_multi_grader",
  "type": "multi",
  "graders": {
    "ext_text_similarity": {
      "name": "ext_text_similarity",
      "type": "text_similarity",
      "input": "{{sample.output_json.ext_text}}",
      "reference": "{{item.ext_text}}",
      "evaluation_metric": "fuzzy_match"
    },
    "custom_check": {
      "type": "python",
      "source": "{import re ....}",
    }
  },
  "calculate_output": "0.5 * ext_text_similarity + 0.5 * custom_check"
}

Anonymous

2025-10-29T11:00:36.1166667+00:00

Hi Oliver Su (Artech Consulting LLC)

Did you get a chance to review the above response?

Thank you!

Anonymous

2025-10-30T15:00:34.95+00:00

Hi Oliver Su (Artech Consulting LLC)

We haven’t heard from you since the last response and are just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it may help others. Otherwise, let us know and we will respond with more details and try to help.

Answer 1

Hi Oliver Su (Artech Consulting LLC)

Here is an example of a multi grader that combines a built-in text_similarity grader with a custom Python grader:

{
  "model": "gpt-5-mini-2025-08-07",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "n_epochs": 3,
        "batch_size": 8,
        "eval_interval": 1,
        "eval_samples": 5
      },
      "grader": {
        "name": "summary_quality_multigrader",
        "type": "multi",
        "graders": {
          "text_sim": {
            "name": "text_similarity_grader",
            "type": "text_similarity",
            "input": "{{sample.output_json.response}}",
            "reference": "{{item.reference.answer}}",
            "evaluation_metric": "fuzzy_match"
          },
          "custom_quality": {
            "name": "custom_summary_quality",
            "type": "python",
            "source": "def grade(sample, item) -> float:\n    # Reward mention of 'AI Foundry' and brevity (< 20 words)\n    text = (sample.get('output_json') or {}).get('response', '')\n    if not text:\n        return 0.0\n    score = 0.0\n    if 'AI Foundry' in text:\n        score += 0.5\n    if len(text.split()) < 20:\n        score += 0.5\n    return min(score, 1.0)"
          }
        },
        "calculate_output": "0.6 * text_sim + 0.4 * custom_quality",
        "invalid_grade": 0.0
      }
    }
  }
}

Dataset: A JSONL file that clearly separates the input (what the model sees) from the reference fields (what the graders use).

Bindings: In the built-in grader, set "input" to the model-output path and "reference" to the ground-truth path. The Python grader has no "input"/"reference" fields; its grade(sample, item) function receives the model sample and the dataset item and reads the values directly, for example sample["output_json"]["response"] and, if the check needs the ground truth, item["reference"]["answer"].

Custom grader: A Python grade function returning a float score in [0, 1] (optionally a dict with score/reason, if supported).

Aggregation: Use a weighted expression such as "0.6 * text_sim + 0.4 * custom_quality", where each name is a key under "graders".

Validation: Provide an invalid_grade fallback for edge cases.
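Because JSON allows neither comments nor multi-line strings, one low-risk way to produce the "source" field is to keep the grader code in an ordinary Python string, test it locally, and let json.dumps do the escaping. This is a sketch under stated assumptions: it assumes the service calls grade(sample, item) with the model sample and the dataset item as dicts (the signature OpenAI's grader documentation uses), and that the model output lives at sample["output_json"]["response"]. The names GRADE_SOURCE and build_multi_grader are hypothetical, the text_sim score of 0.5 is a made-up placeholder, and the exec call only imitates how a sandbox might load the function.

```python
import json

# Grader source kept as a normal Python string; json.dumps escapes the
# newlines and quotes when the request body is serialized.
GRADE_SOURCE = '''\
def grade(sample, item) -> float:
    # Reward mention of 'AI Foundry' and brevity (< 20 words)
    text = (sample.get("output_json") or {}).get("response", "")
    if not text:
        return 0.0
    score = 0.0
    if "AI Foundry" in text:
        score += 0.5
    if len(text.split()) < 20:
        score += 0.5
    return min(score, 1.0)
'''


def build_multi_grader() -> dict:
    """Hypothetical helper that assembles a multi grader like the one in this thread."""
    return {
        "name": "summary_quality_multigrader",
        "type": "multi",
        "graders": {
            "text_sim": {
                "name": "text_similarity_grader",
                "type": "text_similarity",
                "input": "{{sample.output_json.response}}",
                "reference": "{{item.reference.answer}}",
                "evaluation_metric": "fuzzy_match",
            },
            "custom_quality": {
                "name": "custom_summary_quality",
                "type": "python",
                "source": GRADE_SOURCE,
            },
        },
        "calculate_output": "0.6 * text_sim + 0.4 * custom_quality",
    }


# Local smoke test: load grade() from the source string, then call it
# with a hand-made sample and item (both assumed shapes).
namespace: dict = {}
exec(GRADE_SOURCE, namespace)
grade = namespace["grade"]

sample = {"output_json": {"response": "AI Foundry supports reinforcement fine-tuning."}}
item = {"reference": {"answer": "AI Foundry supports RFT."}}
custom_quality = grade(sample, item)  # mentions 'AI Foundry' and is short -> 1.0

# Mirror calculate_output locally, using a placeholder text_sim score.
text_sim = 0.5
final = 0.6 * text_sim + 0.4 * custom_quality
print(custom_quality, round(final, 2))

# Serialize; the multi-line source becomes a single escaped JSON string.
body = json.dumps(build_multi_grader(), indent=2)
assert json.loads(body)["graders"]["custom_quality"]["source"] == GRADE_SOURCE
```

Testing the function this way catches Python errors before a fine-tuning job is submitted, and generating the body with json.dumps avoids hand-escaping the newlines inside "source".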

I hope this helps.

Thank you!

Answer 2

Azar 31,720 Reputation points MVP Volunteer Moderator

Hi there, Oliver Su (Artech Consulting LLC),

Thanks for using the Q&A platform.

The multi-grader setup in Azure AI Foundry is a bit picky when combining built-in and custom graders. The main thing to check is that each grader inside your graders block explicitly defines both input and reference, even the custom Python grader. Also make sure the names you use in calculate_output exactly match the grader keys. For example, you can define one grader for text similarity and another for your custom check, then combine them with something like "calculate_output": "0.5 * ext_text_similarity + 0.5 * custom_check". The custom grader’s source should return a numeric value (such as 0 or 1). Once those details line up, it should work fine.
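The name-matching advice above can be checked locally before a job is submitted. The sketch below is a hypothetical pre-flight check, not part of any Azure SDK: it parses the config with json.loads (which also rejects syntax problems such as trailing commas or // comments), reports any sub-grader missing "name" or "type" (assuming, as in the examples in this thread, that every sub-grader should carry both), and pulls identifiers out of calculate_output with a regular expression to confirm each one is a key under "graders". The set of allowed math helpers is an assumption; adjust it to whatever the service actually supports.

```python
import json
import re

# Assumption: calculate_output may call a few math helpers; edit this set
# to match what the service documents as supported.
ASSUMED_FUNCTIONS = {"min", "max", "abs", "floor", "ceil", "exp", "sqrt", "log"}

# Identifiers not preceded by a word character or '.', so the 'e3' in
# a literal like '1e3' is not mistaken for a grader name.
IDENTIFIER = re.compile(r"(?<![\w.])[A-Za-z_]\w*")


def check_multi_grader(config_text: str) -> list[str]:
    """Hypothetical pre-flight check for a multi grader; returns problems found."""
    try:
        config = json.loads(config_text)  # rejects trailing commas and // comments
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    problems = []
    graders = config.get("graders", {})
    for key, grader in graders.items():
        for field in ("name", "type"):
            if field not in grader:
                problems.append(f"grader '{key}' is missing '{field}'")

    expression = config.get("calculate_output", "")
    for ident in IDENTIFIER.findall(expression):
        if ident not in graders and ident not in ASSUMED_FUNCTIONS:
            problems.append(f"calculate_output uses unknown name '{ident}'")
    return problems


# Usage: a config shaped like the one in the question, with a name added
# to the Python grader and a placeholder grade function as its source.
good = """{
  "name": "sample_multi_grader",
  "type": "multi",
  "graders": {
    "ext_text_similarity": {"name": "ext_text_similarity", "type": "text_similarity",
      "input": "{{sample.output_json.ext_text}}", "reference": "{{item.ext_text}}",
      "evaluation_metric": "fuzzy_match"},
    "custom_check": {"name": "custom_check", "type": "python",
      "source": "def grade(sample, item):\\n    return 1.0"}
  },
  "calculate_output": "0.5 * ext_text_similarity + 0.5 * custom_check"
}"""
print(check_multi_grader(good))
```

A check like this only catches structural mistakes; whether the service accepts a particular field combination still has to be confirmed against the documentation or a test job.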

If this helps, kindly accept the answer.

Oliver Su (Artech Consulting LLC) 60 Reputation points Microsoft Employee

2025-10-24T19:26:33.15+00:00

Hi there, could you give me an example of how to define the input and reference for the custom grader in the multi grader scenario, please?
