Unable to Use Synthetic Dataset for RAG Evaluation in Azure AI Foundry – How to Convert Messages Format to JSONL?

Parul Paul 20 Reputation points
2026-02-19T05:20:29.0333333+00:00

Question:

I created a synthetic dataset using the Data → Synthetic Data Generation feature in Azure AI Foundry.

The generated dataset is in messages format, like this:

{
  "messages": [
    {"role": "user", "content": "What is SLA for premium tier?"},
    {"role": "assistant", "content": "The premium tier guarantees 99.9% uptime."}
  ]
}

However, when I try to use this dataset for evaluation (RAG evaluation), it does not work as expected. The evaluators require structured fields such as:

question

ground_truth

context

answer

Since the synthetic dataset does not contain ground_truth or context, I am unable to run proper RAG evaluation.

My questions are:

How can I convert this messages format into the required JSONL evaluation format?

What is the correct structure for a RAG test dataset in Azure AI Foundry?

Is there an official way to generate evaluation-ready datasets instead of conversation format?

If I only have the synthetic dataset, how can I create a proper test dataset for evaluation?

Any guidance on the correct workflow for creating an evaluation-ready dataset would be helpful.

Answer accepted by question author
  1. Alex Burlachenko 19,615 Reputation points Volunteer Moderator
    2026-02-19T12:39:27.65+00:00

    Hi Paul,

    Synthetic data in Foundry is generated in chat messages format, mainly for fine-tuning-style scenarios rather than RAG evaluation, so it will not plug into the evaluators directly. RAG evaluation expects structured JSONL with one record per line, like {"question":"...","ground_truth":"...","context":"...","answer":"..."}, where question is the user query, ground_truth is the correct expected answer, context is the retrieved document chunk, and answer is the model output being evaluated.

    You can convert the synthetic dataset by mapping user content to question and assistant content to ground_truth, but you must still populate context from your actual indexed knowledge base; otherwise metrics that depend on the retrieved context, such as groundedness and faithfulness, will not work. There is no automatic built-in conversion to a RAG-ready format in the UI, so the proper workflow is: generate or collect Q&A pairs, retrieve the relevant chunks from your search index, build a structured JSONL file, and then upload that as an evaluation dataset for RAG testing.
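The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the synthetic file holds one {"messages": [...]} object per line, and the names messages_to_eval_records, write_jsonl and retrieve_context are hypothetical helpers, not part of any Foundry SDK. retrieve_context stands in for whatever lookup you run against your own search index, and answer is left empty to be filled in with your RAG application's output before evaluation.

```python
import json
from typing import Callable, Iterable, Iterator


def messages_to_eval_records(
    lines: Iterable[str],
    retrieve_context: Callable[[str], str],
) -> Iterator[dict]:
    """Yield one evaluation record per user/assistant pair.

    Assumes each non-empty input line is a JSON object with a "messages" list.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        question = None
        for message in json.loads(line)["messages"]:
            if message["role"] == "user":
                question = message["content"]
            elif message["role"] == "assistant" and question is not None:
                yield {
                    "question": question,
                    "ground_truth": message["content"],
                    # Hypothetical hook: look up chunks in your own index.
                    "context": retrieve_context(question),
                    # Placeholder: fill with the output of the system under test.
                    "answer": "",
                }
                question = None


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A typical call would read the synthetic file line by line, pass a function that queries your index as retrieve_context, and hand the result to write_jsonl. Records with an empty context or answer still need to be completed before context-dependent evaluators can score them.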

    Regards,

    Alex
