Hi Andrei Nutas,
You’re working with Azure OpenAI and Azure AI Foundry's RFT (Reinforcement Fine-Tuning) capability. Based on the sample and questions you provided, let me walk you through each Ask:
1. Is the data structured correctly?
Yes, this is a valid and supported structure for messages used in chat completions-style fine-tuning, assuming:
· You saved it in JSONL format, i.e., one JSON object per line.
· Each example has system, user, and assistant messages in the correct order.
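As a quick sanity check before uploading, you can validate the JSONL file with a short script like the sketch below (the function name `validate_jsonl` is my own, not part of any Azure SDK; it only checks line-level JSON validity and role order):

```python
import json

REQUIRED_ORDER = ["system", "user", "assistant"]

def validate_jsonl(path):
    """Return a list of problems found in a chat-completions JSONL file.

    Checks two things per non-empty line: the line parses as JSON, and
    the "messages" roles appear in system/user/assistant order.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            roles = [m.get("role") for m in record.get("messages", [])]
            if roles != REQUIRED_ORDER:
                problems.append(f"line {lineno}: unexpected role order {roles}")
    return problems
```

An empty return value means every line passed both checks.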
2. How should I design a Grader?
Design Goals for Grader
· Input Format:
  - The grader's input is the same messages structure as above.
  - Optionally, include a "score" field in each training example (range: 0 to 1, or 0 to 10).
· How to Train a Grader:
  - Provide multiple samples of messages, each with assistant responses of varied quality, and label each with a score.
Example:
```json
{
  "messages": [
    {"role": "system", "content": "You are Dr. Andrei Nutas..."},
    {"role": "user", "content": "Can you explain..."},
    {"role": "assistant", "content": "Transhumanism is..."}
  ],
  "score": 0.9
}
```
Your dataset should cover good, average, and bad outputs, with scores to match (e.g., 0.9, 0.6, 0.3).
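One way to assemble such a graded dataset is sketched below (the helper names `make_graded_example` and `write_grader_examples` are my own illustration, not an Azure API; it assumes you are using the 0-to-1 score range):

```python
import json

def make_graded_example(system, user, assistant, score):
    """Bundle one conversation plus a quality score for grader training."""
    # Assumption: scores use the 0-1 range mentioned above.
    assert 0.0 <= score <= 1.0, "score must be between 0 and 1"
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ],
        "score": score,
    }

def write_grader_examples(examples, path):
    """Write the examples as JSONL: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Building good, average, and bad variants of the same prompt this way keeps the score scale consistent across the file.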
· Evaluation Criteria (custom logic):
  - Persona Match: does the response reflect Dr. Nutas' style?
  - Depth and Detail: is it essay-style and philosophical?
  - Factuality: is it grounded, or does it hallucinate?
· Tooling Tips:
  - You can manually label 100–500 examples for the grader using these rules.
  - Optionally, use a rule-based script or GPT-4 to auto-score responses, then review the scores manually.
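A rule-based pre-scorer can be very simple. The heuristic below is only a sketch: the 200-word depth target, the 50/50 weighting, and the keyword list are illustrative assumptions, not a documented Azure grading rule, and its output should be treated as a first-pass label for human review:

```python
def heuristic_score(assistant_text, persona_keywords=("transhumanism", "philosophy")):
    """Crude 0-1 score rewarding essay-like depth and persona vocabulary.

    Assumptions (illustrative only): ~200 words counts as full depth,
    and depth and persona match each contribute half of the score.
    """
    text = assistant_text.lower()
    # Depth and Detail: longer, essay-style answers score higher (capped at 0.5).
    depth = min(len(text.split()) / 200.0, 1.0) * 0.5
    # Persona Match: fraction of expected keywords that actually appear.
    hits = sum(1 for kw in persona_keywords if kw in text)
    persona = (hits / len(persona_keywords)) * 0.5
    return round(depth + persona, 2)
```

Running this over candidate responses gives a rough ranking you can then correct by hand before training the grader.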
3. What is the correct response format?
There are 3 types of datasets in Azure AI Foundry RFT:
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can benefit other community members.
Thank you!