An AI agent generates trip plans that fail to consider how customer constraints like fitness level, experience, budget, and weather conditions interact with each other. Which fine-tuning method best addresses this problem?
Supervised Fine-Tuning (SFT)
Reinforcement Fine-Tuning (RFT)
Direct Preference Optimization (DPO)
Which fine-tuning method requires training data structured as preference pairs, each containing a prompt alongside both a preferred and a non-preferred response?
Supervised Fine-Tuning (SFT)
Reinforcement Fine-Tuning (RFT)
Direct Preference Optimization (DPO)
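For context, preference-pair training data is commonly stored as JSON Lines, one pair per record. The sketch below is illustrative only: the field names (`prompt`, `preferred_response`, `non_preferred_response`) are assumptions, and the exact schema depends on the fine-tuning service you use.

```python
import json

# One illustrative preference pair: a prompt plus a preferred and a
# non-preferred response. Field names are assumptions, not a fixed schema.
pair = {
    "prompt": "Plan a one-day hike for a beginner on a tight budget.",
    "preferred_response": "Suggest a short, well-marked local trail with no fees.",
    "non_preferred_response": "Recommend a guided multi-day alpine expedition.",
}

# Preference datasets are often stored as JSON Lines: one record per line.
line = json.dumps(pair)
print(line)
```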
What is the purpose of evaluating the base model before submitting a fine-tuning job?
To establish a baseline so you can measure whether fine-tuning improved performance.
To automatically generate labeled training examples from the base model's outputs.
To determine the correct number of epochs to use during training.
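To illustrate the baseline idea, the sketch below scores a model on a fixed evaluation set before fine-tuning, then reuses the same set afterwards so the two scores are comparable. `query_model` is a hypothetical stand-in for a real model API call, and the canned responses exist only to make the example runnable.

```python
# Sketch: establish a baseline before submitting a fine-tuning job by
# scoring the base model on a fixed eval set, then score the fine-tuned
# model on the SAME set to measure improvement.

def query_model(model: str, prompt: str) -> str:
    # Placeholder: in practice this would call the model's API.
    canned = {"base": "generic plan", "fine-tuned": "constraint-aware plan"}
    return canned[model]

def accuracy(model: str, eval_set: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose output matches the expected answer."""
    hits = sum(query_model(model, p) == expected for p, expected in eval_set)
    return hits / len(eval_set)

eval_set = [("Plan a budget trip for a novice hiker.", "constraint-aware plan")]

baseline = accuracy("base", eval_set)       # score before fine-tuning
after = accuracy("fine-tuned", eval_set)    # same eval set afterwards
print(f"baseline={baseline:.2f} fine-tuned={after:.2f}")
```

Holding the evaluation set fixed is what makes the before/after comparison meaningful: any change in score can then be attributed to fine-tuning rather than to a different test distribution.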
You must answer all questions before checking your work.