An AI agent generates trip plans that fail to consider how customer constraints like fitness level, experience, budget, and weather conditions interact with each other. Which fine-tuning method best addresses this problem?
Supervised Fine-Tuning (SFT)
Reinforcement Fine-Tuning (RFT)
Direct Preference Optimization (DPO)
Which fine-tuning method requires training data structured as preference pairs, each containing a prompt alongside both a preferred and a non-preferred response?
Supervised Fine-Tuning (SFT)
Reinforcement Fine-Tuning (RFT)
Direct Preference Optimization (DPO)
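For context, preference-pair training data is commonly stored as JSON Lines, one pair per record. The sketch below is illustrative only: the field names (`prompt`, `preferred_response`, `non_preferred_response`) are assumptions, and the exact schema depends on the fine-tuning service you use.

```python
import json

# One illustrative preference pair: a prompt plus a preferred and a
# non-preferred response. Field names are assumptions, not a fixed schema.
pair = {
    "prompt": "Plan a one-day hike for a beginner on a tight budget.",
    "preferred_response": "Suggest a short, well-marked local trail with no fees.",
    "non_preferred_response": "Recommend a guided multi-day alpine expedition.",
}

# Preference datasets are often stored as JSON Lines: one record per line.
line = json.dumps(pair)
print(line)
```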
What is the purpose of evaluating the base model before submitting a fine-tuning job?
To establish a baseline so you can measure whether fine-tuning improved performance.
To automatically generate labeled training examples from the base model's outputs.
To determine the correct number of epochs to use during training.
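To illustrate the baseline idea, the sketch below scores a model on a fixed evaluation set before fine-tuning, then reuses the same set afterwards so the two scores are comparable. `query_model` is a hypothetical stand-in for a real model API call, and the canned responses exist only to make the example runnable.

```python
# Sketch: establish a baseline before submitting a fine-tuning job by
# scoring the base model on a fixed eval set, then score the fine-tuned
# model on the SAME set to measure improvement.

def query_model(model: str, prompt: str) -> str:
    # Placeholder: in practice this would call the model's API.
    canned = {"base": "generic plan", "fine-tuned": "constraint-aware plan"}
    return canned[model]

def accuracy(model: str, eval_set: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose output matches the expected answer."""
    hits = sum(query_model(model, p) == expected for p, expected in eval_set)
    return hits / len(eval_set)

eval_set = [("Plan a budget trip for a novice hiker.", "constraint-aware plan")]

baseline = accuracy("base", eval_set)       # score before fine-tuning
after = accuracy("fine-tuned", eval_set)    # same eval set afterwards
print(f"baseline={baseline:.2f} fine-tuned={after:.2f}")
```

Holding the evaluation set fixed is what makes the before/after comparison meaningful: any change in score can then be attributed to fine-tuning rather than to a different test distribution.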
You must answer all questions before checking your work.