OpenAI fine-tuning failed RAI checks, but the Moderation API didn't

Yovel Cohen 70 Reputation points
2024-10-14T05:14:14.33+00:00

Hi, I'm trying to fine-tune a GPT-4o model.
The training fails with the following error message:

UserErrorException: Message: The provided training data failed RAI checks for harm types: ['violence']. Please fix the data and try again.  Details - Harmful lines per harm type: violence:

So I tried to clean the dataset with the following script:

import json
from openai import OpenAI

client = OpenAI(api_key="API-KEY")

# Flag any message whose raw score in any moderation category exceeds this
threshold = 0.001


def main(input_file, output_file):
    # Read all lines from the input file and drop duplicate rows,
    # preserving the original order
    with open(input_file, 'r', encoding='utf-8') as infile:
        lines = infile.readlines()
        unique_lines = list(dict.fromkeys(lines))

    total_rows, cleared = len(unique_lines), 0
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for line in unique_lines:
            try:
                data = json.loads(line)
                messages = data.get('messages', [])

                all_messages_approved = True  # Flag to track if all messages are approved

                # Submit each individual message content to the Moderation API
                for message in messages:
                    content = message.get('content', '')

                    if content:  # Ensure there's content to submit
                        response = client.moderations.create(input=content)
                        results = response.results[0]

                        # Reject the row if any category's raw score exceeds the threshold
                        for category, score in results.category_scores.model_dump().items():
                            if score > threshold:
                                all_messages_approved = False
                                break

                    if not all_messages_approved:
                        break

                if all_messages_approved:
                    # Only write the original line if all messages are approved
                    cleared += 1
                    outfile.write(json.dumps(data, ensure_ascii=False) + '\n')

            except Exception as e:
                print(f"Error processing line: {e}")

    print(f"Cleared {cleared} out of {total_rows} rows")


if __name__ == '__main__':
    input_ = "fine_tune/views/convertsations/v3.jsonl"
    output = "fine_tune/views/convertsations/approved_20241012_1844.jsonl"
    main(input_, output)
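
(One note on the script: I compare the raw category_scores against a very low threshold instead of relying on the boolean categories flags, because the booleans only trip at OpenAI's default cutoffs, and I assumed Azure's checks might be stricter.)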

The script filtered out some training conversations, but even after that the training failed with the same error.

This raises two questions. First, is Microsoft aware of this misalignment between its RAI checks and OpenAI's Moderation API?
Second, is there a way to get around the moderation? For example, parts of my training data include segments from movies and TV shows, which can contain a lot of language that would be flagged...
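
One thing I noticed: the harm types in the error ('violence') look like Azure's own content-filtering categories (hate, sexual, violence, self-harm) rather than OpenAI's Moderation categories, so I suspect the RAI checks aren't based on the Moderation API at all. If that's the case, pre-screening with Azure AI Content Safety might be more likely to catch the same rows. A rough sketch of what I have in mind (the endpoint, key, and severity cutoff are placeholders I'd still have to tune):

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for an Azure AI Content Safety resource
cs_client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<content-safety-key>"),
)

MAX_SEVERITY = 0  # reject anything above severity 0; this cutoff is a guess


def is_clean(text: str) -> bool:
    """Return True if no harm category exceeds MAX_SEVERITY."""
    result = cs_client.analyze_text(AnalyzeTextOptions(text=text))
    return all(
        (item.severity or 0) <= MAX_SEVERITY
        for item in result.categories_analysis
    )

That still wouldn't tell me whether the mismatch is intentional, but at least the pre-screen and the RAI check would be grading on the same scale.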

IMO this is a huge bug. Consider the amount of time and resources wasted on cleaning, validating, and running a fine-tune, only for it to fail midway so you find out you should never have uploaded the data in the first place...
Also, the studio doesn't say which rows are problematic, so how can we know how much of the dataset is invalid and fix it?
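
In the meantime, the best I can do is make the local pass report which rows it flags and why, so I at least have my own list of suspect rows. Something along these lines (reusing the same API key and threshold as the script above; the path is my local file):

import json

from openai import OpenAI

client = OpenAI(api_key="API-KEY")
threshold = 0.001  # same cutoff as the cleaning script above

# Record every (line number, category, score) that trips the threshold
flagged = []
with open("fine_tune/views/convertsations/v3.jsonl", encoding="utf-8") as infile:
    for lineno, line in enumerate(infile, start=1):
        for message in json.loads(line).get("messages", []):
            content = message.get("content", "")
            if not content:
                continue
            result = client.moderations.create(input=content).results[0]
            for category, score in result.category_scores.model_dump().items():
                if score > threshold:
                    flagged.append((lineno, category, score))

for lineno, category, score in flagged:
    print(f"line {lineno}: {category} = {score:.4f}")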

Azure OpenAI Service