OpenAI Finetuning failed RAI checks, but moderation API didn't

Question

OpenAI Finetuning failed RAI checks, but moderation API didn't

Yovel Cohen 70

Hi, Im trying to fine tune a GPT 4o model.
The training fails with the following error msg:

UserErrorException: Message: The provided training data failed RAI checks for harm types: ['violence']. Please fix the data and try again.  Details - Harmful lines per harm type: violence:

So I tried to clean the dataset with the following script:

import json
from openai import OpenAI

client = OpenAI(api_key="API-KEY")

threshold = 0.001


def main(input_file, output_file):
    # Read all lines from the input file and remove duplicates
    with open(input_file, 'r', encoding='utf-8') as infile:
        lines = infile.readlines()
        unique_lines = list(set(lines))  # Remove duplicate lines

    total_rows, cleared = len(unique_lines), 0
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for line in unique_lines:
            try:
                data = json.loads(line)
                messages = data.get('messages', [])

                all_messages_approved = True  # Flag to track if all messages are approved

                # Submit each individual message content to the Moderation API
                for message in messages:
                    content = message.get('content', '')

                    if content:  # Ensure there's content to submit
                        response = client.moderations.create(input=content)
                        results = response.results[0]

                        # Check if any category has a score higher than the threshold
                        for category, has_problems in results.categories.model_dump().items():
                            if has_problems is True:
                                all_messages_approved = False
                                break

                    if not all_messages_approved:
                        break

                if all_messages_approved:
                    # Only write the original line if all messages are approved
                    cleared += 1
                    outfile.write(json.dumps(data, indent=None, ensure_ascii=False) + '\n')

            except Exception as e:
                print(f"Error processing line: {e}")

    print(f"Cleared {cleared} out of {total_rows} rows")


if __name__ == '__main__':
    input_ = "fine_tune/views/convertsations/v3.jsonl"
    output = "fine_tune/views/convertsations/approved_20241012_1844.jsonl"
    main(input_, output)

The script filtered out some training conversations, but even after that the training failed with the same error.

This raises two questions, one, Is Microsoft aware of this misalignment between it and OpenAI?
second, is there a way to get around the moderation? for example, parts of my training data include segments from movies and TV shows, which could consist of a lot of moderated language...

IMO, This is a huge bug, the amount of time and resources wasted on cleaning, validating, and running fine-tuning only for it to fail midway so you can find out you should have never uploaded your data in the first place...
Also, the studio doesn't say which rows are problematic, so how can we know how much of the dataset is invalid and fix it??

romungi-MSFT 49,096 Reputation points Microsoft Employee Moderator

2024-10-14T09:42:06.78+00:00

@Yovel Cohen I think this is a known limitation of the service where the lines which fail the job are not available to evaluate in the failed job logs. If you need specific lines that fail in a job, you will have to share the job details through Azure support case for the service team to check for the same. I understand that this could be an additional step for you to complete fine tuning but due to system limitations, support is the only option to get a complete list of lines.

If you would like to check upfront for any violations upfront you can run the evaluation SDK to check for problematic content. It can help to provide more details if the score is high for individual lines or rows. I hope this helps!!
Yovel Cohen 70 Reputation points

2024-10-14T12:27:15.61+00:00

@romungi-MSFT thanks for the info, what if I can't open a support ticket (I don't have the required support plan yet)? am I just stuck with it?
Also, the evaluation SDK is a nice idea, but to be able to use it. A baseline needs to be established, Either by providing the score thresholds the platform uses which are also not exposed, so it's a guessing game.

Your answer

romungi-MSFT 49,096 Reputation points Microsoft Employee Moderator

2024-10-14T09:42:06.78+00:00

@Yovel Cohen I think this is a known limitation of the service where the lines which fail the job are not available to evaluate in the failed job logs. If you need specific lines that fail in a job, you will have to share the job details through Azure support case for the service team to check for the same. I understand that this could be an additional step for you to complete fine tuning but due to system limitations, support is the only option to get a complete list of lines.

If you would like to check upfront for any violations upfront you can run the evaluation SDK to check for problematic content. It can help to provide more details if the score is high for individual lines or rows. I hope this helps!!
Yovel Cohen 70 Reputation points

2024-10-14T12:27:15.61+00:00

@romungi-MSFT thanks for the info, what if I can't open a support ticket (I don't have the required support plan yet)? am I just stuck with it?
Also, the evaluation SDK is a nice idea, but to be able to use it. A baseline needs to be established, Either by providing the score thresholds the platform uses which are also not exposed, so it's a guessing game.

Share via

OpenAI Finetuning failed RAI checks, but moderation API didn't

Your answer