The provided data failed validation: "contains CAPTCHA"

Lin, Fanghe (Emily) 20 Reputation points
2025-03-13T14:33:20.7566667+00:00

Hello, I'm trying to fine-tune GPT-4o with image data, but I'm encountering an issue: most of the images in my training file are being flagged as "containing CAPTCHAs". The images look fine. This is the first time I've seen this problem in training data validation.

I retested the old training files that I used to fine-tune GPT-4o a month ago without any issues, and now they also flag most images as CAPTCHAs. Have there been any changes to the CAPTCHA detection threshold this month?

Azure AI Content Safety
An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.

Accepted answer
  Prashanth Veeragoni 4,930 Reputation points Microsoft External Staff Moderator
    2025-03-14T06:18:31.2333333+00:00

    Hi Lin, Fanghe (Emily),

    This issue likely occurs because the data validation and content moderation policy for fine-tuning has been updated, making CAPTCHA detection stricter. Check the official fine-tuning documentation for the current rules.

    In that document, under Fine-Tuning - Content moderation policy, it states that images containing the following will be excluded from your dataset and not used for training:

    - CAPTCHAs
    - People
    - Faces
    - Children

    Remove such images. For now, we cannot fine-tune models with images containing these entities.
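    Before re-uploading a dataset, it can help to find out which training examples actually contain images, so they can be reviewed or removed. Below is a minimal sketch, assuming the chat fine-tuning JSONL format where image inputs appear as content parts of type "image_url"; `partition_examples` is a hypothetical helper, not an official tool, and it does not itself detect CAPTCHAs — it only separates image-bearing examples for manual triage.

    ```python
    import json

    def partition_examples(jsonl_text):
        """Split chat fine-tuning examples into (with_images, text_only).

        Assumes each JSONL line is a {"messages": [...]} object and that
        image inputs appear as content parts with "type": "image_url".
        """
        with_images, text_only = [], []
        for line in jsonl_text.splitlines():
            if not line.strip():
                continue
            example = json.loads(line)
            has_image = any(
                part.get("type") == "image_url"
                for msg in example.get("messages", [])
                # A message's content is either a plain string or a list of parts.
                for part in (msg["content"] if isinstance(msg.get("content"), list) else [])
            )
            (with_images if has_image else text_only).append(example)
        return with_images, text_only
    ```

    The image-bearing examples can then be inspected by hand (or dropped) before uploading the cleaned JSONL for fine-tuning.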

    Why?

    Allowing AI models to train on images containing CAPTCHAs poses serious security risks:

    Bypassing Security Measures:

    CAPTCHAs are specifically designed to block automated systems. If an AI model is trained to recognize and solve them, it could potentially be used to circumvent security systems, making websites and services vulnerable to bot attacks.

    Facilitating Malicious Use Cases:

    Cybercriminals could exploit AI trained on CAPTCHAs to automate attacks, such as:

    Credential stuffing (automated login attempts using leaked username/password pairs).

    Spamming and phishing by automating bot-driven form submissions.

    Scraping protected content from websites that use CAPTCHAs as a defense.

    Legal and Ethical Concerns:

    Many providers (Google reCAPTCHA, Cloudflare, etc.) have terms of service that prohibit training AI models on CAPTCHA data.

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank you. 

