This issue most likely occurs because OpenAI has tightened its data validation and content moderation for fine-tuning, which makes CAPTCHA detection in training images stricter than before.
Check the official fine-tuning documentation for the rules currently in effect.
The "Content moderation policy" section of the fine-tuning documentation states that images containing the following will be excluded from your dataset and not used for training:
CAPTCHAs:
Contains CAPTCHAs, contains people, contains faces, contains children
Remove the image. For now, we cannot fine-tune models with images containing these entities.
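As a rough illustration of the "remove the image" step, the Python sketch below drops every training example that references a flagged image from a JSONL dataset. It assumes the chat-style vision fine-tuning format, where each line is a JSON object with a "messages" list and images appear as content parts of type "image_url". The helper names and the flagged_urls set are hypothetical. You would need to build that set yourself from the validation messages or a manual review.

```python
import json


def references_flagged_image(example, flagged_urls):
    """Return True if any message in the example embeds a flagged image URL."""
    for message in example.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message, so there are no image parts to check
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                url = part.get("image_url", {}).get("url")
                if url in flagged_urls:
                    return True
    return False


def filter_dataset(lines, flagged_urls):
    """Keep only the JSONL lines whose examples do not use a flagged image."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not references_flagged_image(json.loads(line), flagged_urls):
            kept.append(line)
    return kept
```

To use it, read the original training file line by line, pass the lines and your set of flagged URLs to filter_dataset, and write the returned lines to a new file before uploading it again.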
Why?
Allowing AI models to train on images containing CAPTCHAs poses serious security risks:
Bypassing Security Measures:
CAPTCHAs are specifically designed to block automated systems. If an AI model is trained to recognize and solve them, it could potentially be used to circumvent security systems, making websites and services vulnerable to bot attacks.
Facilitating Malicious Use Cases:
Cybercriminals could exploit AI trained on CAPTCHAs to automate attacks, such as:
Credential stuffing (automated login attempts using leaked username and password pairs).
Spamming and phishing by automating bot-driven form submissions.
Scraping protected content from websites that use CAPTCHAs as a defense.
Legal and Ethical Concerns:
Many CAPTCHA providers (Google reCAPTCHA, Cloudflare, etc.) have terms of service that may prohibit automated solving of their challenges or the use of CAPTCHA data to train AI models.
Hope this helps. Do let us know if you have any further queries.
-------------
If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".
Thank you.