Hi, imagine you have a large Excel spreadsheet with 10,000 sentences already labeled. To train its model, Azure expects each sentence to end up in a separate text file (00001.txt, 00002.txt, and so on), plus a single labels JSON file that tells it: "this file belongs to category A, that one to category B, use them for training or for testing." It sounds tedious, but a small script (Python, R, whatever you prefer) can split your spreadsheet into the individual .txt files and generate the JSON in a few seconds, and then you upload the whole folder to Blob Storage with a single command. It's the standard flow: once it's automated, you never think about it again.
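For reference, here is a minimal sketch of such a script in Python, assuming a spreadsheet with columns named `text` and `label` (adjust to your actual column names). The JSON structure follows the single-label classification format from the data-formats documentation; double-check field names like `projectFileVersion` against the current version of that page, and replace the container/project placeholders with your own values:

```python
import json
from pathlib import Path

import pandas as pd  # pip install pandas openpyxl

# Assumed input: an Excel file with "text" and "label" columns.
INPUT_XLSX = "labeled_sentences.xlsx"
OUTPUT_DIR = Path("dataset")

df = pd.read_excel(INPUT_XLSX)
OUTPUT_DIR.mkdir(exist_ok=True)

documents = []
for i, row in enumerate(df.itertuples(index=False), start=1):
    # One .txt file per row: 00001.txt, 00002.txt, ...
    file_name = f"{i:05d}.txt"
    (OUTPUT_DIR / file_name).write_text(str(row.text), encoding="utf-8")
    documents.append(
        {
            "location": file_name,
            "language": "en-us",
            # Simple 80/20 train/test split by row order; Azure can also
            # split automatically if you omit the "dataset" field.
            "dataset": "Train" if i <= int(len(df) * 0.8) else "Test",
            "class": {"category": str(row.label)},
        }
    )

labels = {
    "projectFileVersion": "2022-05-01",
    "stringIndexType": "Utf16CodeUnit",
    "metadata": {
        "projectKind": "CustomSingleLabelClassification",
        "storageInputContainerName": "YOUR_CONTAINER_NAME",  # placeholder
        "projectName": "YOUR_PROJECT_NAME",                  # placeholder
        "multilingual": False,
        "language": "en-us",
    },
    "assets": {
        "projectKind": "CustomSingleLabelClassification",
        "classes": [{"category": c} for c in sorted(df["label"].astype(str).unique())],
        "documents": documents,
    },
}

(OUTPUT_DIR / "labels.json").write_text(json.dumps(labels, indent=2), encoding="utf-8")
```

Once the folder is generated, something like `az storage blob upload-batch --destination <container> --source dataset --account-name <account>` uploads all of the files to Blob Storage in one go.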
Text classification when I have 10,000 labeled rows in a table
Girish Sharma · 20 Reputation points
Hi, if I have many rows of labeled data (let's say 10,000), do I need to create an individual .txt file for each row of text and a .json file for the labels? That is what I understood from the text-classification documentation: https://learn.microsoft.com/en-us/azure/ai-services/language-service/custom-text-classification/concepts/data-formats?tabs=single-classification. Is there another way when the number of rows is large, or is it normal to first convert each text into an independent .txt file, upload those to the container, and then upload the .json file for the labels?
Azure AI Language