Hi, imagine you have a large Excel spreadsheet with 10,000 sentences already labeled. To train its model, Azure expects each sentence to end up in a separate text file (00001.txt, 00002.txt, and so on), plus a single labels JSON file that tells it: "this file belongs to category A, that one to category B, use them for training or for testing." It sounds tedious, but a small script (Python, R, whatever you prefer) can split your spreadsheet into the individual .txt files and generate the JSON in a few seconds, and then you upload the whole folder to Blob Storage with a single command. It's the standard flow: once it's automated, you never think about it again.
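For reference, here is a minimal sketch of such a script in Python, assuming a spreadsheet with columns named `text` and `label` (adjust to your actual column names). The JSON structure follows the single-label classification format from the data-formats documentation; double-check field names like `projectFileVersion` against the current version of that page, and replace the container/project placeholders with your own values:

```python
import json
from pathlib import Path

import pandas as pd  # pip install pandas openpyxl

# Assumed input: an Excel file with "text" and "label" columns.
INPUT_XLSX = "labeled_sentences.xlsx"
OUTPUT_DIR = Path("dataset")

df = pd.read_excel(INPUT_XLSX)
OUTPUT_DIR.mkdir(exist_ok=True)

documents = []
for i, row in enumerate(df.itertuples(index=False), start=1):
    # One .txt file per row: 00001.txt, 00002.txt, ...
    file_name = f"{i:05d}.txt"
    (OUTPUT_DIR / file_name).write_text(str(row.text), encoding="utf-8")
    documents.append(
        {
            "location": file_name,
            "language": "en-us",
            # Simple 80/20 train/test split by row order; Azure can also
            # split automatically if you omit the "dataset" field.
            "dataset": "Train" if i <= int(len(df) * 0.8) else "Test",
            "class": {"category": str(row.label)},
        }
    )

labels = {
    "projectFileVersion": "2022-05-01",
    "stringIndexType": "Utf16CodeUnit",
    "metadata": {
        "projectKind": "CustomSingleLabelClassification",
        "storageInputContainerName": "YOUR_CONTAINER_NAME",  # placeholder
        "projectName": "YOUR_PROJECT_NAME",                  # placeholder
        "multilingual": False,
        "language": "en-us",
    },
    "assets": {
        "projectKind": "CustomSingleLabelClassification",
        "classes": [{"category": c} for c in sorted(df["label"].astype(str).unique())],
        "documents": documents,
    },
}

(OUTPUT_DIR / "labels.json").write_text(json.dumps(labels, indent=2), encoding="utf-8")
```

Once the folder is generated, something like `az storage blob upload-batch --destination <container> --source dataset --account-name <account>` uploads all of the files to Blob Storage in one go.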
Text classification when I have 10,000 labeled rows in a table
Girish Sharma · 20 Reputation points
Hi, if I have many rows of labeled data (let's say 10,000), do I need to create an individual .txt file for each row of text and a .json file for the labels? That is what I understood from the text-classification documentation: https://learn.microsoft.com/en-us/azure/ai-services/language-service/custom-text-classification/concepts/data-formats?tabs=single-classification. Is there another way when the number of rows is large, or is it normal to first convert each text into an independent .txt file, upload those to the container, and then upload the .json file for the labels?
Azure AI Language