Accepted data formats
Data that you import into custom text classification must follow a specific format. If you don't have data to import, you can create your project and use Language Studio to label your documents.
Labels file format
Your labels file should be in the JSON format below, which enables you to import your labels into a project.
{
    "projectFileVersion": "2022-05-01",
    "stringIndexType": "Utf16CodeUnit",
    "metadata": {
        "projectKind": "CustomMultiLabelClassification",
        "storageInputContainerName": "{CONTAINER-NAME}",
        "projectName": "{PROJECT-NAME}",
        "multilingual": false,
        "description": "Project-description",
        "language": "en-us"
    },
    "assets": {
        "projectKind": "CustomMultiLabelClassification",
        "classes": [
            {
                "category": "Class1"
            },
            {
                "category": "Class2"
            }
        ],
        "documents": [
            {
                "location": "{DOCUMENT-NAME}",
                "language": "{LANGUAGE-CODE}",
                "dataset": "{DATASET}",
                "classes": [
                    {
                        "category": "Class1"
                    },
                    {
                        "category": "Class2"
                    }
                ]
            }
        ]
    }
}
Key | Placeholder | Value | Example |
---|---|---|---|
multilingual | true | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, not necessarily one included in your training documents. See language support to learn more about multilingual support. | true |
projectName | {PROJECT-NAME} | Project name | myproject |
storageInputContainerName | {CONTAINER-NAME} | Container name | mycontainer |
classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into. | [] |
documents | [] | Array containing all the documents in your project and the classes labeled for each document. | [] |
location | {DOCUMENT-NAME} | The location of the document in the storage container. Since all the documents are in the root of the container, this value should be the document name. | doc1.txt |
dataset | {DATASET} | The dataset to which this document is assigned when the data is split before training. See How to train a model for more information. Possible values for this field are Train and Test. | Train |
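If you generate the labels file from a script rather than writing it by hand, the structure above could be assembled along these lines. This is only a minimal sketch: the function name `build_labels_file` and its parameters are hypothetical, it assumes a multi-label project, and it assumes every document uses the project language. The keys and fixed values are copied from the format shown above.

```python
import json

# Values the "dataset" field accepts, per the table above.
ALLOWED_DATASETS = {"Train", "Test"}


def build_labels_file(project_name, container_name, labeled_docs,
                      language="en-us", multilingual=False, description=""):
    """Assemble a labels file as a dict (hypothetical helper, not an official API).

    labeled_docs is a list of (document_name, dataset, class_names) tuples.
    """
    documents = []
    all_classes = set()
    for location, dataset, class_names in labeled_docs:
        if dataset not in ALLOWED_DATASETS:
            raise ValueError(f"dataset must be Train or Test, got {dataset!r}")
        all_classes.update(class_names)
        documents.append({
            "location": location,   # document name in the container root
            "language": language,   # assumption: same language for every document
            "dataset": dataset,
            "classes": [{"category": name} for name in class_names],
        })

    return {
        "projectFileVersion": "2022-05-01",
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectKind": "CustomMultiLabelClassification",
            "storageInputContainerName": container_name,
            "projectName": project_name,
            "multilingual": multilingual,
            "description": description,
            "language": language,
        },
        "assets": {
            "projectKind": "CustomMultiLabelClassification",
            # Project-level list of every class used by any document.
            "classes": [{"category": name} for name in sorted(all_classes)],
            "documents": documents,
        },
    }


# Example values taken from the table above.
labels = build_labels_file(
    "myproject",
    "mycontainer",
    [("doc1.txt", "Train", ["Class1", "Class2"])],
)
labels_json = json.dumps(labels, indent=4)
```

Writing `labels_json` to a file gives a labels file with the same shape as the example. Checking the `dataset` value up front is a design choice for this sketch, so that a misspelled value fails in the script rather than at import time.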
Next steps
- You can import your labeled data into your project directly. See How to create a project to learn more about importing projects.
- See the how-to article for more information about labeling your data. When you're done labeling your data, you can train your model.