Document Intelligence Batch API format of jsonl file for azureBlobFileListSource file list

Erik Meijer 0 Reputation points
2024-10-12T21:45:05.77+00:00

The docs for the batch API preview (https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-batch-analysis?view=doc-intel-4.0.0) say that when using azureBlobFileListSource, the fileList field should point to a JSONL file with the list of documents to process.

Unfortunately, I cannot find the format of the lines in that file anywhere. I tried putting just the path and filename as a quoted string, but then I get:

{"code":"InvalidContentSourceFormat","message":"Invalid content source: Could not parse JSON on line 1 in file list 'XXXX.jsonl'."}}}

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. navba-MSFT 27,545 Reputation points Microsoft Employee Moderator
    2024-10-14T09:33:28.9266667+00:00

    @Erik Meijer Welcome to Microsoft Q&A! Thanks for posting the question.


    Below is the request syntax for the azureBlobFileListSource attribute:

    POST /documentModels/{modelId}:analyzeBatch
    [
      {
        "azureBlobSource": {
          "containerUrl": "{your-source-container-SAS-URL}",
          "prefix": "trainingDocs/"
        },
        "azureBlobFileListSource": {
          "containerUrl": "{your-source-container-SAS-URL}",
          "fileList": "myFileList.jsonl"
        },
        "resultContainerUrl": "{your-result-container-SAS-URL}",
        "resultPrefix": "trainingDocsResult/",
        "overwriteExisting": false
      }
    ]
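
For anyone assembling this body in code, here is a minimal Python sketch that builds a request using only the file-list source. The field names come from the snippet above. The function name `build_file_list_request` is hypothetical, and sending a single JSON object with only one of the two sources is an assumption that should be checked against the REST reference for your API version.

```python
import json


def build_file_list_request(container_sas_url, file_list_name,
                            result_container_sas_url, result_prefix,
                            overwrite_existing=False):
    """Build an analyzeBatch request body that uses azureBlobFileListSource.

    Assumption: the body is one JSON object carrying exactly one source,
    so azureBlobSource is left out on purpose.
    """
    return {
        "azureBlobFileListSource": {
            "containerUrl": container_sas_url,
            # Name of the JSONL file inside the source container.
            "fileList": file_list_name,
        },
        "resultContainerUrl": result_container_sas_url,
        "resultPrefix": result_prefix,
        "overwriteExisting": overwrite_existing,
    }


if __name__ == "__main__":
    body = build_file_list_request(
        "{your-source-container-SAS-URL}",
        "myFileList.jsonl",
        "{your-result-container-SAS-URL}",
        "trainingDocsResult/",
    )
    # Serialize the body for the POST to /documentModels/{modelId}:analyzeBatch.
    print(json.dumps(body, indent=2))
```

The placeholders in braces are kept exactly as in the snippet above and must be replaced with real SAS URLs before sending.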
    


    Here is sample content of the JSONL file for your reference. Each line is a JSON object with a "file" property that holds the blob path:

    {"file":"IRS-1040-A/train/IRS_1040_1_01.pdf"}
    {"file":"IRS-1040-A/train/IRS_1040_1_02.pdf"}
    {"file":"IRS-1040-A/train/IRS_1040_1_03.pdf"}
    {"file":"IRS-1040-A/train/IRS_1040_1_04.pdf"}
    {"file":"IRS-1040-A/train/IRS_1040_1_05.pdf"}
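
A file in this shape can be produced and checked with a few lines of Python. This is a minimal sketch and not part of any SDK: the helper names `format_file_list` and `parse_file_list` are hypothetical, and the check only covers what the sample above shows (one JSON object per line with a string "file" property).

```python
import json


def format_file_list(blob_paths):
    """Return JSONL text with one {"file": "<blob path>"} object per line."""
    lines = []
    for blob_path in blob_paths:
        # Compact separators reproduce the sample's formatting exactly.
        lines.append(json.dumps({"file": blob_path}, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


def parse_file_list(text):
    """Return the blob paths in JSONL text, raising ValueError on a bad line."""
    blob_paths = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: not valid JSON") from exc
        # A bare quoted string is valid JSON but is not the object shape
        # shown in the sample, so it is rejected here.
        if not isinstance(entry, dict) or not isinstance(entry.get("file"), str):
            raise ValueError(f'line {line_no}: expected an object like {{"file": "..."}}')
        blob_paths.append(entry["file"])
    return blob_paths


if __name__ == "__main__":
    text = format_file_list([
        "IRS-1040-A/train/IRS_1040_1_01.pdf",
        "IRS-1040-A/train/IRS_1040_1_02.pdf",
    ])
    print(text, end="")
    print(parse_file_list(text))
```

Writing the returned text to a file such as myFileList.jsonl and uploading it to the source container gives the file that the fileList field refers to.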
    


    • Are you calling the API through an SDK or the REST API?
    • Are you using Postman to invoke this API?
    • Please share your complete request headers and request body so that I can validate the format on my end. (Share these details via private message.)

    Awaiting your reply.
