Hello Josh Christensen,
Thank you for posting your question in the Microsoft Q&A forum.
Thank you for sharing your detailed findings regarding the Unified Tax Model. It appears that there are inconsistencies in how the model handles individual and merged tax documents, particularly with Schedule 1 and multi-page tax forms. These issues can significantly impact the usability of the service, especially when end users upload a single PDF containing all their tax documents.
To summarize the behavior you observed:
- When Schedule 1 is processed on its own with the Unified Tax Model, it is incorrectly classified as a 1040. The Schedule 1 model extracts it correctly, but only when you call that model explicitly.
- When several forms (for example, 1040, Schedule 1, Schedule A, Schedule D, and Schedule SE) are merged into one PDF, the Unified Tax Model fails to correctly identify, split, and parse them.
- The 1040 model detects the correct number of documents in the merged PDF but classifies all of them as 1040.
You could take the following approach:
- Make sure the input PDFs meet the input requirements for Document Intelligence (supported format, file size, page count, and resolution). Low-quality scans or pages with poor OCR can reduce classification and extraction accuracy, so check that scanned pages are legible and not skewed. For reference, see: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview?view=doc-intel-4.0.0
- If the Unified Tax Model does not meet your needs, consider training a custom model with Azure AI Document Intelligence (formerly Azure Form Recognizer). Custom models let you define your own document types and can improve accuracy for your specific use case. For the document types the prebuilt tax models support, see: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/tax-document?view=doc-intel-4.0.0&source=recommendations
- As a temporary workaround, add preprocessing logic that splits merged tax documents into individual pages or sections before analysis. Send Schedule 1 pages to the Schedule 1 model and the remaining pages to the Unified Tax Model.
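A minimal sketch of that routing step is shown below. It assumes you already have one label per page (for example, from your own rules or from a custom classification model); how you obtain those labels is not shown. Instead of physically splitting the PDF, it groups consecutive pages with the same label and builds one analyze request per group using the `pages` query parameter. The `SCHEDULE1_MODEL_ID` value is a hypothetical placeholder, and the `prebuilt-tax.us` model ID, the REST path, and the `2024-11-30` api-version are assumptions to verify against the current Document Intelligence REST reference.

```python
from itertools import groupby
from urllib.parse import quote, urlencode

# ASSUMPTION: placeholder model IDs. Confirm the exact IDs for your API
# version in the Document Intelligence documentation or Studio.
SCHEDULE1_MODEL_ID = "<schedule-1-model-id>"  # hypothetical placeholder
UNIFIED_TAX_MODEL_ID = "prebuilt-tax.us"      # assumed unified tax model ID

# ASSUMPTION: v4.0 GA api-version; verify against the REST reference.
API_VERSION = "2024-11-30"


def plan_requests(page_labels: list[str]) -> list[tuple[str, str]]:
    """Group consecutive pages with the same label into (model_id, pages) pairs.

    page_labels[i] is the label for page i + 1, because the service numbers
    pages starting at 1. Labels such as "1040" or "schedule1" are hypothetical.
    """
    plan = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))
        first, last = page, page + count - 1
        pages = str(first) if first == last else f"{first}-{last}"
        # Route Schedule 1 to its dedicated model, everything else to the unified model.
        model_id = SCHEDULE1_MODEL_ID if label == "schedule1" else UNIFIED_TAX_MODEL_ID
        plan.append((model_id, pages))
        page = last + 1
    return plan


def analyze_url(endpoint: str, model_id: str, pages: str) -> str:
    """Build the analyze URL for one page range (ASSUMPTION: v4.0 REST path)."""
    query = urlencode({"api-version": API_VERSION, "pages": pages})
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{quote(model_id)}:analyze?{query}"
    )


if __name__ == "__main__":
    # Hypothetical labels for a merged PDF: a two-page 1040 followed by schedules.
    labels = ["1040", "1040", "schedule1", "scheduleA", "scheduleD", "scheduleSE"]
    for model_id, pages in plan_requests(labels):
        print(analyze_url("https://<your-resource>.cognitiveservices.azure.com", model_id, pages))
```

Each URL would then be called with the same source file in the request body, so the service only analyzes the listed pages with the chosen model. Grouping by label rather than by model keeps each form in its own request, which avoids sending a multi-form range back to the Unified Tax Model.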
The issues you are encountering show that the Unified Tax Model's handling of Schedule 1 and merged tax documents needs improvement. Preprocessing logic and custom models can serve as temporary workarounds, but for a long-term resolution we recommend opening a support request with Microsoft Support from the Azure portal and submitting feedback to the product team with the details of the misclassification you observed.
If the above answer helped, please remember to select "Accept Answer" so that other community members facing a similar issue can find this information. Your contribution to the Microsoft Q&A community is highly appreciated.