An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
I understand that your Azure AI Document Intelligence model is struggling to accurately extract cancer treatment details, with low confidence for chemotherapy drugs, radiation data, and multiple surgeries, even after training with 30-40 documents.
Enhance Model Training Techniques:
Use Prebuilt Models for Boosting Performance:
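Prebuilt models cannot be fine-tuned directly, and the health-related prebuilt model covers US health insurance cards rather than treatment records. Instead of starting from nothing, use the prebuilt layout model to capture text, tables, and structure, then train a custom neural model for the treatment fields.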
Use a custom classification model to separate documents by treatment type (chemotherapy, radiation, surgery) before extraction, so each type can be routed to a model trained for it.
Define Custom Field Relationships:
Use custom fields to specify relationships between drugs, doses, and treatments.
Example: Define a "Chemotherapy Treatment" entity that links to specific drug names and dosages.
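As a rough illustration of that relationship, the sketch below models a chemotherapy treatment that owns a list of drug and dose pairs. The class names, the `build_treatment` helper, and the `drug`/`dose` row keys are all hypothetical; they stand in for whatever rows your custom model returns and are not part of the Document Intelligence SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DrugDose:
    drug_name: str
    dose: Optional[str] = None  # kept as raw text, e.g. "60 mg/m2"


@dataclass
class ChemotherapyTreatment:
    drugs: List[DrugDose] = field(default_factory=list)


def build_treatment(rows):
    """Link each extracted drug to its dose.

    `rows` is an assumed shape: a list of dicts with "drug" and "dose" keys,
    for example the rows of a labelled table field.
    """
    treatment = ChemotherapyTreatment()
    for row in rows:
        name = (row.get("drug") or "").strip()
        if not name:
            continue  # skip rows where no drug name was extracted
        dose = (row.get("dose") or "").strip() or None
        treatment.drugs.append(DrugDose(name, dose))
    return treatment
```

Keeping the drug and its dose in one object means a dose can never be attached to the wrong drug later in the pipeline.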
Segment Complex Fields:
Instead of extracting all surgeries into one field, use multi-instance fields where each surgery is extracted as a separate entity.
Example:
Surgery Type 1: Appendectomy
Surgery Type 2: Lumpectomy
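A minimal sketch of reading such a multi-instance field follows. It assumes the field was labelled as an array named `Surgeries` whose items each contain a `SurgeryType` sub-field, and that the result is a REST-style JSON dict using `valueArray`, `valueObject`, `valueString`, `content`, and `confidence` keys. The field names are hypothetical, so adjust them to your own labelling schema.

```python
def extract_surgeries(fields, field_name="Surgeries"):
    """Return one entry per surgery from an array-valued field.

    `fields` is assumed to be the "fields" dict of one analyzed document.
    """
    entry = fields.get(field_name) or {}
    surgeries = []
    for index, item in enumerate(entry.get("valueArray") or [], start=1):
        obj = item.get("valueObject") or {}
        surgery_type = obj.get("SurgeryType") or {}
        surgeries.append({
            "index": index,
            # fall back to the raw content if no typed value is present
            "type": surgery_type.get("valueString") or surgery_type.get("content"),
            "confidence": surgery_type.get("confidence"),
        })
    return surgeries
```

Each surgery keeps its own confidence score, so a weak second or third entry can be reviewed without discarding the rest.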
Refine Labelling Strategy:
Ensure that labelled entities are consistent across all documents. Inconsistent annotations can confuse the model.
Use multiple labelers to cross-validate and remove errors.
Clearly differentiate between chemotherapy drugs vs. biologic agents and ensure they are labelled precisely in context.
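One simple way to cross-validate two labelers is to compare their annotations field by field and list every mismatch for review. The sketch below assumes each labeler's work has been exported to a plain dict of the form `{doc_id: {field_name: labelled_text}}`; that shape and the function names are illustrative and not a Document Intelligence Studio export format.

```python
def _normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    if value is None:
        return None
    return " ".join(value.lower().split())


def find_label_disagreements(labels_a, labels_b):
    """Return (doc_id, field_name, value_a, value_b) for every mismatch."""
    disagreements = []
    for doc_id in sorted(set(labels_a) | set(labels_b)):
        doc_a = labels_a.get(doc_id, {})
        doc_b = labels_b.get(doc_id, {})
        for field_name in sorted(set(doc_a) | set(doc_b)):
            value_a = doc_a.get(field_name)
            value_b = doc_b.get(field_name)
            if _normalize(value_a) != _normalize(value_b):
                disagreements.append((doc_id, field_name, value_a, value_b))
    return disagreements
```

A field labelled by only one of the two labelers is reported as a mismatch as well, which helps catch missed annotations.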
Model Configuration & Retraining:
Increase Training Iterations:
If accuracy is low, retrain on revised versions of the dataset: remove or relabel entities the model extracts with low confidence, and keep only labels you have verified as accurate.
Try several training runs (3-5), adjusting the labelled examples between runs.
Augment with Synonyms and Context Awareness:
Medical documents may contain synonyms (e.g., "Adriamycin" vs. "Doxorubicin" for chemotherapy).
Use custom dictionaries or an Azure knowledge mining pipeline to map terminology variations to a single canonical term.
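A custom dictionary can be as simple as a lookup table applied after extraction. The sketch below maps brand names to generic names; the `SYNONYMS` table is a small illustrative sample, not a complete drug list, and `normalize_drug_name` is a hypothetical helper.

```python
# Illustrative brand-name to generic-name mapping; extend with your own list.
SYNONYMS = {
    "adriamycin": "doxorubicin",
    "taxol": "paclitaxel",
    "herceptin": "trastuzumab",
}


def normalize_drug_name(raw):
    """Return the canonical lower-case drug name for an extracted value.

    Names that are not in the table are returned lower-cased and unchanged.
    """
    key = " ".join(raw.lower().split())
    return SYNONYMS.get(key, key)
```

Normalizing before validation means "Adriamycin" and "Doxorubicin" are compared against the same entry in any downstream drug list.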
Optimize Confidence Thresholds:
If the model has low confidence but correct predictions, adjust post-processing rules to accept lower-confidence values and validate manually.
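The thresholding idea can be sketched as a small triage step. The cut-off values (0.80 and 0.50) and the bucket names are assumptions chosen for illustration; tune them against a sample of documents you have checked by hand. The input is assumed to be a dict of `{field_name: (value, confidence)}`.

```python
def triage_fields(fields, accept_at=0.80, review_at=0.50):
    """Sort extracted fields into accept / review / discard buckets.

    accept:  confidence >= accept_at, used as-is
    review:  review_at <= confidence < accept_at, sent for manual validation
    discard: below review_at or missing confidence, re-enter by hand
    """
    buckets = {"accept": [], "review": [], "discard": []}
    for name, (value, confidence) in fields.items():
        if confidence is None or confidence < review_at:
            bucket = "discard"
        elif confidence < accept_at:
            bucket = "review"
        else:
            bucket = "accept"
        buckets[bucket].append((name, value, confidence))
    return buckets
```

Lowering `accept_at` accepts more values automatically, so it is worth doing only for fields where the low-confidence predictions have proven to be correct.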
Post-Processing & Validation:
Apply Rule-Based Validation with Azure Logic Apps:
Use regular expressions (regex) and rule-based filters to validate extracted data, such as:
Chemotherapy drugs must be from a predefined list.
Radiation dose units (Gy, cGy) must be valid.
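The two rules above could look like the following in Python, for example inside an Azure Function that a Logic App calls. The `ALLOWED_DRUGS` set is a placeholder sample rather than a clinical reference list, and both function names are hypothetical.

```python
import re

# Placeholder list; replace with your own approved chemotherapy drug list.
ALLOWED_DRUGS = {"doxorubicin", "cyclophosphamide", "paclitaxel"}

# A number followed by Gy or cGy, e.g. "50 Gy", "5000cGy", "1.8 Gy".
DOSE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(cGy|Gy)\s*$", re.IGNORECASE)


def is_allowed_drug(name):
    """True if the extracted drug name is on the predefined list."""
    return name.strip().lower() in ALLOWED_DRUGS


def parse_radiation_dose(text):
    """Return the dose in Gy, or None if the text is not a valid dose."""
    match = DOSE_PATTERN.match(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    # 1 Gy = 100 cGy
    return value / 100 if unit == "cgy" else value
```

Values that fail either check can be flagged for manual review instead of being written to the downstream system.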
Use Human Review for Low-Confidence Cases:
Add a human-in-the-loop review step for low-confidence predictions, and feed the corrected values back into the training set to improve accuracy over time.
Alternate Approach:
Combine NLP-Based Models with Azure AI:
If Document Intelligence alone still struggles, use Azure Machine Learning (AML) with NLP models such as BERT or ClinicalBERT to extract medical entities from the recognized text.
Integrate Azure AI Search (formerly Azure Cognitive Search) to index and retrieve the structured treatment data.
I hope the steps above help resolve your issue. If you have any further questions, please let us know.
Thank you!