Thank you for reaching out to the Microsoft Q&A forum!
Automating Labeling: Instead of manually labeling every resume, I recommend using Azure Document Intelligence. Its prebuilt models (e.g., the Read model) extract text from documents such as resumes, and you can train a model on your own labeled fields for your specific NER tasks. The trained model can then predict labels for new resumes, which humans review and correct. This streamlines the labeling process.
For your custom needs, you can try the custom extraction model.
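The predict-then-review loop described above can be sketched as a simple confidence triage. This is only an illustration: the input shape (a dict of field names mapped to a value and a confidence), the function name `split_for_review`, and the 0.8 threshold are assumptions, not part of the Document Intelligence API.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: {doc_id: {field_name: {"value": str, "confidence": float}}}
Predictions = Dict[str, Dict[str, dict]]


def split_for_review(
    predictions: Predictions, threshold: float = 0.8
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate documents whose predicted labels can be accepted as-is
    from those that need a human to check specific fields."""
    auto_accepted: List[str] = []
    needs_review: Dict[str, List[str]] = {}
    for doc_id, fields in predictions.items():
        # Collect every field whose confidence falls below the threshold.
        low = [name for name, f in fields.items() if f["confidence"] < threshold]
        if low:
            needs_review[doc_id] = low
        else:
            auto_accepted.append(doc_id)
    return auto_accepted, needs_review
```

Reviewers then only open the documents in `needs_review`, and the corrected labels can be added to the next training set.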
Model Updating: Azure Document Intelligence can be used together with Azure Machine Learning pipelines, so you can set up automated workflows that retrain your model as new labeled data is added. This lets the model keep learning with minimal manual intervention and stay up to date with the latest resumes.
Model Performance: By periodically retraining the model with new labeled data, you can maintain or even improve its performance. Azure Document Intelligence supports this by allowing incremental updates to your custom models, keeping them accurate over time as the data evolves.
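One way to check that a retrained model really maintains or improves performance is to score both versions on the same held-out, human-labeled set before switching over. The sketch below assumes predictions and gold labels are plain dicts of `{doc_id: {field: value}}`; the helper names `field_accuracy` and `should_promote` are hypothetical and not part of any Azure SDK.

```python
from typing import Dict

Labels = Dict[str, Dict[str, str]]  # hypothetical shape: {doc_id: {field: value}}


def field_accuracy(predicted: Labels, expected: Labels) -> float:
    """Fraction of gold fields the model extracted correctly
    (case-insensitive, ignoring surrounding whitespace)."""
    total = 0
    correct = 0
    for doc_id, gold_fields in expected.items():
        pred_fields = predicted.get(doc_id, {})
        for field, gold in gold_fields.items():
            total += 1
            # A missing field counts as an error.
            if pred_fields.get(field, "").strip().lower() == gold.strip().lower():
                correct += 1
    return correct / total if total else 0.0


def should_promote(current_acc: float, candidate_acc: float, min_gain: float = 0.0) -> bool:
    """Promote the retrained model only if it is at least as good as the current one."""
    return candidate_acc >= current_acc + min_gain
```

Keeping the held-out set fixed between retraining runs makes the two scores directly comparable.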
Duplicate Detection: Azure Document Intelligence focuses on extracting structured data from resumes and does not detect duplicates itself, but you can build duplicate detection on top of the extracted entities. By training your model to extract key fields such as name, contact details, and work history, you can compare these attributes across resumes to identify potential duplicates. Better entity extraction from the updated model should in turn make it easier to flag similar resumes for manual review or further processing.
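As a minimal sketch of comparing extracted attributes, the code below normalizes name, email, and phone and flags pairs that share a contact detail or have near-identical names. The record shape (`id`, `name`, `email`, `phone`), the function names, and the 0.9 similarity threshold are assumptions for illustration; the pairwise loop is fine for small batches but would need blocking or indexing for large collections.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize_email(email: str) -> str:
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    # Keep digits only so "+1 (555) 010-0000" and "15550100000" compare equal.
    return re.sub(r"\D", "", phone)


def normalize_name(name: str) -> str:
    # Lowercase and collapse repeated whitespace.
    return " ".join(name.lower().split())


def find_duplicate_candidates(
    resumes: List[Dict[str, str]], name_threshold: float = 0.9
) -> List[Tuple[str, str]]:
    """Return pairs of resume ids that look like the same person."""
    pairs: List[Tuple[str, str]] = []
    for i in range(len(resumes)):
        for j in range(i + 1, len(resumes)):
            a, b = resumes[i], resumes[j]
            email_a = normalize_email(a.get("email", ""))
            email_b = normalize_email(b.get("email", ""))
            phone_a = normalize_phone(a.get("phone", ""))
            phone_b = normalize_phone(b.get("phone", ""))
            same_email = bool(email_a) and email_a == email_b
            same_phone = bool(phone_a) and phone_a == phone_b
            name_sim = SequenceMatcher(
                None, normalize_name(a.get("name", "")), normalize_name(b.get("name", ""))
            ).ratio()
            if same_email or same_phone or name_sim >= name_threshold:
                pairs.append((a["id"], b["id"]))
    return pairs
```

Flagged pairs are only candidates: two different people can share a name, so the result is best sent to manual review rather than deleted automatically.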
I hope this helps! Thank you.