Azure - LLM Model

Srushti Prashant zope 0 Reputation points
2023-06-05T14:53:04.35+00:00

I attempted to train a language model using identical training and validation datasets. Surprisingly, while the training token level accuracy reached around 63%, the validation token level accuracy dropped to negative values. This raises the question of why this occurred despite the datasets being the same. Furthermore, what can be done to fine-tune the model and enhance its relevancy?

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. YutongTie-MSFT 53,341 Reputation points
    2023-06-06T02:51:26.71+00:00

    Hello @Srushti Prashant zope

    Thanks for reaching out to us. Please let us know how and where you trained your model so that we can provide more specific guidance, since you have tagged three different products.

    Assuming you are training your model with the Azure Language Service, encountering a negative validation token-level accuracy is highly unusual in my experience. Here are some potential causes you may consider (some of them also apply to the other products):

    1. Dataset Mismatch: Double-check the data used for training and validation to ensure they are identical and have the same formatting and preprocessing.
    2. Data Leakage: Confirm that there is no unintended data leakage between the training and validation sets; in a normal setup, the two sets should be completely independent (a quick local check is sketched after this list).
    3. Model Architecture and Hyperparameters: Azure Language Service provides pre-trained models with default settings, but you can experiment with different architectures or hyperparameter configurations to optimize performance.
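
    Here is that quick local check: a minimal sketch assuming both sets are JSON Lines files with a `text` field per example (the file names and the field name are placeholders, not the service's required format):

    ```python
    import json

    def load_texts(path):
        """Load the raw text of every example from a JSON Lines file."""
        with open(path, encoding="utf-8") as f:
            return [json.loads(line)["text"] for line in f if line.strip()]

    train = load_texts("train.jsonl")            # placeholder path
    validation = load_texts("validation.jsonl")  # placeholder path

    train_set, validation_set = set(train), set(validation)
    overlap = train_set & validation_set

    print(f"Training examples:   {len(train)}")
    print(f"Validation examples: {len(validation)}")
    print(f"Examples shared by both sets: {len(overlap)}")

    # If the two files are meant to be identical, this should print True;
    # if they are meant to be independent splits, the overlap should be empty.
    print("Sets contain exactly the same examples:", train_set == validation_set)
    ```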

    Training Process: Review the training process and parameters to ensure they are correctly set. Verify that the model is being trained for an adequate number of epochs without underfitting or overfitting. Check if the learning rate, batch size, and other training settings are appropriate for your specific task.
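
    If you are training the model yourself rather than through the managed service, a per-epoch comparison of training and validation loss is the quickest way to sanity-check those settings. Below is a generic PyTorch sketch (not Azure Language Service code); it assumes the model returns a Hugging Face-style output with a `.loss` attribute and that the data loaders yield keyword-argument batches:

    ```python
    import torch

    def train_and_validate(model, train_loader, val_loader,
                           epochs=5, lr=2e-5, weight_decay=0.01):
        """Fine-tune `model` and report training/validation loss per epoch.

        The hyperparameter defaults are illustrative only; adjust them for your task.
        """
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

        for epoch in range(epochs):
            model.train()
            train_loss = 0.0
            for batch in train_loader:
                optimizer.zero_grad()
                loss = model(**batch).loss
                loss.backward()
                optimizer.step()
                train_loss += loss.item()

            model.eval()
            val_loss = 0.0
            with torch.no_grad():
                for batch in val_loader:
                    val_loss += model(**batch).loss.item()

            # Validation loss rising while training loss keeps falling is the classic
            # sign of overfitting; both staying high suggests underfitting.
            print(f"epoch {epoch + 1}: "
                  f"train loss {train_loss / len(train_loader):.4f} | "
                  f"val loss {val_loss / len(val_loader):.4f}")
    ```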

    1. Augment Training Data: If possible, increase the size of your training dataset. A larger and more diverse dataset can help the model generalize better and improve its performance on unseen data.
    2. Regularization Techniques: Apply regularization techniques, such as weight decay or dropout, to prevent overfitting and improve generalization.
    3. Error Analysis: Conduct a thorough analysis of the model's errors by examining the incorrectly predicted examples in the validation set (a starting point is sketched after this list).
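
    Here is that starting point for the error analysis: list every misclassified validation example and count errors per label. The data below is a made-up placeholder; replace it with your validation texts, gold labels, and the model's predictions:

    ```python
    from collections import Counter

    # Placeholder data for illustration only.
    texts       = ["book a flight to Paris", "cancel my order", "what is the weather today"]
    gold_labels = ["BookFlight", "CancelOrder", "GetWeather"]
    predictions = ["BookFlight", "GetWeather", "GetWeather"]

    errors = [(t, g, p) for t, g, p in zip(texts, gold_labels, predictions) if g != p]

    print(f"{len(errors)} of {len(texts)} validation examples were misclassified:")
    for text, gold, pred in errors:
        print(f"  text: {text!r}  expected: {gold}  predicted: {pred}")

    # Labels that are confused most often typically point to ambiguous label
    # definitions or under-represented classes in the training data.
    print("Errors per gold label:", Counter(g for _, g, _ in errors))
    ```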

    Fine-tuning with Transfer Learning: Instead of training the model from scratch, consider leveraging transfer learning. Start with a pre-trained model that is already proficient in a related task and fine-tune it on your specific dataset. This approach can save time and potentially yield better results.
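
    As a concrete illustration of that idea outside the managed service, the sketch below loads a pre-trained Hugging Face checkpoint, adds a new classification head, and optionally freezes the encoder so only the head is trained first. The checkpoint name and number of labels are assumptions for the example, not a recommendation for your specific task:

    ```python
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_NAME = "bert-base-uncased"  # example checkpoint; pick one suited to your language/domain
    NUM_LABELS = 3                    # assumed label count for illustration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)  # for preparing your dataset
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

    # Optional first stage: freeze the pre-trained encoder and train only the new
    # classification head, then unfreeze everything and fine-tune end-to-end with
    # a small learning rate (for example, using the train_and_validate loop sketched above).
    for param in model.base_model.parameters():
        param.requires_grad = False
    ```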

    If you are working with other products, please let us know and we will be happy to help further. Thanks.

    Regards,

    Yutong

    Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

