Issues with Cross-Encoder MS-MARCO-MiniLM-L-12 Model Deployment in Azure ML

Avinash Sachdewani 0 Reputation points
2024-10-23T06:56:56.2166667+00:00

The cross-encoder-ms-marco-miniLM-L-12 model deployed in Azure Machine Learning Studio returns similarity scores that do not match the expected behavior of the original cross-encoder model.

Issue Details:

Test Environment:

Deployment Name: cross-encoder-ms-marco-miniLM-L-12

Implementation: REST API with proper authentication headers

Model: cross-encoder-ms-marco-MiniLM-L-12

Observed Behavior:

Test results from three distinct test cases show concerning patterns:

a) Identical Text Comparison:

Input 1: "This is a test sentence."
Input 2: "This is a test sentence."
Result: { label: 'LABEL_0', score: 0.0007295869872905314 }

b) Completely Different Text:

Input 1: "The sky is blue."
Input 2: "I like to eat pizza."
Result: { label: 'LABEL_0', score: 0.004066287539899349 }

c) Semantically Similar Text:

Input 1: "The weather is beautiful today."
Input 2: "It's a gorgeous day outside."
Result: { label: 'LABEL_0', score: 0.0010920792119577527 }

Issues Identified:

All scores are extremely low (< 0.005)

Completely different sentences receive higher scores than identical sentences

Consistent LABEL_0 output for all comparisons regardless of similarity

Scores do not span the 0 to 1 range as expected; every value clusters near 0

Semantic similarity is not being captured correctly
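One way to read the numbers above is to undo the final activation. The sketch below assumes the endpoint applies a sigmoid to a single relevance logit, which would be consistent with every response carrying only LABEL_0 and a score strictly between 0 and 1; that is an assumption about the deployment, not something the endpoint documents. The helper names are hypothetical. Note also that the MS MARCO cross-encoders are trained to rank passages against a search query, so the output may measure query-passage relevance rather than symmetric sentence similarity.

```javascript
// Inverse of the sigmoid: recovers the raw logit from a score in (0, 1).
// Assumption: the endpoint score is sigmoid(logit) of a single output.
function scoreToLogit(score) {
    return Math.log(score / (1 - score));
}

// Sigmoid, shown for reference: maps a raw logit into the (0, 1) range.
function sigmoid(logit) {
    return 1 / (1 + Math.exp(-logit));
}

// Scores reported in the three test cases above.
const observedScores = {
    identical: 0.0007295869872905314,
    different: 0.004066287539899349,
    similar: 0.0010920792119577527
};

// Print the implied raw logit for each test case.
for (const [name, score] of Object.entries(observedScores)) {
    console.log(name, scoreToLogit(score).toFixed(2));
}
```

Under this assumption all three pairs map to strongly negative logits, which is what a relevance ranker might return for sentence pairs that are not a query and an answering passage.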

Expected Behavior:

Identical sentences should receive the highest similarity scores (close to 1.0)

Completely different sentences should receive low scores (close to 0.0)

Semantically similar sentences should receive relatively high scores

Scores should be properly normalized between 0 and 1

Labels should reflect the degree of similarity
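The ordering expectations above can be written as a small check and run against the reported scores. This is a minimal sketch; the function name and the wording of the messages are hypothetical, and it tests only relative ordering and range, not absolute thresholds.

```javascript
// Returns the list of expectations that a set of scores violates.
// scores: { identical, similar, different }, where higher means more similar.
function findViolations(scores) {
    const violations = [];
    if (scores.identical <= scores.similar) {
        violations.push("identical pair does not outscore the similar pair");
    }
    if (scores.similar <= scores.different) {
        violations.push("similar pair does not outscore the different pair");
    }
    for (const [name, value] of Object.entries(scores)) {
        if (value < 0 || value > 1) {
            violations.push(`${name} score is outside the 0 to 1 range`);
        }
    }
    return violations;
}

// Scores reported in the Observed Behavior section.
const reportedScores = {
    identical: 0.0007295869872905314,
    similar: 0.0010920792119577527,
    different: 0.004066287539899349
};

console.log(findViolations(reportedScores));
```

With the reported values both ordering checks fail while the range check passes, which matches the pattern listed under Issues Identified.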

Implementation Details:

const requestBody = JSON.stringify({
    "inputs": {
        "text": "input_text",
        "text_target": "comparison_text"
    }
});
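For context, the snippet above shows only the request body. A fuller sketch of the call might look like the following, assuming key-based authentication on the online endpoint. The function names, `scoringUrl`, and `apiKey` are placeholders introduced for illustration, and the `text` / `text_target` keys are the ones used in this post rather than a confirmed contract for this model.

```javascript
// Hypothetical helper: builds fetch() options for one text pair.
// The payload shape mirrors the requestBody shown above.
function buildScoringRequest(text, textTarget, apiKey) {
    return {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            // Key-based auth on Azure ML online endpoints uses a Bearer header.
            "Authorization": `Bearer ${apiKey}`
        },
        body: JSON.stringify({
            inputs: { text: text, text_target: textTarget }
        })
    };
}

// Sends one pair to the endpoint and returns the parsed JSON response.
// scoringUrl is a placeholder for the endpoint's scoring URI.
async function scorePair(scoringUrl, apiKey, text, textTarget) {
    const response = await fetch(scoringUrl, buildScoringRequest(text, textTarget, apiKey));
    if (!response.ok) {
        throw new Error(`Scoring request failed with HTTP ${response.status}`);
    }
    return response.json();
}
```

Keeping the request construction separate from the network call makes it easy to log or inspect the exact payload when comparing input formats.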

Questions:

Is there a known issue with the model deployment process that could affect score normalization?

Are there specific configuration requirements for cross-encoder models in Azure ML that may have been overlooked?

Should the input format or preprocessing steps be modified?

Are there any known limitations or requirements for this specific model in Azure ML?

Attempted Troubleshooting:

Verified API authentication and headers

Confirmed input format matches documentation

Tested with various text pairs to validate behavior

Logged full request/response cycle

Additional Information:

The implementation follows the standard REST API pattern for Azure ML endpoints

All requests are successful (no HTTP errors)

The issue appears consistent across multiple test runs

Requested Assistance:

Guidance on correct model deployment configuration for cross-encoder models

Confirmation of expected input/output format

Any known fixes or workarounds for similar issues

Best practices for cross-encoder deployment in Azure ML


1 answer

  1. romungi-MSFT 46,751 Reputation points Microsoft Employee
    2024-10-25T07:12:07.72+00:00

    @Avinash Sachdewani As far as I know, this model is offered in the model catalog as is and follows the guidance on its Hugging Face model page. I also see a newer version of the model, ms-marco-MiniLM-L-2-v2, which may perform better.

    The Swagger document that becomes available after deployment shows the following format for inputs in the request body.

                "Request": {
                    "title": "Request",
                    "required": [
                        "inputs"
                    ],
                    "type": "object",
                    "properties": {
                        "inputs": {
                            "title": "Inputs",
                            "anyOf": [
                                {
                                    "type": "string"
                                },
                                {
                                    "type": "array",
                                    "items": {
                                        "type": "string"
                                    }
                                },
                                {
                                    "type": "object"
                                }
                            ]
                        },
    
    

    The sample request in the Swagger document uses the following format.

    {
      "inputs": "I like you. I love you"
    }
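The rules in the schema excerpt above can be expressed as a small validator. This is a sketch covering only the part of the schema shown (a required `inputs` field that is a string, an array of strings, or an object); the excerpt is truncated, so any further properties are not checked, and the function name is hypothetical.

```javascript
// Checks a request body against the "inputs" rules in the Swagger excerpt:
// "inputs" is required and must be a string, an array of strings, or an object.
function isValidRequestBody(body) {
    if (body === null || typeof body !== "object" || !("inputs" in body)) {
        return false;
    }
    const inputs = body.inputs;
    if (typeof inputs === "string") {
        return true;
    }
    if (Array.isArray(inputs)) {
        return inputs.every((item) => typeof item === "string");
    }
    return inputs !== null && typeof inputs === "object";
}

// Both the Swagger sample and the format used in the question pass the check.
console.log(isValidRequestBody({ inputs: "I like you. I love you" }));
console.log(isValidRequestBody({ inputs: { text: "a", text_target: "b" } }));
```

Because the object form is accepted without any constraint on its keys, passing schema validation does not by itself confirm that the model treats `text` and `text_target` as a sentence pair.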
    
    

    With respect to Azure ML, the only choices at deployment time are the compute type and the endpoint name, so deployment settings should not affect the scores the model returns. You can raise an issue on the source repo, MSMARCO-Passage-Ranking, about the expected formats and scores, but the repo has not seen active updates for a while.

    I hope this helps!!

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further questions, let us know.
