Different results generated by AutoML for the same dataset

Yogeshkumar Shankariya 0 Reputation points
2023-06-05T08:01:45.9833333+00:00

Hi Team, I am currently working as a data scientist at a European bank. I have trained a model using AutoML.

When I passed the input to AutoML as shown below:

User's image

User's image

The result was noticeably worse than when I passed the input as shown below:

User's image

User's image

There is a significant difference between the outputs produced by the two input methods.

The output of the second input method separates the positive and negative classes well based on the assigned probability, while the output of the first input method is worse every time.

Why is this happening? If I use the second method, I need to split the pipeline into two, and I am not sure how the registered data will be updated once I deploy the second pipeline as an endpoint, since the endpoint takes the same input it had at the time it was registered.

Please suggest a solution, as it is critical to deliver this to the client within the deadline.

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Automation
An Azure service that is used to automate, configure, and install updates across hybrid environments.

1 answer

  1. Yogeshkumar Shankariya 0 Reputation points
    2023-06-07T14:36:42.6066667+00:00

    Hi @YutongTie-MSFT ,

    Thank you for the response.

    The issue was resolved by writing the data to Parquet instead of CSV. With CSV, some of the 'int' feature values ended up in 'char' (string) format after the round trip, whereas Parquet keeps the column types consistent.

    I have one more question:

    I have used AutoML for model training in the pipeline. I want to pass the best model name and its feature importance to the component that follows the AutoML component.

    I have created a pipeline like this:

    Data Extraction --> Data Cleaning --> Data Preparation --> AutoML model training --> Generating stats (Results and feature importance from AutoML)

    I have not been able to find a way to get the best model name and the feature importance of the best model generated by AutoML within the same pipeline. I want them available in the "Generating stats" component that runs right after AutoML. Can you please share a solution link or script reference?
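
To illustrate the CSV versus Parquet observation above: CSV is a plain-text format that stores no column types, so every reader has to infer or be told the types, and two pipeline steps can end up disagreeing. The sketch below uses only the Python standard library and hypothetical column names (`customer_id`, `age`, `num_accounts`); it is not the pipeline from this thread, just a minimal demonstration of how integer features come back as strings unless an explicit schema is applied.

```python
import csv
import io

# Hypothetical rows; all three columns are integer features.
rows = [
    {"customer_id": 101, "age": 34, "num_accounts": 2},
    {"customer_id": 102, "age": 51, "num_accounts": 5},
]

# Explicit schema (an assumption for this sketch): column name -> Python type.
SCHEMA = {"customer_id": int, "age": int, "num_accounts": int}


def write_csv(records):
    """Serialize a list of dicts to CSV text (types are lost at this point)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def read_csv_raw(text):
    """Read CSV text back; the csv module returns every field as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def read_csv_typed(text, schema):
    """Read CSV text back and cast each field using the explicit schema."""
    return [
        {name: schema[name](value) for name, value in row.items()}
        for row in read_csv_raw(text)
    ]


csv_text = write_csv(rows)
raw = read_csv_raw(csv_text)
typed = read_csv_typed(csv_text, SCHEMA)

print(type(raw[0]["age"]).__name__)    # type of 'age' without a schema
print(type(typed[0]["age"]).__name__)  # type of 'age' after casting
```

Parquet avoids this step because the schema is stored in the file itself. If CSV has to stay in the pipeline, pinning the types at read time (for example with the `dtype` argument of pandas `read_csv`) is one way to keep training and scoring inputs consistent.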