Predicting a multivariate outcome with Automated ML in Azure Machine Learning

AeroG 6 Reputation points MVP
2023-05-31T07:32:21.36+00:00

Hi there,

I would like to confirm whether Automated ML in Azure Machine Learning can predict a multivariate outcome. When I choose regression, there is no option to select two target columns, which suggests that multivariate regression is not supported in Automated ML. Is this correct?

Thank you 😊

Azure Machine Learning
An Azure machine learning service for building and deploying models.
2,561 questions

1 answer

  1. Ramr-msft 17,611 Reputation points
    2023-05-31T13:34:30.64+00:00

    @AeroG Thanks for the question. You are correct: Automated ML in Azure Machine Learning has no explicit option for multivariate regression, because a regression job takes a single target column. However, there are workarounds that let you predict more than one target variable.

    One workaround is to concatenate your target variables into a single column and use that column as the target. For example, if you have two target variables y1 and y2, you can combine them into a single column y with the following code:

    import pandas as pd
    
    # Load your data into a pandas DataFrame
    data = pd.read_csv('your_data.csv')
    
    # Concatenate the target variables into a single column
    # (the result is a string such as "1.5,20.0", not a number)
    data['y'] = data['y1'].astype(str) + ',' + data['y2'].astype(str)
    
    # Drop the original target variables
    data = data.drop(['y1', 'y2'], axis=1)
    
    # Save the modified data to a new CSV file
    data.to_csv('your_modified_data.csv', index=False)
    

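    Then, when you run Automated ML, you can specify the y column as the target variable. Note that y now holds text rather than numbers, and a regression task expects a numeric target, so the combined column can only serve as a categorical label (a classification task). That is practical only when y1 and y2 take a small number of distinct values; each predicted label is then split on the comma to recover y1 and y2.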
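As a quick check on the step above, here is a minimal sketch using made-up values in place of your_data.csv. It shows that the combined column holds text rather than numbers, and how a combined value could be split back into y1 and y2. The helper name `split_combined` is hypothetical and is not part of pandas or Azure Machine Learning.

```python
import pandas as pd

# Toy data standing in for your_data.csv (values are made up for illustration)
data = pd.DataFrame({
    "x": [0.1, 0.2, 0.3],
    "y1": [1.0, 2.0, 3.0],
    "y2": [10.0, 20.0, 30.0],
})

# Same concatenation as in the code above
data["y"] = data["y1"].astype(str) + "," + data["y2"].astype(str)


def split_combined(combined: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split 'a,b' strings back into numeric y1 and y2 columns."""
    parts = combined.str.split(",", expand=True)
    parts.columns = ["y1", "y2"]
    return parts.astype(float)


# Check whether the combined column is numeric (it is text, so this is False)
is_numeric = pd.api.types.is_numeric_dtype(data["y"])

# Recover the two original targets from the combined strings
recovered = split_combined(data["y"])

print(is_numeric)
print(recovered)
```

The same split could be applied to predicted labels, assuming the model returns them in the same "y1,y2" text form.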

    Keep in mind that this approach assumes the target variables are correlated and can be predicted jointly. If they are not correlated, it may be better to train a separate model for each target variable.
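The separate-models route can be prepared with plain pandas. The sketch below, again on made-up data, checks how strongly the two targets are correlated and builds one training table per target, each holding all feature columns plus a single target. The helper `frames_per_target` and the column names x1 and x2 are assumptions for illustration; each resulting table would then be used for its own Automated ML regression run with y1 or y2 as the target column.

```python
import pandas as pd


def frames_per_target(data: pd.DataFrame, targets: list[str]) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: one training frame per target (all features + that target)."""
    features = [col for col in data.columns if col not in targets]
    return {target: data[features + [target]].copy() for target in targets}


# Toy data (made up); x1 and x2 are features, y1 and y2 are the targets
data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [0.5, 0.1, 0.9, 0.3],
    "y1": [2.0, 4.0, 6.0, 8.0],
    "y2": [1.0, 3.0, 2.0, 5.0],
})

# Pearson correlation between the two targets
target_corr = data["y1"].corr(data["y2"])
print(f"Correlation between y1 and y2: {target_corr:.2f}")

# Build one training table per target
frames = frames_per_target(data, ["y1", "y2"])
for name, frame in frames.items():
    print(name, list(frame.columns))
```

Each frame can be saved with `to_csv` in the same way as the modified file above.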

    2 people found this answer helpful.