What could I be doing wrong to get this result from Azure AutoML timeseries forecasting?

Felix Collins 96 Reputation points
2020-11-26T03:00:22.23+00:00

I'm experimenting with Azure AutoML for timeseries forecasting. I have a simple two-column training dataset with two years of data at hourly intervals: Column 1 is the Date/Time and Column 2 is the variable I want to predict. I've done several runs of Azure AutoML and each appears to complete successfully. However, when I do a forecast and graph it, something is obviously wrong. It looks like the forecast is being quantised somehow. The graph below is for the 7 days after the training set; blue is actual and red is the forecast. This is obviously not right.

[Image: 42777-badautoml.png — actual (blue) vs forecast (red), 7 days after the training set]
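The forecast and graph are produced roughly like this (a simplified sketch, not my exact code; `fitted_model` is the best model returned by the run and `test_df` is the 7-day hold-out frame with the same columns as the training set):

import matplotlib.pyplot as plt

# Drop the target column and let the fitted model forecast the 7-day window
X_test = test_df.drop(columns=["Output"])
y_pred, X_trans = fitted_model.forecast(X_test)

# Plot actual (blue) against forecast (red)
plt.plot(test_df["DateTime"], test_df["Output"], color="blue", label="actual")
plt.plot(test_df["DateTime"], y_pred, color="red", label="forecast")
plt.legend()
plt.show()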

Here is my configuration for the training (python):

import logging

from azureml.automl.core.forecasting_parameters import ForecastingParameters
from azureml.train.automl import AutoMLConfig

lags = [1, 24, 168]        # 1 hour, 1 day, 1 week
forecast_horizon = 7 * 24  # 7 days of hourly data
forecasting_parameters = ForecastingParameters(
    time_column_name="DateTime",
    forecast_horizon=forecast_horizon,
    target_lags=lags,
    country_or_region_for_holidays='NZ',
    freq='H',
    use_stl='season',
    seasonality='auto'
)
automl_config = AutoMLConfig(task='forecasting',
                             debug_log='automl_forecasting_function.log',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_hours=1,
                             experiment_exit_score=0.05,
                             enable_early_stopping=True,
                             training_data=train_df,
                             compute_target=compute,
                             n_cross_validations=10,
                             verbosity=logging.INFO,
                             max_concurrent_iterations=19,
                             max_cores_per_iteration=19,
                             label_column_name="Output",
                             forecasting_parameters=forecasting_parameters,
                             featurization="auto",
                             enable_dnn=False)
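
The experiment is submitted and the best model pulled out in the usual way (a minimal sketch; `ws` is my existing Workspace and the experiment name is just a placeholder):

from azureml.core import Experiment

# Submit the AutoML config above and wait for the run to finish
experiment = Experiment(ws, "hourly-forecast")
run = experiment.submit(automl_config, show_output=True)

# Retrieve the best child run and its fitted pipeline
best_run, fitted_model = run.get_output()
print(fitted_model)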

The best model from the run is a VotingEnsemble:

ForecastingPipelineWrapper(
    pipeline=Pipeline(
        memory=None,
        steps=[('timeseriestransformer',
                TimeSeriesTransformer(featurization_config=None,
                                      pipeline_type=<TimeSeriesPipelineType.FULL: 1>)),
               ('prefittedsoftvotingregressor',
                PreFittedSoftVotingRegressor(
                    estimators=[('7',
                                 Pipeline(memory=None,
                                          steps=[('minmaxscaler',
                                                  MinMaxScaler(copy=True, feature_range=(0, 1))...
                                                  DecisionTreeRegressor(ccp_alpha=0.0,
                                                                        criterion='mse',
                                                                        max_depth=None,
                                                                        max_features=0.5,
                                                                        max_leaf_nodes=None,
                                                                        min_impurity_decrease=0.0,
                                                                        min_impurity_split=None,
                                                                        min_samples_leaf=0.00218714609400816,
                                                                        min_samples_split=0.00630957344480193,
                                                                        min_weight_fraction_leaf=0.0,
                                                                        presort='deprecated',
                                                                        random_state=None,
                                                                        splitter='best'))],
                                          verbose=False))],
                    weights=[0.5, 0.5]))],
        verbose=False),
    stddev=None)

Accepted answer
Felix Collins 96 Reputation points
2020-11-26T20:37:59.827+00:00

I tried again after turning off early stopping and letting it run for the full two hours, and got this:

[Image: 43171-goodautoml.png — actual and forecast now tracking closely]
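
The only changes relative to the config in the question were along these lines (a sketch of the changed arguments; everything else stayed the same):

# Changed AutoMLConfig arguments; all other parameters as in the question
automl_config = AutoMLConfig(task='forecasting',
                             experiment_timeout_hours=2,    # let it run the full two hours
                             enable_early_stopping=False,   # early stopping turned off
                             # ... remaining parameters unchanged ...
                             forecasting_parameters=forecasting_parameters)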

