BrianBarbieri-1018 asked ramr-msft answered

Using "cv_splits_indices" in AutoMLConfig

When I train a regression model with AutoMLConfig and set n_cross_validations to a plain int, I have no problems.

Now I want to use TimeSeriesSplit as the cross-validation method when training a model with AutoMLConfig. For this there is a "cv_splits_indices" argument, to which I pass a list of lists of indices like the following (with n_splits=5 in TimeSeriesSplit):


 array([[array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
         array([11, 12, 13, 14])],
        [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14]),
         array([15, 16, 17, 18])],
        [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18]),
         array([19, 20, 21, 22])],
        [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22]),
         array([23, 24, 25, 26])],
        [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26]),
         array([27, 28, 29, 30])]], dtype=object)
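For reference, folds like the ones above can be generated with scikit-learn's TimeSeriesSplit. The sketch below assumes 31 rows and test_size=4, because that combination reproduces the printed indices; the question itself only states n_splits=5. It also highlights one detail: the printed value is a numpy array with dtype=object, while the error message quoted further down asks for a List of List[numpy.ndarray]. Whether the SDK rejects an ndarray outer container for exactly that reason is an assumption, but collecting the folds as plain Python lists (or calling .tolist() on the object array) matches the structure the message describes.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the training data: 31 rows, the size implied by the folds above.
X = np.arange(31).reshape(-1, 1)

# Assumption: test_size=4 reproduces the printed folds (the question only mentions n_splits=5).
tscv = TimeSeriesSplit(n_splits=5, test_size=4)

# Plain Python list of folds; each fold is a 2-element list
# [train_indices, validation_indices] holding numpy.ndarray objects.
idxs = [[train_idx, val_idx] for train_idx, val_idx in tscv.split(X)]

# Wrapping the same folds in np.array(...) gives a (5, 2) object ndarray,
# which matches the "array(..., dtype=object)" repr printed above.
idxs_as_array = np.array(idxs, dtype=object)

# If idxs already is such an object array, .tolist() turns the two outer
# dimensions back into nested lists while leaving the inner ndarrays untouched.
restored = idxs_as_array.tolist()

print(type(idxs).__name__, type(idxs_as_array).__name__, type(restored).__name__)
```

Either `idxs` or `restored` has the list-of-lists shape and could then be passed as "cv_splits_indices".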

Unfortunately, when I run the following cell:

 import logging
 from azureml.train.automl import AutoMLConfig

 # idxs, train and y_var are defined in earlier notebook cells (not shown)
 automl_settings = {
     "iteration_timeout_minutes": 15,
     "experiment_timeout_hours": 0.3,
     "max_cores_per_iteration": -1,
     "enable_early_stopping": True,
     "primary_metric": 'normalized_root_mean_squared_error',
     "featurization": 'auto',
     "verbosity": logging.INFO,
     "cv_splits_indices": idxs
 }

 automl_config = AutoMLConfig(task='regression',
                              debug_log='automated_ml_errors_.log',
                              training_data=train,
                              validation_data=train,
                              label_column_name=y_var,
                              **automl_settings)

I receive the following error:

 ConfigException: ConfigException:
  Message: cv_splits_indices should be a List of List[numpy.ndarray]. Each List[numpy.ndarray] corresponds to a CV fold and should have just 2 elements: The indices for training set and for the validation set.
  InnerException: None
  ErrorResponse 
 {
     "error": {
         "code": "UserError",
         "message": "cv_splits_indices should be a List of List[numpy.ndarray]. Each List[numpy.ndarray] corresponds to a CV fold and should have just 2 elements: The indices for training set and for the validation set.",
         "details_uri": "https://aka.ms/AutoMLConfig",
         "target": "cv_splits_indices",
         "inner_error": {
             "code": "BadArgument",
             "inner_error": {
                 "code": "ArgumentInvalid"
             }
         },
         "reference_code": "XXXXXXREDACTEDXXXX"
     }
 }
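The message spells out the expected structure: a list of folds, each holding exactly two numpy arrays (training indices, then validation indices). A small pre-flight check written from that description can surface a mismatch locally before AutoMLConfig is built. `check_cv_splits_indices` is a hypothetical helper, not part of the Azure ML SDK, and it mirrors only the wording of the message, not the SDK's actual validation code.

```python
import numpy as np


def check_cv_splits_indices(cv_splits_indices):
    """Hypothetical check based only on the ConfigException text: a list of
    folds, each a list of exactly 2 numpy arrays (train, validation).
    Returns a list of problems; an empty list means the shape looks right."""
    problems = []
    if not isinstance(cv_splits_indices, list):
        problems.append(f"outer container is {type(cv_splits_indices).__name__}, expected list")
        return problems
    for i, fold in enumerate(cv_splits_indices):
        if not isinstance(fold, list):
            problems.append(f"fold {i} is {type(fold).__name__}, expected list")
        elif len(fold) != 2:
            problems.append(f"fold {i} has {len(fold)} elements, expected 2")
        elif not all(isinstance(part, np.ndarray) for part in fold):
            problems.append(f"fold {i} contains a non-ndarray element")
    return problems


# Two folds shaped like the first two in the question.
folds = [
    [np.arange(0, 11), np.arange(11, 15)],
    [np.arange(0, 15), np.arange(15, 19)],
]
print(check_cv_splits_indices(folds))                         # []
print(check_cv_splits_indices(np.array(folds, dtype=object)))  # ['outer container is ndarray, expected list']
```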

What is going wrong here? As far as I can tell, my input is correct.

Thank you


azure-machine-learning

1 Answer

ramr-msft answered

@BrianBarbieri-1018 Thanks for the question. Could you please share more details, including the Azure ML SDK version you are using?
Here is the doc for cross-validation data folds.


@BrianBarbieri-1018 Thanks. Just checking in to see whether there is an update on the details above.
