@ThierryL-3166 I think the recommendation to use separate OutputFileDatasetConfig objects for different steps is meant to avoid concurrent writes to a single object, as stated in a note in the documentation:
Concurrent writes to a OutputFileDatasetConfig will fail. Do not attempt to use a single OutputFileDatasetConfig concurrently. Do not share a single OutputFileDatasetConfig in a multiprocessing situation, such as when using distributed training.
If your steps do not run in parallel, though, you could try a single object and verify that it works; see the sketch below.
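For illustration, here is a minimal sketch of one output object per step. The datastore folder names, step scripts, and the `cluster` compute target are assumptions for this example, not taken from your pipeline:

from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# One OutputFileDatasetConfig per step, so no two steps write to the same object
step1_output = OutputFileDatasetConfig(name='step1_output', destination=(datastore, 'outputs/step1'))
step2_output = OutputFileDatasetConfig(name='step2_output', destination=(datastore, 'outputs/step2'))

step1 = PythonScriptStep(name='step1', script_name='step1.py', compute_target=cluster,
                         arguments=['--output', step1_output])
step2 = PythonScriptStep(name='step2', script_name='step2.py', compute_target=cluster,
                         arguments=['--output', step2_output])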
With respect to using inputs versus arguments: for the same dataset, arguments pass the value as a command-line argument to the script in that step, so you need argparse to retrieve it inside the script. In contrast, inputs make the value available through the run context (Run.get_context().input_datasets) in the same script. The documentation section "Access datasets within your script" provides an example with a train and test dataset, where train is passed via arguments and test via inputs:
# In pipeline definition script:
# `iris_dataset` is a registered TabularDataset; `seed` is an int; `cluster` is an existing compute target.
from azureml.pipeline.steps import PythonScriptStep

smaller_dataset = iris_dataset.take_sample(0.1, seed=seed)  # keep a 10% sample
train, test = smaller_dataset.random_split(percentage=0.8, seed=seed)  # 80/20 train/test split
# Code for demonstration only: It would be very confusing to split datasets between `arguments` and `inputs`
train_step = PythonScriptStep(
    name="train_data",
    script_name="train.py",
    compute_target=cluster,
    arguments=['--training-folder', train.as_named_input('train').as_download()],
    inputs=[test.as_named_input('test').as_download()]
)
# In train.py (the script run by train_step):
import argparse
from azureml.core import Run
parser = argparse.ArgumentParser()
parser.add_argument('--training-folder', type=str, dest='train_folder', help='training data folder mounting point')
args = parser.parse_args()
training_data_folder = args.train_folder  # value passed via `arguments`
testing_data_folder = Run.get_context().input_datasets['test']  # value passed via `inputs`
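To actually run this, the step still has to be wrapped in a pipeline and submitted. A minimal sketch (the experiment name here is an assumption):

# In pipeline definition script (continued):
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline

ws = Workspace.from_config()
pipeline = Pipeline(workspace=ws, steps=[train_step])
run = Experiment(ws, 'iris-train-sample').submit(pipeline)
run.wait_for_completion(show_output=True)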