Is it possible to distribute the training (fit) of an ML.NET model across multiple workers/servers?
Hello, I would love for Microsoft/ML.NET to let us train on multiple machines so training goes faster. Here is what I think Microsoft should do:
Version 1: Right now AutoML runs many algorithms one by one on a single machine. A simple version 1 would run those algorithms in parallel on multiple compute nodes (workers). The main computer (master node), where AutoML orchestrates the runs, would then aggregate the results.
Version 2: Allow Azure to be added as a worker node in the mesh alongside local computers, so we can pay for an extra boost when needed.
Version 3: Accelerate the most-used algorithms by distributing their work across the mesh nodes (harder to do and would take more time).
I think versions 1 and 2 should be easy and quick to implement. They would increase ML.NET usage and also let people get a hardware boost when needed by adding Azure as a participating compute node.
Version 3 is more complex and will take more time.
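To make version 1 concrete, here is a minimal sketch of the master/worker pattern it describes. It is not ML.NET's API: the toy "trainers" (`fit_mean`, `fit_median`, `fit_max`), `run_trial`, and `orchestrate` are hypothetical names. Threads also stand in for remote worker machines. The master fans each candidate trial out to a worker, collects each validation error, and picks the best candidate.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median

# Hypothetical candidate "trainers": each fits a constant predictor on the
# training targets. In the proposal, each would be a real AutoML candidate
# running on its own worker node; these are toy stand-ins.
def fit_mean(train):
    return mean(train)

def fit_median(train):
    return median(train)

def fit_max(train):
    return max(train)

CANDIDATES = {"mean": fit_mean, "median": fit_median, "max": fit_max}

def run_trial(name, fit, train, valid):
    """Work done on one worker: train one candidate, return its validation MSE."""
    prediction = fit(train)
    mse = mean((y - prediction) ** 2 for y in valid)
    return name, mse

def orchestrate(train, valid, n_workers=3):
    """Master node: fan trials out to workers in parallel, then aggregate."""
    results = {}
    # Threads stand in for remote worker nodes; a real system would send each
    # trial to another machine (local or Azure) instead of a local thread.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_trial, name, fit, train, valid)
                   for name, fit in CANDIDATES.items()]
        for future in as_completed(futures):
            name, mse = future.result()
            results[name] = mse
    # Aggregation step: keep the candidate with the lowest validation error.
    best = min(results, key=results.get)
    return best, results

if __name__ == "__main__":
    train = [1.0, 2.0, 2.0, 3.0, 10.0]
    valid = [2.0, 2.0, 3.0]
    best, results = orchestrate(train, valid)
    print(best, results)
```

Because each trial is independent, the master only needs to send out work and collect scores. That is why version 1 looks much simpler than version 3, which would have to split the inside of a single training algorithm across nodes.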