An Azure service for ingesting, preparing, and transforming data at scale.
Welcome to Microsoft Q&A platform and thanks for posting your question.
You can use pathos, a Python package that can do multiprocessing not just across different cores within a single computer, but also with a cluster distributed across multiple machines . Pathos has the ability to establish connections to remote servers through a parallel map, and to do multiprocessing. Here’s an example of how you can use pathos to run a function in parallel across multiple machines:
https://stackoverflow.com/questions/26876898/python-multiprocessing-with-distributed-cluster
from pathos.core import connect
from pathos.pp import ParallelPythonPool as Pool
# Establish a ssh tunnel
tunnel = connect('remote.computer.com', port=1234)
# Define some function to run in parallel
def sleepy_squared(x):
from time import sleep
sleep(1.0)
return x**2
# Build a pool of servers and execute the parallel map
p = Pool(8, servers=('localhost:55774',))
y = p.map(sleepy_squared, x)
Alternatively, you can use Ansible, a configuration management tool that allows you to start with a controller node, making an inventory (list of hosts/machines), and share the controller’s public key on all the nodes/machines . With Ansible, you can deploy your Python scripts to multiple machines and run them in parallel. https://stackoverflow.com/questions/25792357/how-to-distribute-a-python-script-across-multiple-nodes Hope this helps. Do let us know if you any further queries..