Hi everybody,
I just started learning how to use MS Azure and I got stuck on an apparently trivial issue.
I have my own pet ML project: a Python script that runs a classification analysis with TensorFlow and Keras.
It runs smoothly locally and I am happy with it.
Now I am trying to run this script on Azure ML, hoping to take advantage of the available computing power and, in general, to gain some experience with the Azure services. I am a bit old-school and prefer running my code from my local IDE rather than in a notebook. Because of this, I focused on the Python SDK libraries.
I created a free trial account on Azure and created a workspace. To adapt my original code to the new task, I followed the example in https://learn.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml?WT.mc_id=aisummit-github-amynic
The problem arises when I try to upload my locally stored training data to the workspace datastore. The data is saved locally in a parquet file of about 70 MB. The transfer fails after some time with a ProtocolError. After that, it keeps retrying and failing with a NewConnectionError.
The snippet that reproduces the error is:
import numpy as np
import pandas as pd
from os import listdir
from os.path import join as osjoin
import azureml.core
from azureml.core import Workspace, Experiment, Dataset, Datastore
from azureml.core.compute import AmlCompute, ComputeTarget

workdir = "."

# Set up Azure Workspace
# Load the workspace configuration from the config.json file in the current folder.
try:
    ws = Workspace.from_config()
except Exception:
    print("Could not load AML workspace")
    raise  # nothing below works without a valid workspace

datadir = osjoin(workdir, "data")
# Collect every parquet file in the local data folder
local_files = [osjoin(datadir, f) for f in listdir(datadir) if f.endswith(".parquet")]

# Get the default datastore to upload the prepared data
datastore = ws.get_default_datastore()
datastore.upload_files(files=local_files, target_path=None, show_progress=True)
Everything runs smoothly until the last line. The program starts uploading the file, and my VPN monitor shows outbound traffic. Judging from the upload speed and the file size, I would say it uploads the whole file or close to it; then I get these messages [*]:
WARNING - Retrying (Retry(total=2, connect=3, read=2, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(10054, 'WSAECONNRESET')"))': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
WARNING - Retrying (Retry(total=1, connect=2, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8BAF48>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
WARNING - Retrying (Retry(total=0, connect=1, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210B446748>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
WARNING - Retrying (Retry(total=2, connect=2, read=3, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8B5148>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
WARNING - Retrying (Retry(total=1, connect=1, read=3, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002210A891288>, 'Connection to creditfraudws2493375317.blob.core.windows.net timed out. (connect timeout=20)')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
WARNING - Retrying (Retry(total=0, connect=0, read=3, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8BD3C8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
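The warnings above mix several distinct failure modes. To separate them, here is a small, hypothetical helper (not part of the Azure SDK) that tags each retry warning by its underlying socket error. The mapping is based on the standard Windows socket error codes: 10054 (WSAECONNRESET) means the connection was reset by the remote host or by something in between, and Errno 11001 from getaddrinfo means the host name could not be resolved.

```python
from collections import Counter

# Hypothetical helper, not part of the Azure SDK: maps substrings that appear in
# urllib3 retry warnings to a plain-language reading of the socket error.
FAILURE_PATTERNS = [
    ("WSAECONNRESET", "connection reset by the remote side (or a device in between, e.g. a VPN)"),
    ("getaddrinfo failed", "DNS lookup of the storage host failed"),
    ("ConnectTimeoutError", "TCP connect to the storage host timed out"),
]


def classify_warning(line: str) -> str:
    """Return a short description of the error behind one retry warning."""
    for needle, meaning in FAILURE_PATTERNS:
        if needle in line:
            return meaning
    return "unrecognized"


def summarize(lines) -> Counter:
    """Count how often each failure type occurs in a list of warning lines."""
    return Counter(classify_warning(line) for line in lines)


if __name__ == "__main__":
    # Shortened excerpts of the warnings quoted above
    warnings = [
        "OSError(\"(10054, 'WSAECONNRESET')\")",
        "[Errno 11001] getaddrinfo failed",
        "ConnectTimeoutError(... 'timed out. (connect timeout=20)')",
    ]
    for kind, count in summarize(warnings).items():
        print(f"{count} x {kind}")
```

Applied to the six warnings above, it would report one reset, four DNS failures and one connect timeout. That pattern suggests that, after the first reset, the machine could not even resolve the storage host name, which may point at the local network or VPN rather than at Azure rejecting the upload; this is only an interpretation of the logs.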
From the initial ProtocolError, I understand that the Azure cloud server bounces me back, but it is unclear to me why. Checking the workspace in the Azure portal, I would guess that the workspace container is still empty, but I am not 100% sure I checked that correctly.
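One way to check, independently of the Azure SDK, whether the storage endpoint is reachable from this machine at all is a plain socket test. This is a minimal sketch using only the standard library; check_endpoint is a hypothetical helper name, and a successful TCP connect only shows that the host is reachable, not that an upload would succeed.

```python
import socket


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Report whether `host` resolves via DNS and accepts a TCP connection on `port`."""
    try:
        # Step 1: name resolution, the step that fails with "[Errno 11001] getaddrinfo failed"
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS resolution failed: {exc}"
    # Step 2: plain TCP connect to the first resolved address (no TLS, no HTTP)
    family, socktype, proto, _, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:  # covers refused connections and timeouts
        return f"resolved, but TCP connect failed: {exc}"
    return "reachable"
```

For example, print(check_endpoint("<storage-account>.blob.core.windows.net")), with the placeholder replaced by the workspace's storage account name, run once with the VPN on and once with it off, should show whether name resolution itself is the problem.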
Maybe I misunderstood the different components of the storage services in Azure ML and I am not using the API correctly. Am I doing something wrong? Is there a way for me to extract more information about the reasons for this error?
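For more detail on the failures themselves, Python's standard logging module can raise the verbosity of the HTTP stack. The "Retrying (Retry(...))" warnings above are emitted by urllib3; that the Azure storage client logs under the "azure" logger namespace is an assumption here, and the helper name is hypothetical.

```python
import logging

# Logger names to make verbose. "urllib3" is the HTTP library that printed the
# "Retrying (Retry(...))" warnings; "azure" is an assumption about where the
# Azure storage client logs and may need adjusting for your SDK version.
DEFAULT_LOGGERS = ("urllib3", "azure")


def enable_http_debug_logging(names=DEFAULT_LOGGERS, level=logging.DEBUG):
    """Set the given loggers to `level` and print their records to stderr with timestamps."""
    # Add a timestamped root handler (no-op if logging is already configured),
    # so each retry can be lined up with events in the VPN monitor
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    loggers = []
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(level)  # child loggers such as urllib3.connectionpool inherit this
        loggers.append(logger)
    return loggers
```

Calling enable_http_debug_logging() once before datastore.upload_files(...) should make the output include each connection and request urllib3 makes, not just the final retry warnings.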
Thanks a lot in advance for any help you can provide!
[*] (I manually edited portions of the error messages to obfuscate the blobstore name.)