Thank you for the answer.
We are still not able to save a file into a Blob storage container.
It is important that we use the DatabricksStep; whether the notebook (the Python script basic_DatabricksStep_script.py) lives in AMLS or in the Databricks workspace does not matter to us.
This is how we try to save a file into the Blob storage container.
Here is the Python script that should be executed by the DatabricksStep:
%%writefile $source_directory/basic_DatabricksStep_script.py
# Read the input and output paths that the DatabricksStep passes in
i = dbutils.widgets.get("input")
print("Param 'input':")
print(i)

o = dbutils.widgets.get("output")
print("Param 'output':")
print(o)

# Write a small test DataFrame to the output location
data = [('value1', 'value2')]
df2 = spark.createDataFrame(data)
z = o + "/output.txt"
df2.write.csv(z)
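One thing we are unsure about: when the step runs a Python script (python_script_name) instead of a notebook, spark and dbutils might not be pre-defined in the script's namespace. As a precaution we would add something like the following at the top of the script (a sketch on our side; whether Databricks injects these objects for script jobs is an assumption):

# Sketch: create the SparkSession and dbutils handle explicitly in case they
# are not injected automatically when the file runs as a plain Python job.
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)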
This is how we define the DatabricksStep:
from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import DatabricksStep

# Input: existing data on the Blob-backed datastore "input_datastore"
def_blob_store = Datastore(ws, "input_datastore")
step_1_input = DataReference(datastore=def_blob_store,
                             path_on_datastore="dbtest",
                             data_reference_name="input")

# Output: a PipelineData folder backed by the datastore "output_datastore"
output_data_folder_name = "output"
output_data_folder = PipelineData(output_data_folder_name,
                                  datastore=Datastore.get(ws, "output_datastore"))

dbNbWithExistingClusterStep = DatabricksStep(
    name="DBFSReferenceWithExisting",
    run_name="DBFS_Reference_With_Existing",
    source_directory=source_directory,
    python_script_name="basic_DatabricksStep_script.py",
    inputs=[step_1_input],
    outputs=[output_data_folder],
    compute_target=databricks_compute,
    existing_cluster_id="XXXXXX",
    allow_reuse=True,
    permit_cluster_restart=True
)
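For completeness, this is roughly how we then assemble and submit the pipeline (a sketch; ws is our Workspace object and the experiment name is only a placeholder):

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Build the pipeline from the single Databricks step and validate the graph
pipeline = Pipeline(workspace=ws, steps=[dbNbWithExistingClusterStep])
pipeline.validate()

# Submit the pipeline under a test experiment (name is a placeholder)
experiment = Experiment(ws, "databricks-step-test")
pipeline_run = experiment.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)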
Here is a picture to make clearer what we want to achieve:
Currently, AMLS fails to build our pipeline, even though we followed the examples from the official GitHub notebook on the DatabricksStep class.
Could you please help us get this pipeline working?
Thank you in advance for your support!
With best regards,
Alex