Issues with Automatic CSV Uploads to Databricks Volume Using dbxservice

It_trainer 20 Reputation points
2025-03-27T10:32:05.64+00:00

I initially created an import folder manually, which successfully stored CSV files as volumes in Databricks. After deleting this manually created import folder, I expected the pipeline to automatically recognize the new import folder created during the pipeline execution and process the CSV files as artifacts. However, the pipeline is not uploading the CSV files to the Databricks volume as expected. Additionally, I encountered an error related to an undefined variable job_mi_id when trying to use the Databricks CLI command to copy files.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator
    2025-03-28T09:36:37.8833333+00:00

    @It_trainer

    I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution here in case you'd like to accept it as the answer.

    Ask: Issues with Automatic CSV Uploads to Databricks Volume Using dbxservice

    Solution: The issue occurred because the manually created import folder had been deleted and the pipeline did not automatically recognize the import folder created during execution, so the CSV files were not uploaded as expected. Specifying the correct path in the pipeline allowed the artifacts to be recognized successfully.

    To address this, the following script ensures the import folder exists and properly processes the required files. This script was executed using a pre-configured Docker image.

    script:
        - echo "Build Operation"
        - mkdir -p import  # Ensure the import folder exists
        - echo "Host link ${DATABRICKS_HOST}"
        - login  # Log in to Databricks
        - cd import
        - python3.11 ../scripts/write.py  # Run the Python script that writes the CSV files
        - ls -l .  # Log the files in the folder
        - echo "Completed"

    artifacts:
        paths:
          - import/  # Save the folder contents as artifacts
        expire_in: "30 days"

    This approach ensures that the import folder is correctly created and maintained, so the pipeline can detect and upload the required files. Explicitly defining the path in the pipeline configuration resolves the issue and makes the artifacts available as expected.
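    As a hedged sketch that goes beyond what was posted above: once write.py has produced the CSV files in import/, a copy step with the Databricks CLI could push them into a Unity Catalog volume from the same job. The catalog, schema, and volume names below are placeholders, and the step assumes the earlier login step has already authenticated the CLI against ${DATABRICKS_HOST}:

    script:
        - cd import
        # Copy each generated CSV into the volume; replace <catalog>/<schema>/<volume> with real names
        - for f in *.csv; do databricks fs cp "$f" "dbfs:/Volumes/<catalog>/<schema>/<volume>/import/$f"; done

    Note that any variable referenced in such a command (for example job_mi_id) must be defined as a pipeline variable or exported earlier in the job; otherwise the shell reports it as undefined, which matches the error described in the question.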

    If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue. 

     

    Please don't forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.


1 additional answer

  1. It_trainer 20 Reputation points
    2025-03-28T08:09:15.1866667+00:00

    Hi,

    I encountered an issue where, after deleting the manually created import folder, the pipeline did not automatically recognize the new import folder created during execution, and the CSV files were not uploaded as expected. However, when I provided the correct path in the pipeline, I was able to see the artifacts. Below is the script I used to resolve the issue, which I ran using a Docker image that I prepared:

    script:
        - echo "Build Operation"
        - mkdir -p import  # Ensure the import folder exists
        - echo "Host link ${DATABRICKS_HOST}"
        - login  # Log in to Databricks
        - cd import
        - python3.11 ../scripts/write.py  # Run the Python script that writes the CSV files
        - ls -l .  # Log the files in the folder
        - echo "Completed"

    artifacts:
        paths:
          - import/  # Save the folder contents as artifacts
        expire_in: "30 days"

    This script creates the import folder and saves the generated files as pipeline artifacts. By providing the correct path in the pipeline, I was able to see the artifacts successfully. I hope this information helps!
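    As an optional follow-up that was not part of the original script, a quick listing with the Databricks CLI can confirm that the files are actually visible in the target volume; the catalog, schema, and volume names are placeholders:

    script:
        - databricks fs ls dbfs:/Volumes/<catalog>/<schema>/<volume>/import/  # Verify the files landed in the volume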

