Issues with Automatic CSV Uploads to Databricks Volume Using dbxservice

It_trainer 20 Reputation points
2025-03-27T10:32:05.64+00:00

I initially created an import folder manually, which successfully stored CSV files as volumes in Databricks. After deleting this manually created import folder, I expected the pipeline to automatically recognize the new import folder created during the pipeline execution and process the CSV files as artifacts. However, the pipeline is not uploading the CSV files to the Databricks volume as expected. Additionally, I encountered an error related to an undefined variable job_mi_id when trying to use the Databricks CLI command to copy files.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator
    2025-03-28T09:36:37.8833333+00:00

    @It_trainer

    I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution here in case you'd like to accept it as the answer.

    Ask: Issues with Automatic CSV Uploads to Databricks Volume Using dbxservice

    Solution: The issue occurred because the manually created import folder had been deleted and the pipeline did not automatically recognize the import folder created during execution, so the CSV files were not uploaded as expected. Specifying the correct path in the pipeline allowed the artifacts to be recognized successfully.

    To address this, the following script ensures the import folder exists and properly processes the required files. This script was executed using a pre-configured Docker image.

    script:
        - echo "Build Operation"
        - mkdir -p import  # Ensure the import folder exists
        - echo "Host link ${DATABRICKS_HOST}"
        - login  # Log in to Databricks
        - cd import
        - python3.11 ../scripts/write.py  # Run the Python script that writes the CSV files
        - ls -l .  # Log the files in the folder
        - echo "Completed"

    artifacts:
        paths:
          - import/  # Save the folder contents as artifacts
        expire_in: "30 days"

    This approach ensures that the import folder is correctly created and maintained, so the pipeline can detect and upload the required files. Explicitly defining the path in the pipeline configuration resolves the issue and makes the artifacts available as expected.
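    As a hedged sketch that goes beyond what was posted above: once write.py has produced the CSV files in import/, a copy step with the Databricks CLI could push them into a Unity Catalog volume from the same job. The catalog, schema, and volume names below are placeholders, and the step assumes the earlier login step has already authenticated the CLI against ${DATABRICKS_HOST}:

    script:
        - cd import
        # Copy each generated CSV into the volume; replace <catalog>/<schema>/<volume> with real names
        - for f in *.csv; do databricks fs cp "$f" "dbfs:/Volumes/<catalog>/<schema>/<volume>/import/$f"; done

    Note that any variable referenced in such a command (for example job_mi_id) must be defined as a pipeline variable or exported earlier in the job; otherwise the shell reports it as undefined, which matches the error described in the question.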

    If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue. 

     

    Please don't forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.


1 additional answer

  1. It_trainer 20 Reputation points
    2025-03-28T08:09:15.1866667+00:00

    Hi,

    I encountered an issue where, after deleting the manually created import folder, the pipeline did not automatically recognize the new import folder created during execution, and the CSV files were not uploaded as expected. However, when I provided the correct path in the pipeline, I was able to see the artifacts. Below is the script I used to resolve the issue, which I ran using a Docker image that I prepared:

    script:
        - echo "Build Operation"
        - mkdir -p import  # Ensure the import folder exists
        - echo "Host link ${DATABRICKS_HOST}"
        - login  # Log in to Databricks
        - cd import
        - python3.11 ../scripts/write.py  # Run the Python script that writes the CSV files
        - ls -l .  # Log the files in the folder
        - echo "Completed"

    artifacts:
        paths:
          - import/  # Save the folder contents as artifacts
        expire_in: "30 days"

    This script creates the import folder and saves the generated files as pipeline artifacts. By providing the correct path in the pipeline, I was able to see the artifacts successfully. I hope this information helps!
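    As an optional follow-up that was not part of the original script, a quick listing with the Databricks CLI can confirm that the files are actually visible in the target volume; the catalog, schema, and volume names are placeholders:

    script:
        - databricks fs ls dbfs:/Volumes/<catalog>/<schema>/<volume>/import/  # Verify the files landed in the volume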

