SQL Server BULK INSERT does not work with Azure Blob Storage emulator (Azurite)

alexandrehtrb 1 Reputation point
2021-01-22T18:27:32.847+00:00

Hello,

I am trying to perform a bulk insert in my SQL Server database, following the documentation here: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15#f-importing-data-from-a-file-in-azure-blob-storage

I am testing it locally, using SSMS v18.4, SQL Server 2019 Developer Edition (Windows 10 Pro x64) and the Azurite storage emulator. When I try to run the BULK INSERT command, I get the error:

"Referenced external data source "MyAzureBlobStorage" not found."

Even though I successfully created that data source.

The blob storage emulator logs do not indicate any connection attempts when the command runs, although I can connect to the emulator from a browser, and those requests do appear in the logs.

I wonder if this bug is related to the one reported here: https://feedback.azure.com/forums/217321-sql-database/suggestions/35725408-azure-sql-db-bulk-insert-to-support-blob-store-whe

The blob storage emulator is listening on https://127.0.0.1:27161 (the default port is 10000; changing ports or using HTTP did not solve the problem).
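
For reference, the setup follows the pattern from that documentation page (names and the SAS secret here are placeholders):

-- Pattern from the linked documentation; names and the SAS secret are placeholders.
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token without the leading ?>';

CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://127.0.0.1:27161/devstoreaccount1/mycontainer',
       CREDENTIAL = MyAzureBlobStorageCredential );

BULK INSERT MyTable
FROM 'myfile.csv'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage', FORMAT = 'CSV', FIRSTROW = 2 );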


5 answers

  1. Denis Robert 11 Reputation points
    2021-07-09T22:38:36.397+00:00

    As usual, Microsoft just abandons issues it doesn't want to resolve. No activity since Feb on this?
    Seems like Azurite is nothing but a trap that Microsoft uses to force you to use the Azure version, even when you're doing local development. No software they produce seems to actually support it. You think for two seconds it might work, then you spend hours down a useless rabbit hole until you give up and use an Azure Storage account so you can move forward with your project, dooming you to have to be connected to Azure at all times.

    What a piece of junk.

    2 people found this answer helpful.

  2. shiva patpi 13,366 Reputation points Microsoft Employee Moderator
    2021-01-22T21:03:58.213+00:00

    Hello @alexandrehtrb,
    Thanks for your query. I was able to repro your issue locally in my SQL Server Management Studio Express.
    But after following the steps mentioned here, the data got inserted successfully:
    https://github.com/microsoft/sql-server-samples/blob/master/samples/features/sql-bulk-load/load-from-azure-blob-storage/LoadFromAzureBlobStorage.sql

    (ref: https://stackoverflow.com/questions/43813493/load-csv-from-blob-to-azure-sql-server-referenced-external-data-source-not-fou)

    Please note: my test blob had public access, so I skipped the first two steps of that sample (the credential setup).

    Overall script used:

    CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
    WITH ( TYPE = BLOB_STORAGE,
           LOCATION = 'https://storageaccountname.blob.core.windows.net/test'
    );

    BULK INSERT test
    FROM 'test.csv'
    WITH (DATA_SOURCE = 'MyAzureBlobStorage', FIELDTERMINATOR = ',', DATAFILETYPE = 'char', FIRSTROW = 2, ROWTERMINATOR = '0x0a', TABLOCK);

    When I got the same error ("Referenced external data source "MyAzureBlobStorage" not found."),
    I just created a new data source (e.g. myazureblobstorage1) and provided that name in the BULK INSERT command.

    A couple of things to note:

    1. After creating the external data source, wait a couple of minutes before running the BULK INSERT.
    2. If the CSV file is not formatted properly, you might run into other issues.
    3. Take note of the additional parameters in the BULK INSERT command.
    4. When I got the error "Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)"", I just added additional parameters to the BULK INSERT command (see the sketch after this list).
    5. Table column names must match those in the CSV file.
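
    A minimal, hypothetical sketch of what those extra parameters usually look like (table and file names are placeholders; the exact options depend on how the CSV is laid out):

    -- Hypothetical example: explicit format options often resolve the
    -- "IID_IColumnsInfo" error, which generally means SQL Server could not
    -- work out the row/column layout of the file on its own.
    BULK INSERT dbo.test
    FROM 'test.csv'
    WITH (
        DATA_SOURCE     = 'MyAzureBlobStorage',
        FORMAT          = 'CSV',       -- SQL Server 2017 and later
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '0x0a',
        FIRSTROW        = 2,           -- skip the header row
        TABLOCK
    );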

    FYI - the bug you mentioned is not related to this particular issue.

    Let me know if you have additional questions. If the above steps work for you, make sure to "Accept Answer" so that it helps the community out there.


  3. alexandrehtrb 1 Reputation point
    2021-01-26T21:23:24.85+00:00

    Hello @shiva patpi ,

    Thank you for your response, but it did not work for me...

    I got the error: Bad or inaccessible location specified in external data source "MyAzureBlobStorage".

    I am using the Azurite emulator and placed the product.csv file in a container named "testcontainer" there.

    To create the data source, the SQL command was:

    CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
    WITH ( TYPE = BLOB_STORAGE,
           LOCATION = 'http://127.0.0.1:10000/devstoreaccount1/testcontainer');
    GO

    The Product table and bulk insert command were copied from the GitHub sample: LoadFromAzureBlobStorage.sql
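
    Roughly, the copied statements were along these lines (column definitions abbreviated here; see the sample for the exact script):

    -- Illustrative approximation of the GitHub sample's table and BULK INSERT;
    -- refer to LoadFromAzureBlobStorage.sql for the real definitions.
    CREATE TABLE dbo.Product(
        Name nvarchar(50) NOT NULL,
        Color nvarchar(15),
        Price money NOT NULL
    );
    GO

    BULK INSERT Product
    FROM 'product.csv'
    WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
           FORMAT = 'CSV',
           FIRSTROW = 2,
           TABLOCK );
    GO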

    Did it work for you using an emulator?


  4. Raghunandan Venkateswaran 16 Reputation points
    2021-07-16T11:51:18.477+00:00

    Does this fix work with a non-public-access container / storage account?


  5. Chris Stefano 0 Reputation points
    2023-09-08T08:58:45.5333333+00:00

    I am using Azure SQL Edge (mcr.microsoft.com/azure-sql-edge:latest) together with Azurite (mcr.microsoft.com/azure-storage/azurite:latest) for local development purposes using Docker Compose.

    # docker-compose.yml
    ---
    services:
    
      # Azure SQL Edge
      sqlserver:
        image: mcr.microsoft.com/azure-sql-edge:latest
        hostname: sqlserver
        cap_add:
          - SYS_PTRACE
        init: true
        environment:
          ACCEPT_EULA: "Y"
          MSSQL_SA_PASSWORD: "${MSSQL_SA_PASSWORD}"
          MSSQL_PID: Developer
        ports:
          - 1433:1433    # for Azure Edge SQL Service
          - 10000:10000  # for Azurite Blob Service
        volumes:
          - db_data:/var/opt/mssql
    
      # Azurite
      blobstore:
        image: mcr.microsoft.com/azure-storage/azurite:latest
        command: >
          azurite-blob
          --blobHost 0.0.0.0
          --blobPort 10000
          --location /data
          --disableProductStyleUrl
          --debug /tmp/debug.log
        # NOTE: joins same network as sqlserver container for access via localhost
        network_mode: service:sqlserver
        volumes:
          - blob_data:/data
    
    volumes:
      db_data: {}
      blob_data: {}
    
    

    Note: The Azurite container is configured to use the same network as the Azure SQL Edge container, so it's accessible using localhost from the SQL server, and I've explicitly set the Azure SQL Edge container hostname to sqlserver (to resolve connection issues when connecting from the host machine).

    I have created a public blob container called foo in Azurite Blob Storage and defined external data sources in Azure SQL Edge as follows.

    create external data source test_with_ip_address
    with (
        type = blob_storage,
        location = 'http://127.0.0.1:10000/devstoreaccount1/foo'
    );
    go
    
    create external data source test_with_localhost
    with (
        type = blob_storage,
        location = 'http://localhost:10000/devstoreaccount1/foo'
    );
    go
    
    create external data source test_with_container_hostname
    with (
        type = blob_storage,
        location = 'http://sqlserver:10000/devstoreaccount1/foo'
    );
    go
    
    create external data source test_with_invalid_location
    with (
        type = blob_storage,
        location = 'this is not a valid location!'
    );
    go
    

    When trying to issue a BULK INSERT ... using any of the external data sources, I receive a Bad or inaccessible location specified in external data source <external_data_source_name> error.

    create table test ( id int );
    go
    
    bulk insert test
    from 'blob-of-id-values'
    with ( data_source = 'test_with_ip_address' );
    go
    
    bulk insert test
    from 'blob-of-id-values'
    with ( data_source = 'test_with_localhost' );
    go
    
    bulk insert test
    from 'blob-of-id-values'
    with ( data_source = 'test_with_container_hostname' );
    go
    
    bulk insert test
    from 'blob-of-id-values'
    with ( data_source = 'test_with_invalid_location' );
    go
    

    I've also tried using a private blob container together with SAS credentials, but get the same error.
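
    A sketch of that credential-based variant (the SAS secret below is a placeholder; per the BULK INSERT documentation, the secret is the SAS token without the leading '?'):

    -- Placeholder SAS token; a real one would be generated against the
    -- Azurite devstoreaccount1 account (e.g. with Azure Storage Explorer).
    create database scoped credential azurite_sas_credential
    with identity = 'SHARED ACCESS SIGNATURE',
         secret = '<SAS-token-without-leading-?>';
    go

    create external data source test_with_sas
    with (
        type = blob_storage,
        location = 'http://sqlserver:10000/devstoreaccount1/foo',
        credential = azurite_sas_credential
    );
    go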

    Any help is appreciated!

