How to fix the copy activity error while copying data from a Databricks Delta table to Data Lake in CSV format

Subhadip Roy 31 Reputation points
2024-07-11T07:54:48.48+00:00

There are some error tables in Databricks Delta tables. Those tables need to be extracted as CSV and loaded into Azure Data Lake, inside a folder of the container.

Staging has been enabled in the copy activity, since this is a two-step process.

Approximate row count of the tables: 50k.

When the copy activity runs, it fails with the following error:

ErrorCode=AdlsGen2OperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Only 'http' and 'https' schemes are allowed. Parameter name: value. Account: 'amledpstoragedev'. FileSystem: 'edp-dev'. Path: 'bronze/temp/9aeaa750-ea7c-40e0-8b76-eefbad013ae0/AzureDatabricksDeltaLakeExportCommand'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.ArgumentException,Message=Only 'http' and 'https' schemes are allowed. Parameter name: value,Source=System.Net.Http,'

Could you please advise how to resolve this issue?


Accepted answer
  1. Nehruji R 8,066 Reputation points Microsoft Vendor
    2024-07-12T11:42:12.3733333+00:00

    Hello Subhadip Roy,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you’re encountering an issue with the copy activity in Azure Data Factory (ADF) when extracting error tables from Databricks Delta tables, converting them to CSV, and loading them into Azure Data Lake. The error message indicates that the URL scheme being used is not valid: Azure Data Lake Storage Gen2 (ADLS Gen2) only supports the http and https schemes.

    Please verify which authentication method is used to connect to ADLS Gen2. If it is a service principal, check whether it has the necessary permissions to access the folder referenced in the error message.

    Ensure that the URL used in your configuration is correctly formatted with https. In Azure Data Factory, check the linked service configuration for your ADLS Gen2 account: the URL must be specified correctly and must start with http:// or https://. If a different scheme is being used, such as ftp:// or file://, change it to one of the supported schemes, as illustrated in the sketch below.
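
    For illustration, a minimal ADLS Gen2 linked service definition using an https endpoint and service principal authentication could look like the following sketch (the linked service, Key Vault, and account names here are placeholders, not values taken from your environment):

    {
        "name": "AzureDataLakeLinkedService",
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": "https://<account_name>.dfs.core.windows.net",
                "servicePrincipalId": "<application-client-id>",
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": "KeyVaultLinkedService",
                        "type": "LinkedServiceReference"
                    },
                    "secretName": "<secret-name>"
                },
                "tenant": "<tenant-id>"
            }
        }
    }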

    Ensure that your Azure Storage account allows access from Azure services. You might need to enable the “Allow trusted Microsoft services to access this storage account” option in the storage account firewall settings. Check the firewall settings of your ADLS Gen2 storage account to ensure that the IP addresses of the ADF integration runtime are allowed.

    If the self-hosted integration runtime uses a proxy server, check the Azure Storage firewall settings to ensure that the IP address of the proxy server is allowed. If it is not, add the IP address to the firewall settings. Refer to this article: https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses.

    In addition, please make sure that the steps in Troubleshoot the Azure Data Lake Storage connectors in Azure Data Factory and Azure Synapse have been followed: https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-azure-data-lake#azure-data-lake-storage-gen2

    Hope this answer helps! Please let us know if you have any further queries; I’m happy to assist you further.


    Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

  1. Amira Bedhiafi 25,866 Reputation points
    2024-07-11T08:50:32.8433333+00:00

    Can you verify whether your Linked Service in ADF for ADLS Gen2 is correctly configured with the proper URL scheme (https)?

    It should look something like this:

    https://<account_name>.dfs.core.windows.net

    Here's an example JSON configuration snippet for the copy activity (dataset and linked service names are placeholders); the https URL itself is configured on the ADLS Gen2 linked service that the sink dataset points to:

    
    {
        "name": "CopyDeltaToADLS",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "DatabricksDeltaLakeDataset",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "AdlsCsvDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "AzureDatabricksDeltaLakeSource"
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {
                    "type": "AzureBlobFSWriteSettings"
                },
                "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "fileExtension": ".csv"
                }
            },
            "enableStaging": true,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": "StagingLinkedService",
                    "type": "LinkedServiceReference"
                },
                "path": "staging/temp/"
            }
        }
    }
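
    The container, folder path, and CSV delimiters live on the sink dataset rather than on the activity itself. A minimal sketch of that DelimitedText dataset, assuming the placeholder names above (AdlsCsvDataset, AzureDataLakeLinkedService) and your container/path values, could look like the snippet below; since staging is enabled, also make sure the staging linked service resolves to an https:// endpoint:

    {
        "name": "AdlsCsvDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "AzureDataLakeLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "your-container-name",
                    "folderPath": "bronze/temp/"
                },
                "columnDelimiter": ",",
                "rowDelimiter": "\n",
                "firstRowAsHeader": true
            }
        }
    }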
    
    
