Design a PolyBase data loading strategy for dedicated SQL pool in Azure Synapse Analytics

Shivendoo Kumar 746 Reputation points
2021-10-06T06:16:51.443+00:00

Hi All,

I am working on creating an External table in Synapse Dedicated Pool from ADLS Gen2. I am following these steps:

  1. Using Synapse Data Flow to connect to On-Premise source MS Dynamics Database and create CSV files (daily File for each table with Suffix YYYY_MM_DD) in ADLS Gen2.
  2. Then Create an EXTERNAL table in Synapse Dedicated Pool from ADLS Gen2.
  3. Create SP in Synapse to Merge data to ODS and call the SP in Data Flow

I am struggling with Step 2 where I am trying to create an External Table.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Your Password>';

--DROP DATABASE SCOPED CREDENTIAL MSI_ADLSCRED
CREATE DATABASE SCOPED CREDENTIAL MSI_ADLSCRED WITH IDENTITY = 'Managed Service Identity';

--DROP EXTERNAL DATA SOURCE [DS_ADLS_FileDropZone]
CREATE EXTERNAL DATA SOURCE [DS_ADLS_FileDropZone] WITH
( TYPE = HADOOP,
LOCATION = N'abfss://<Container Name>@<Account Name>.dfs.core.windows.net',
CREDENTIAL = MSI_ADLSCRED
);

CREATE EXTERNAL FILE FORMAT [FF_CSV_FR2]
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = N',',
STRING_DELIMITER = N'"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True
)
)
GO
CREATE SCHEMA Extern;

CREATE EXTERNAL TABLE [Extern].[Fact_CustomerTransactions]
(
[TransactionDate] nvarchar NULL,
[CustomerAccount] nvarchar NULL,
[Class] nvarchar NOT NULL,
[LoyaltyCardID] nvarchar NULL,
[SoldIncGST] numeric NULL,
[GrossMargin] numeric NULL,
[SoldQTY] numeric NULL,
[TransactionCount] decimal NULL,
[ETL_DateTime] nvarchar NOT NULL,
[FileCreatedDateTime] nvarchar NOT NULL
)
WITH (
LOCATION='Fact_CustomerTransactions/',
DATA_SOURCE = DS_ADLS_FileDropZone,
FILE_FORMAT = FF_CSV_FR2,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
GO

and When I try to create an External Table I am getting an error message "External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, ?upn=false&action=getAccessControl&timeout=90'"

Please can you help me

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,134 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Shivendoo Kumar 746 Reputation points
    2021-10-06T06:24:01.74+00:00

    Hi All,

    I found the issue and resolved it.

    It was access related issue. Synapse Workspace was not able to access ADLS and once I added Synapse Managed Service Identity to ADLS IAM, It worked

    I followed step-4 from this link and it fixed the issue. https://learn.microsoft.com/en-us/azure/azure-sql/database/vnet-service-endpoint-rule-overview?toc=%2Fazure%2Fsql-data-warehouse%2Ftoc.json

    Under your storage account, go to Access Control (IAM), and select Add role assignment. Assign the Storage Blob Data Contributor Azure role to the server or workspace hosting your dedicated SQL pool, which you've registered with Azure AD.

    https://techcommunity.microsoft.com/t5/azure-synapse-analytics/msg-10519-when-attempting-to-access-external-table-via-polybase/ba-p/690641

    https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.