Polybase s3 not working on SQL Server 2022 Linux

Daniel Hartl 21 Reputation points
2022-06-06T18:53:24.237+00:00

Trying to connect to parquet files on S3 from SQL server 2022 Linux fails with the following error:

CREATE EXTERNAL DATA SOURCE myds
WITH
( LOCATION = 's3://mybucket'
, CREDENTIAL = mycreds
);
GO

111631;The specified parameter for external data source scheme (s3) is not valid.

According to the documentation here, this should be supported. Any idea when this will be enabled?

https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-release-notes-2022?view=sql-server-ver16
https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-s3-compatible?view=sql-server-ver16
https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-virtualize-parquet-file?view=sql-server-ver16

SQL Server | Other
0 comments No comments
{count} votes

Accepted answer
  1. Hugo Queiroz 81 Reputation points Microsoft Employee
    2022-06-07T21:50:41.537+00:00

    Daniel, for S3-Polybase on Linux as of right now you'll need TF13702 to work. This will be removed next CTP.

    Add the TF and let us know of it works.


2 additional answers

Sort by: Most helpful
  1. Seeya Xi-MSFT 16,586 Reputation points
    2022-06-07T03:00:38.99+00:00

    Hi @Daniel Hartl ,

    Welcome to Microsoft Q&A!
    Although I know you've read the documentation, I'd like to ask that did you check the prerequisites as follows:

    • Install the PolyBase feature for SQL Server.
    • Install SQL Server Management Studio (SSMS) or Azure Data Studio.
    • S3-compatible storage.
    • An S3 bucket created. Buckets cannot be created or configured from SQL Server.
    • A user () has been configured and the secret () and that user is known to you. You will need both to authenticate against the S3 object storage endpoint.Access Key IDSecret Key ID
    • ListBucket permission on S3 user.
    • ReadOnly permission on S3 user.
    • TLS must have been configured. It is assumed that all connections will be securely transmitted over HTTPS not HTTP. The endpoint will be validated by a certificate installed on the SQL Server OS Host.

    In addition, you can try LOCATION = 's3://<ip_address>:<port>/' and see what the result will be.

    Best regards,
    Seeya


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. Daniel Hartl 21 Reputation points
    2022-06-08T19:13:27.313+00:00

    Thanks this worked. I am able to read single parquet files now.

    However I'm unable to read parquet files in a folder structure. It fails with the following error:
    External table '<tablename>' is not accessible because content of directory cannot be listed.

    When I manually check using 'aws s3 ls <s3path>' with the same credentials it returns values though. Is there a way to turn on more verbose logging to debug this further?

    Thanks,
    Daniel


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.