Data Type and Format In External table PolyBase

Palash Aich 21 Reputation points
2021-04-07T16:49:09.937+00:00

Hello there,

I am trying to query Azure Storage Account CDM csv file from SQL server on-prem. I created external table with all the data type as varchar. However, I need to display date time as datetime and need to remove double quote from string while displaying result. What should be the file format should I choose for the same. Please refer screenshot with current field values.

The file format used is below.
CREATE EXTERNAL FILE FORMAT [TextFile]
WITH
(FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = N',', USE_TYPE_DEFAULT = False))

I am aware there is DATE_FORMAT option, however, I tried with few and it is failing with varchar to datetime conversion error.

Current records that need to modify with right format.
85402-polybasefileformat.jpg

Azure Files
Azure Files
An Azure service that offers file shares in the cloud.
1,156 questions
Azure Storage Accounts
Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
2,666 questions
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,416 questions
Azure HDInsight
Azure HDInsight
An Azure managed cluster service for open-source analytics.
197 questions
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,640 questions
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 100.8K Reputation points MVP
    2021-04-09T21:35:06.707+00:00

    You posted another question, which included a longer sample of the data. From that I could tell that there were more datetime values in that file - but the other fields had a different format than the two above in the screenshot. Since DATE_FORMAT is on table level, it seems that you lose here.

    And in any case, the data source may be trusted to always have dates in the same format, or have correct dates.

    I would recommend that you have the columns as varchar in the table. Then create a view on the top of table, where you do a try_convert to datetime2(0) with the appropriate format codes. (You find these these in the topic for CAST and CONVERT in Books Online.) You may also want to expose a raw character column in that view, so that you can easily analyse dates that do not convert.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 100.8K Reputation points MVP
    2021-04-07T21:25:32.36+00:00

    Looking that the topic for CREATE EXTERNAL FILE FORMAT, I see that there is a STRING_DELIMITER option. I have never played with external file formats, but I would try this.

    1 person found this answer helpful.
    0 comments No comments

  2. Palash Aich 21 Reputation points
    2021-04-08T11:52:24.433+00:00

    Hi @Erland Sommarskog

    Thanks for your response. I tried with STRING_DELIMITER. It removed extra double quote, however, if I make the data type of a column to date time, query fails with conversion error. Any suggestions on this. If the field is varchar, it works.

    The below format I am using.
    CREATE EXTERNAL FILE FORMAT [CustomFormat]
    WITH
    (
    FORMAT_TYPE = DELIMITEDTEXT
    , FORMAT_OPTIONS
    (
    FIELD_TERMINATOR = N','
    , STRING_DELIMITER = N'"'
    , DATE_FORMAT = N'yyyy-MM-dd HH:mm'
    , USE_TYPE_DEFAULT = False
    )
    )