Hi @Saurabh Sharma ,
Thank you for the quick reply.
I figured out what the issue was with this one, unfortunately I have hit another issue.
But let me first explain what resolved this one -
There was another file in the same folder called index.json , this is created by default by the tricklefeed (Microsoft d365 to the data lake) to save metadata on the table load.
Since this json file has one row, when I set FIRT_ROW=1 it errors out because that row does not have the same format as the csv files.
The data in the json file looked like
{"partitions":[{"partitionId":"CASHDISC_00001.csv","startRecId":0,"endRecId":5637149077,"completed":true}],"entityKey":"a28208f3-005a-4d91-817b-1eff94506ce8"}
To get by this I added ,REJECT_TYPE = VALUE ,REJECT_VALUE = 1 to the table creation.
This it rejects that one row in index.json and returns all the data from the CSV.
Let me know if there is a better way to do this?
Now to my current issue -
For one folder X (Calling it X for simplicity)
There are three csv's in the folder
X_1, X_2 and X_3
At one point there was a new column introduced from Dynamics 365 application side, this caused the new column to only come in from csv X_3.
X_1 and X_2 have 40 columns but X_3 has 42 columns from midway.
Example data in X_3 -
1,"Hello",52
2,"Hello1",53
1,"Hello",52,"cc"
1,"Hello",52,"dd"
When I run a select of the external table I get this-
Rows were rejected while reading from external source(s).
7 rows rejected from external table [DimensionAttributeValueCombination_Ext] in plan step 2 of query execution:
Msg 107090, Level 16, State 1, Line 78
107090;Query aborted-- the maximum reject threshold (1 rows) was reached while reading from an external source: 3 rows rejected out of total 3 rows processed. Ensure external data's file format and delimiters/terminators are consistent with source data (and where applicable, that source column counts/types match target).
Is there any way around this?
If I create a view in the serverless SQL pool on the same csv's this works fine , it's smart enough to understand this and not reject rows the but the external table fails.
The entire point of this exercise is that we want data from the lake to be easily accessible from the dedicated SQL pool to join with other tables in the current warehouse. We are stuck on external tables because of this issue.
The alternative is having pipelines which would create a copy of the data and we want to avoid that.