Using Synapse to ingest data. Only csv. I can view data, but querying returns no records.

Question

New to Synapse.... I am trying to ingest data from an on-prem sql server using a pipeline in Synapse.

I am using the copy data task. My source seems to work fine, I can view data, connection test fine. The source is the issue. I have had some success with csv files. I can run the copy activity with success. On the data hub, I can see my file. I can right click and select preview and see data. When I right click and select, new sql script, select top 10000 rows I get a query editor and some sql. I run the sql and get no records returned.

When I follow the exact same procedure, but use a parquet file type. My pipeline fails. why would it work for csv, but not parquet?

Do I need to load into my synapse workspace and then to the data lake? The fact that I see two directories in my storage is confusing. I see a few files in both places and one in only the workspace. Is there documentation and when to use the data lake and when to use the workspace? and how the two interact?

Answer

I figured out why I can not use a parquet file because I don't have the proper drive on the SQL server machine. So I will try to fix that later.

I have created a pipeline and it runs successfully. On the Data hub, under linked services, I can click on my storage, go into my folder and see my csv file. I can right click, preview and see data.

I right clicker and select "select top 100" and I get a script like..

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://x._d0312a15-8bba-4369-9ae2-f82dac15c064_c884101b-fcc6-484b-96d1-6c69608eb1a3.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [result]

this fails, so I added a with clause as the error sugest..

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://x/Raw/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) 
    WITH 
    (
    UserID int,
    UName varchar(64),
    UTypId int,
    FName varchar(32),
    MidInit varchar(4),
    LName varchar(32)
    ) AS [result];

The query seems to run, it returns the column headers, but no records. I did tweak the original url for the data file to hide it.

I am fairly sure my permissions are set up correctly. in the data hub, I right click... manage access and I am the a user with read, write and execute. I am also the owner with read, write and execute.

Hope this additional info helps.

Answer

Hi @Computer Mike ,
It seems that each row in your CSV doesn't have the correct number of fields, so it cannot be parsed. You can resolve this by amending your .CSV file to have equal number of comma separated fields in each of the rows. Please check the file should not be corrupted or data should not be in wrong format. If this is not the case then, I would request you to kindly share the screenshot of the data you are trying to fetch.
I tried to repro your scenario and did not get any error.

Hope this will help. Please let us know if any further queries.

------------------------------

Please don't forget to click on or upvote button whenever the information provided helps you.
Original posters help the community find answers faster by identifying the correct answer. Here is how
Want a reminder to come back and check responses? Here is how to subscribe to a notification
If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

Share via

Using Synapse to ingest data. Only csv. I can view data, but querying returns no records.

2 answers