Azure Blob SDK (Python azure-storage-blob) does not parse rows with tab separated columns in txt file


Volochy Grigory 16

I have .txt files pushed by Microsoft Academic Graph to Azure Blob storage.

And I'm building a python app that uses "azure-storage-blob" SDK for querying the .txt files to search certain entries by column values. For this, I'm using the following documentation:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=python%2Cpowershell

I tested it with .csv files and it works fine: the columns are searchable using the "query_blob" method of the "BlobClient" class. In those files, the columns in each row are separated by a comma (',').
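For reference, a call of the kind described above might look roughly like the following. This is a minimal sketch, not the code from the question: the helper names `run_query` and `example_usage`, the query text, and the connection parameters are hypothetical. `DelimitedTextDialect` and the `blob_format` and `on_error` keyword arguments come from the azure-storage-blob package, which has to be installed for `example_usage` to run.

```python
def run_query(blob_client, query, dialect):
    """Run a query-acceleration query and return (raw_result, errors).

    `blob_client` is expected to behave like azure.storage.blob.BlobClient.
    """
    errors = []
    reader = blob_client.query_blob(
        query,
        blob_format=dialect,      # describes how the stored blob is delimited
        on_error=errors.append,   # collect non-fatal query errors
    )
    return reader.readall(), errors


def example_usage(conn_str, container_name, blob_name):
    # Hypothetical wiring; imported lazily so run_query itself has no
    # hard dependency on the SDK.
    from azure.storage.blob import BlobClient, DelimitedTextDialect

    client = BlobClient.from_connection_string(conn_str, container_name, blob_name)
    csv_dialect = DelimitedTextDialect(delimiter=",", has_header=False)
    # _1, _2, ... address columns by position when there is no header row.
    return run_query(client, "SELECT _1, _2 FROM BlobStorage", csv_dialect)
```

Keeping the dialect as a parameter means the same helper can be called with a comma dialect for the .csv case and with a different delimiter for the .txt case.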

But when I try to use it for the .txt files, whose columns are separated by the tab character ('\t'), the query returns each row as a single column.
For example, if the file contains a row like:

95198407 14607 helpage international HelpAge International

Then I expect all four columns to be searchable and to get back an object with four columns, as it works for similar .csv files.
Instead, I'm getting each row back as a single column.
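To make the expected result concrete, here is a small local sketch that splits the example row on tabs with Python's standard `csv` module. It assumes the four fields in the row are separated by single tab characters, and it runs locally only; it does not call `query_blob`. The helper name `split_tab_row` is made up for this illustration.

```python
import csv
import io

# The example row, assuming single tab characters between the four fields.
row = "95198407\t14607\thelpage international\tHelpAge International\n"


def split_tab_row(text):
    """Return the columns of the first tab-separated row in `text`."""
    return next(csv.reader(io.StringIO(text), delimiter="\t"))


columns = split_tab_row(row)
print(len(columns), columns)
```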

The live example of what I have in code:

And what I have in response:

I ran multiple tests with the "delimiter" parameter equal to:
'\t'
'\t'
'/\t'
'\t\t\t\t'

And similar values. But every time, the result is either the same or it throws an error like:

Then I tried setting the "delimiter" parameter to '\t\t\t\tt' and got the following response:

So it looks like it does not matter how many '\t' characters I specify for the "delimiter" parameter: they are all filtered out, and in this case the columns are treated as separated by the 't' character.
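One way to read this observation is that leading and trailing whitespace is trimmed from the delimiter somewhere between the client and the service. That is only a hypothesis consistent with the results above, not confirmed behavior of the SDK or the service. The sketch below applies such trimming to some of the tested values; the function name `effective_delimiter` is made up for this illustration.

```python
def effective_delimiter(delimiter):
    # ASSUMPTION: whitespace around the delimiter is stripped before use.
    return delimiter.strip()


tested = ["\t", "\t\t\t\t", "\t\t\t\tt"]
results = {value: effective_delimiter(value) for value in tested}

for value, trimmed in results.items():
    print(repr(value), "->", repr(trimmed))
```

Under that assumption, the pure-tab values collapse to an empty string, and only the trailing 't' of the last value survives, which would match columns being split on 't'.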

So either I cannot figure out how to escape the '\t' character properly, which is why it is filtered out and ignored, or there is another way to specify the tab separator. I checked the docs for the BlobClient class here:
https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python

I even looked inside the source code, but I can't figure out how to solve the issue.

Sumarigo-MSFT 47,511 Reputation points Microsoft Employee Moderator

2020-10-23T12:04:14.113+00:00

@Volochy Grigory It looks like the request body doesn’t contain the column separator when it is set to ‘\t’. I will investigate and get back to you!

Volochy Grigory 16 Reputation points

2020-10-23T12:15:27.727+00:00

Thank you! I'm looking forward to your reply.
If you need any additional information about how I'm hitting the issue, or if something is not clear in my description, please let me know and I will provide it as soon as possible.

Volochy Grigory 16 Reputation points

2020-10-26T17:08:49.817+00:00

@Sumarigo-MSFT From what I understand, the problem is with the way the parameter is passed in XML format. I get the error "The specified XML is not syntactically valid" when I try to escape the "\ t" sign as "\ t".
I tried various ways to avoid this, but to no avail.
I hope you can clarify how to fix this problem.
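To make the XML angle concrete, the sketch below parses three ways a tab delimiter could be written inside an XML element, using Python's standard `xml.etree.ElementTree`. The element name `ColumnSeparator` and the three payloads are illustrative assumptions. This does not show what the SDK actually sends, and whether the service accepts any of these forms is not established in this thread.

```python
import xml.etree.ElementTree as ET

# Three hypothetical payloads for the same intended delimiter.
literal_tab = "<ColumnSeparator>\t</ColumnSeparator>"    # a real tab character
char_ref = "<ColumnSeparator>&#9;</ColumnSeparator>"     # XML character reference for tab
backslash_t = "<ColumnSeparator>\\t</ColumnSeparator>"   # the two characters '\' and 't'


def separator_of(xml_text):
    """Return the text content of the single element in `xml_text`."""
    return ET.fromstring(xml_text).text


for payload in (literal_tab, char_ref, backslash_t):
    print(repr(payload), "->", repr(separator_of(payload)))
```

A standards-compliant parser reads the first two forms as a single tab character, while the third is read as a backslash followed by the letter 't', which is a different delimiter altogether.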

Tim Cahill 31 Reputation points

2020-10-28T00:05:15.127+00:00

@Sumarigo-MSFT I am also keen to have an answer to this issue and look forward to your reply.

Anonymous

2020-10-29T22:02:56.2+00:00

@Volochy Grigory Just letting you know that we are still working on this internally. Sumarigo or I will provide an update once we have more information available.

Volochy Grigory 16 Reputation points

2020-10-30T06:09:48.457+00:00

I understand this issue is not simple to fix.
Thank you so much for the update. It is important for me to know that the issue is supposed to be solved.