Documentation on Binary.InferContentType(Source)[Csv.PotentialDelimiters]

Patrick O'Beirne 26 Reputation points
2021-01-20T09:44:46.433+00:00

All the documentation says about Binary.InferContentType is: "If the inferred content type is text/csv, and the format is delimited, additionally returns field Csv.PotentialDelimiters containing a table for analysis of potential delimiters." It does not explain the codes in that table, and the last two are strange.

Csv.PotentialDelimiters returns a table with the potential delimiter characters pictured below. I have added a column, Code, on the left to show the ASCII code of each. The first five, up to the pipe character, are all well understood. But why are chr(1) and chr(87) there? What files have fields separated or delimited by chr(1) or "W"?

Is there a bug in the characters displayed? Perhaps only the low-order byte (the "little end") of each Unicode character is being returned. For example, the Japanese comma is U+3001, which would account for the 01.
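The byte arithmetic behind that guess can be checked with a short sketch. This is Python, not Power Query M, and it only illustrates how U+3001 is laid out in UTF-16 little-endian; it says nothing about what Binary.InferContentType does internally. The variable names are made up for the illustration.

```python
# U+3001 IDEOGRAPHIC COMMA, the Japanese comma mentioned in the question.
ideographic_comma = "\u3001"

# UTF-16 little-endian stores the low-order byte first.
utf16_le = ideographic_comma.encode("utf-16-le")
low_byte, high_byte = utf16_le[0], utf16_le[1]

print(utf16_le.hex())  # bytes in storage order: 0130
print(low_byte)        # 1, the same value as the chr(1) row
print(ord("W"))        # 87, the code of the other unexplained row
```

So the low byte of U+3001 is indeed 1, which makes the guess plausible on arithmetic alone, but "W" (87) has no comparable explanation along these lines.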

[Image 58358-image.png: the Csv.PotentialDelimiters table with the added Code column]

Answer accepted by question author

Ehren (MSFT) 1,786 Reputation points Microsoft Employee
2021-01-22T21:55:40.717+00:00

This function is used internally by PQ. "W" represents "split by whitespace", which corresponds to passing an empty text value "" to Csv.Document. I have no idea why 0x0001 is part of the default delimiter list, but there must be some historical reason for it.

1 additional answer

  1. Patrick O'Beirne 26 Reputation points
    2021-01-25T10:56:31.377+00:00

    Thank you very much Ehren.
    I tested and found that SOH / CHAR(1) is indeed recognised as a separator.
    Google tells me it is used in Hadoop as a field separator.

    For whitespace, I tried a space and that is recognised as W, but a non-breaking space (code 160) is not.
    So in code I can replace "W" with " ".

    I appreciate the reply,
    Patrick
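For readers who want to see the two cases from this reply outside Power Query, here is a minimal Python sketch. It is an assumption-laden illustration, not the Power Query implementation: the sample data is invented, and splitting on a literal " " is only a stand-in for the observed behaviour that a space counts but a non-breaking space does not.

```python
import csv
import io

SOH = "\x01"     # Start of Heading, character code 1
NBSP = "\u00a0"  # non-breaking space, character code 160

# Invented sample: three fields per row, separated by SOH.
sample = f"id{SOH}name{SOH}city\n1{SOH}Ann{SOH}Cork\n"

# csv.reader accepts any single character as the delimiter, including SOH.
rows = list(csv.reader(io.StringIO(sample), delimiter=SOH))
print(rows)

# Splitting on a literal space leaves the non-breaking space inside a field.
line = f"a b{NBSP}c"
fields = line.split(" ")
print(fields)
```

Here `rows` comes back as two lists of three fields each, and `fields` has only two entries because the non-breaking space is not treated as a separator.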