Files and File Sets as Inputs and Outputs (U-SQL)

Summary

As a Big Data processing language, U-SQL can operate over unstructured data, such as files and sets of files, as well as over structured data that has been stored in the form of tables. The results of U-SQL queries can likewise be written back to “unstructured” files or stored in structured tables.

A quick note on the term “unstructured data”: such data often has internal structure, such as a CSV, JSON, or XML format. From the point of view of the language, however, it is unstructured, because it is stored in a file as a byte stream and has no metadata stored in a place that the language processor can access or understand. As a result, the query processor cannot know the structure of the data or optimize its processing accordingly. In exchange, this approach makes data processing more flexible and agile: the user can provide a late-bound schema that fits the processing at hand, instead of relying on a predetermined schema and interpretation of the data.
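A minimal sketch of a late-bound schema, assuming a tab-separated input file at the hypothetical path /Samples/Data/SearchLog.tsv with seven columns. The column names and types exist only in the EXTRACT clause of this script, not in the file, and the built-in Extractors.Tsv() and Outputters.Csv() handle parsing and writing. All paths and column names are illustrative.

```
// The schema is supplied at read time: the file itself carries no metadata.
@searchlog =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int,
            Urls        string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"   // hypothetical input path
    USING Extractors.Tsv();

// Write the rowset back out as an "unstructured" CSV file.
OUTPUT @searchlog
TO "/output/SearchLog.csv"               // hypothetical output path
USING Outputters.Csv();
```

A different script could read the same file with another schema, for example by declaring a column as string instead of int, because the interpretation is chosen per query.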

In this section we focus on unstructured data as input and output. For tables as input and output, please refer to the sections on U-SQL tables, SELECT expressions, and INSERT.
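File sets let one EXTRACT read many files that share a path pattern. The following is a sketch under stated assumptions: the hypothetical directory layout /input/yyyy/MM/dd/log.tsv holds two tab-separated columns per file, and the virtual column date is populated from the path pattern rather than from the file contents.

```
// {date:yyyy}, {date:MM} and {date:dd} bind parts of each matched path
// to the virtual column "date", which is declared in the schema below.
@logs =
    EXTRACT UserId int,
            Query  string,
            date   DateTime                               // virtual column from the path
    FROM "/input/{date:yyyy}/{date:MM}/{date:dd}/log.tsv" // hypothetical file set pattern
    USING Extractors.Tsv();

// Filtering on the virtual column restricts which files need to be read.
@january =
    SELECT UserId, Query, date
    FROM @logs
    WHERE date >= new DateTime(2017, 1, 1) AND date < new DateTime(2017, 2, 1);

OUTPUT @january
TO "/output/january_logs.csv"                             // hypothetical output path
USING Outputters.Csv();
```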

See Also