Some highlights of the Summer 2017 refreshes for Azure Data Lake Analytics and U-SQL: GZip on OUTPUT, catalog views, DiagnosticStream in .NET user code, major updates to the cognitive libraries, and tool support for creating your EXTRACT statement.
I will not provide a detailed list here. You can find one in the Azure Data Lake blog post, which in turn links to the detailed release notes on our Azure Data Lake GitHub site. Instead, I will focus on some of my personal favorites.
The EXTRACT expression has always been able to automatically decompress GZip files with the .gz extension. We now also automatically GZip-compress files when the target of the OUTPUT statement is a file (or set of files) with the .gz extension. This feature is currently in preview, so please go and check it out!
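As a rough sketch of what this looks like in a script: the paths and the two-column schema below are made up for illustration, and because the feature is in preview you may need the opt-in described in the release notes.

```usql
// Hypothetical input path and schema, for illustration only.
// The .gz extension on the input triggers automatic decompression.
@rows =
    EXTRACT id int,
            name string
    FROM "/input/sample.csv.gz"
    USING Extractors.Csv();

// Hypothetical output path. The .gz extension on the target asks
// U-SQL to GZip-compress the file as it is written.
OUTPUT @rows
TO "/output/sample_copy.csv.gz"
USING Outputters.Csv();
```

Note that the compression is driven entirely by the file extension, so the same OUTPUT statement with a plain .csv target would write an uncompressed file.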
Speaking of preview, we added another option to make your file set processing even faster (and more is in the pipeline!). I recommend that you add the following two preview options to your script if you are operating on file sets:
SET @@FeaturePreviews = "FileSetV2Dot5:on, AsyncCompilerStoreAccess:on";
Another preview feature is the ability to write your own diagnostic information from within your .NET/C# user code directly into the Azure Data Lake storage account. If you have ever wished you could record the runtime behavior of your user-defined operators or functions, or catch errors and analyze them later, this is the feature to use!
Have you ever wondered how to generate a script to manage your metadata (e.g., delete all your test databases) or wanted to find specific metadata objects based on some predicate? I certainly have. So I am excited that we now expose U-SQL catalog views. They are modelled after the SQL Server catalog views, but live in the
usql built-in schema (e.g., the catalog view for tables is called
The recent refresh of the cognitive extension libraries makes them much richer to work with! Now I can operate on text and images either directly on files, which avoids the string and row size limits, or in rowsets (if I need to scale to millions), and I also get a confidence value with the results.
Last, but definitely not least (!), the ADL Tooling in Visual Studio also simplifies generation of the EXTRACT expression for CSV-like data by automatically detecting the schema, letting me adjust it, and then letting me use the generated expression in my script! This is so cool that we published a detailed blog post on it!
There is much more than the highlights above. Also stay tuned for the October refresh, which will contain some additional cool and long-awaited features!