Azure Data Lake & Azure HDInsight Blog
The official blog for the Azure Data Lake services - Azure Data Lake Analytics, Azure Data Lake Store and Azure HDInsight
Using Cask Data Application Platform on Azure HDInsight
Recently, CDAP (Cask Data Application Platform) by Cask was added to the set of applications that...
Author: Bharath Sreenivas Date: 10/17/2016
Azure Data Lake U-SQL September 2016 Updates: OUTER UNION, Set operations by name, FILE and PARTITION intrinsic functions and more!
I finally found the time to publish the release notes of the September refresh. My apologies for the...
Author: MRys Date: 10/13/2016
Understanding the Data Lake Analytics Unit
Developers often ask us: "What is an Azure Data Lake Analytics Unit? How does it affect my U-SQL...
Author: Saveen Reddy Date: 10/12/2016
Azure Data Lake Analytics: More Compute and More Control
Customers have been telling us that they want access to more computational horsepower for running...
Author: Saveen Reddy Date: 10/12/2016
Experience Updates to the Azure Data Lake Store and Analytics Portal
In this month's refresh of the Azure Data Lake Store and Azure Data Lake Analytics portal, we've...
Author: Saveen Reddy Date: 10/09/2016
HDInsight HBase: How to Improve HBase cluster restart time by Flushing tables?
This blog is written by Nitin Verma, Sr. Software Engineer, HDInsight. Do you restart or re-create...
Author: AshishThapliyal Date: 09/19/2016
Getting started with Azure Data Lake Analytics and Store has never been faster!
We’re happy to announce that we’ve made it much faster to get started with the Data Lake Store and...
Author: Saveen Reddy Date: 09/09/2016
HDInsight HBase: 9 things you must do to get great HBase performance
HBase is a fantastic high-end NoSQL big data machine that gives you many options to get great...
Author: AshishThapliyal Date: 09/02/2016
HDInsight - New self-paced trainings and labs
This week Microsoft Learning Experiences released/updated 3 HDInsight courses (These are free, $49...
Author: AshishThapliyal Date: 08/28/2016
How to register U-SQL Assemblies in your U-SQL Catalog
U-SQL's extensibility model heavily depends on your ability to add your own custom code. Currently,...
Author: MRys Date: 08/26/2016
HDInsight: Attach additional Azure storage accounts to the cluster
This blog is discontinued in favor of updated HDInsight documentation on MSDN...
Author: AshishThapliyal Date: 08/26/2016
Introducing Image Processing in U-SQL
Rukmani Gopalan - Senior Program Manager; Apostolos "Toli" Lerios - Entrepreneur in Residence and...
Author: Rukmani G Date: 08/18/2016
Rapid Big Data Prototyping with Microsoft R Server on Apache Spark: Context Switching & Spark Tuning
Max Kaznady – Data Scientist; Jason Zhang – Senior Software Engineer; Arijit Tarafdar – Senior...
Author: Saveen Reddy Date: 08/09/2016
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
This session was presented by Nitin Verma (Sr. Software Engineer) and Pravin Mittal (Principal...
Author: AshishThapliyal Date: 08/04/2016
Azure Data Lake U-SQL August 1st 2016 Updates: ACLs on Databases, Skipping Header Rows, Sampling and more!
As part of the Azure Data Lake Analytics and U-SQL August 1st refresh, we released a couple of new,...
Author: MRys Date: 08/03/2016
Introducing File and Folder ACLs for Azure Data Lake Store
Overview: We’re excited today to announce the availability of File and Folder ACLs for the Azure Data...
Author: Amit R. Kulkarni Date: 07/31/2016
HDInsight - How to use Spark-HBase connector?
Apache Spark is an open-source parallel processing framework that supports in-memory processing to...
Author: Anunay Tiwari Date: 07/25/2016
Azure Data Lake U-SQL July Updates
As part of the Azure Data Lake Analytics and U-SQL July refresh released earlier this month, we...
Author: MRys Date: 07/18/2016
Partial Caching of DataFrame by Vertical and Horizontal Partitioning
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 07/08/2016
HDInsight tool in Azure Toolkit for Eclipse is GA!
Today, we are pleased to announce that the HDInsight tool in Azure Toolkit for Eclipse is generally...
Author: Xiaoyong Zhu (MSFT) Date: 07/04/2016
How do I combine overlapping ranges using U-SQL? Introducing U-SQL Reducer UDOs
The problem statement: A few weeks ago, a customer on Stack Overflow asked for a solution for the...
Author: MRys Date: 06/27/2016
Azure Data Lake Analytics: Greater flexibility with assigning Parallelism to U-SQL Jobs
The Data Lake team is always receiving very useful and thoughtful feedback from users on how we can...
Author: Saveen Reddy Date: 06/17/2016
Appending an Index Column to Distributed DataFrame based on another Column with Non-unique Entries
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 06/09/2016
HDInsight Tool for IntelliJ is GA!
We are excited to announce that the HDInsight Tool for IntelliJ is now GA. The HDInsight Tool for...
Author: Saveen Reddy Date: 06/06/2016
Leveraging Azure Data Lake Partitioning to Recalculate Previously Processed Days
Many data flows will require partial reloading of U-SQL tables due to the need to recalculate a...
Author: brimit Date: 05/03/2016
April Updates to Azure Data Lake Analytics and Azure Data Lake Store
Hello everyone, the Azure Data Lake engineering team has been working hard on refining the services...
Author: Saveen Reddy Date: 04/24/2016
Debugging U-SQL Error E_RUNTIME_USER_EXTRACT_UNEXPECTED_NUMBER_COLUMNS: Unexpected number of columns in input record
Did you run into an error that said E_RUNTIME_USER_EXTRACT_UNEXPECTED_NUMBER_COLUMNS with a...
Author: Rukmani G Date: 04/23/2016
HDInsight jobs troubleshooting
WebHCat is a REST interface for remote job execution (Hive, Pig, Sqoop, MapReduce). WebHCat...
Author: kolli.kiran Date: 04/21/2016
HDInsight Hive job workload
Typically, Hive queries are developed using the Hive console or through interactive experiences like...
Author: kolli.kiran Date: 04/19/2016
HDInsight Hive workload under covers
The "HDInsight under covers" post gave an overview of cluster creation and setup. Apache Hive is the most...
Author: kolli.kiran Date: 04/06/2016
HDInsight under covers
Azure HDInsight provisions and manages Apache Hadoop clusters in the Azure cloud. HDInsight uses...
Author: kolli.kiran Date: 04/04/2016
Saving Spark Streaming Metrics to PowerBI
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 04/01/2016
Saving Spark Resilient Distributed Dataset (RDD) To PowerBI
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 03/22/2016
U-SQL Programming Improvements to Azure Data Lake Analytics for March 2016
Hello everyone! The Azure Data Lake team is pleased to announce additional enhancements to U-SQL...
Author: Saveen Reddy Date: 03/15/2016
Saving Spark Distributed Data Frame (DDF) To PowerBI
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 03/09/2016
Extending Spark with Extension Methods in Scala: Fun with Implicits
The sample Jupyter Scala notebook described in this blog can be downloaded from...
Author: ArijitT Date: 03/01/2016
PySpark: Appending columns to DataFrame when DataFrame.withColumn cannot be used
The sample Jupyter Python notebook described in this blog can be downloaded from...
Author: ArijitT Date: 02/10/2016
Copy data easily from Azure Storage Blobs to Azure Data Lake Store
The Azure Data Lake team has just released a capability that helps users jumpstart their usage of...
Author: Sachin C Sheth Date: 12/15/2015
Organize and discover your big data in the Azure Data Lake with Azure Data Catalog
Enterprise data is growing at a remarkable pace today. A large portion of the growth in data is...
Author: Amit R. Kulkarni Date: 12/10/2015
How To: Increase number of reducers in your Hive/MapReduce job
Our customers often use compression technologies like ORC and Snappy that can compress data and...
Author: Rashim Gupta Date: 12/08/2015
How To: output file as a CSV using Hive in Azure HDInsight
One of the common questions our team gets is how to output a Hive table to CSV. Hive does not...
Author: Rashim Gupta Date: 11/23/2015
Hello world!
Welcome to the official blog of the Azure Data Lake Engineering team. On this blog we will cover the...
Author: Rashim Gupta Date: 11/18/2015