Azure Data Lake & Azure HDInsight Blog
The official blog for the Azure Data Lake services - Azure Data Lake Analytics, Azure Data Lake Store and Azure HDInsight
Azure Data Lake Tools for Visual Studio Code (VSCode) General Availability
Azure Data Lake Tools for Visual Studio Code (VSCode) gives developers a light but powerful code...
Author: JennyJiang Date: 05/12/2017
Using Jupyter notebooks and Pandas with Azure Data Lake Store
This blog post describes how to use Jupyter notebooks and Pandas with Azure Data Lake Store. Using...
Author: Amit R. Kulkarni Date: 05/05/2017
SCP.Net with HDInsight Linux Storm clusters
SCP.Net is now available on HDInsight Linux clusters 3.4 and above. Versions Note: HDInsight Storm...
Author: Ravi Peri (MSFT) Date: 05/03/2017
HDInsight tools for IntelliJ & Eclipse April Updates
We are pleased to announce the April updates of HDInsight Tools for IntelliJ & Eclipse. This is...
Author: JennyJiang Date: 04/27/2017
Azure Data Lake U-SQL April 25 2017 Updates: Introducing Packages, UNPIVOT INCLUDE NULLS, fast file set preview flag, R extension returns dataframes, exporting your cluster database with sample data to your local run and more!
We have concluded the rollout of our April 2017 refresh to all the regions today. Here are the April...
Author: MRys Date: 04/27/2017
Exposing Hive!
I sat down with Justin Scott (Application Development Manager at Microsoft working with our top...
Author: AshishThapliyal Date: 04/25/2017
Cloudera clusters now run with Azure Data Lake Store
We are excited to announce that with today's release of Cloudera Enterprise 5.11 you can now run...
Author: CP_MSFT Date: 04/18/2017
Use H2O.ai on Azure HDInsight
We're hosting an upcoming webinar to show you how to use H2O on HDInsight and to answer your...
Author: Xiaoyong Zhu (MSFT) Date: 04/11/2017
Azure Data Factory makes it even easier and more convenient to uncover insights from data when using Data Lake Store with SQL Data Warehouse
Earlier in February 2017 we announced availability of SQL Data Warehouse (SQLDW) PolyBase support...
Author: Sachin C Sheth Date: 04/08/2017
Azure HDInsight 3.6 - Five things that will make a data developer happy
Working with Hive, I regularly find myself staring at csv/tsv/json files wondering where to...
Author: AshishThapliyal Date: 04/06/2017
Hive Metastore in HDInsight – Tips, Tricks & Best Practices
When you create a Hive table, the table definition (column names, data types, comments, etc.) is...
Author: AshishThapliyal Date: 03/24/2017
How to use BigDL on Apache Spark for Azure HDInsight
Deep learning is impacting everything from healthcare to transportation to manufacturing, and more....
Author: Xiaoyong Zhu (MSFT) Date: 03/17/2017
Azure Data Lake U-SQL March 9 2017 Updates: Deprecations turn into errors, PIVOT/UNPIVOT, cross ADLS account U-SQL catalog sharing, nuget packages and more!
Following mainly internal service updates since our general availability, we released several new U-SQL...
Author: MRys Date: 03/16/2017
Using Custom Python Libraries with U-SQL
The U-SQL/Python extensions for Azure Data Lake Analytics ship with the standard Python libraries...
Author: Saveen Reddy Date: 03/10/2017
Analyze your data in ADLS with more assurance with the recently GA'd Power BI Desktop connector
As you know, Azure Data Lake Store (ADLS) has customers who analyze/view data stored in ADLS...
Author: Sachin C Sheth Date: 03/10/2017
How WebHCat Works and How to Debug (Part 2)
Link to Part 1. 2. How to debug WebHCat. 2.1. BadGateway (HTTP status code 502): This is a very generic...
Author: jiangmouren Date: 03/08/2017
How WebHCat Works and How to Debug (Part 1)
Overview and Goals: One of the common scenarios our customers face is: why my Hive, Pig, or...
Author: jiangmouren Date: 03/08/2017
Azure Data Lake Tools for VSCode (Preview) - March Update
Continuing our journey to launch Azure Data Lake Tools for VSCode for better cross-platform support,...
Author: JennyJiang Date: 03/07/2017
Garbage Collection and its performance impact
Hadoop is a beautiful abstraction that allows us to deal with the numerous complexities of data...
Author: Ranjan Banerjee Date: 03/06/2017
Wiring your older Hadoop clusters to access Azure Data Lake Store
This blog post describes how to connect older Hadoop clusters, those with versions lower than 3.0, to...
Author: Amit R. Kulkarni Date: 02/27/2017
Restarting Storm EventHub Topology on a new cluster
Azure EventHub is a popular, highly scalable data streaming platform. More about Azure EventHub can...
Author: Ranjan Banerjee Date: 02/24/2017
Using Oozie SLA on HDInsight clusters
Introduction: Often we have several jobs running on our HDInsight clusters that have tight timelines...
Author: Bharath Venkatesh Date: 02/24/2017
Ingest data into Azure Data Lake Store with StreamSets Data Collector
Today, I want to give a shout out to one of our partners who has a great offering for Azure Data...
Author: CP_MSFT Date: 02/23/2017
Making Azure Data Lake Store the default file system for Hadoop
Here's an article that explains how to make Azure Data Lake Store the default file system for...
Author: Amit R. Kulkarni Date: 02/21/2017
Enabling U-SQL Advanced Analytics for Local Execution
After we announced the ability for U-SQL to massively distribute Python code in the Azure Data...
Author: Saveen Reddy Date: 02/20/2017
Connecting your own Hadoop or Spark to Azure Data Lake Store
A frequent question we get is how do I connect my Hadoop or Spark cluster to Azure Data Lake Store....
Author: Amit R. Kulkarni Date: 02/17/2017
Building advanced analytical solutions faster using Dataiku DSS on HDInsight
The Azure HDInsight Application Platform allows users to use applications that span a variety of use...
Author: Bharath Sreenivas Date: 02/16/2017
HDInsight - How to perform Bulk Load with Phoenix?
Apache HBase is an open-source NoSQL Hadoop database, a distributed, scalable, big data store. It...
Author: Anunay Tiwari Date: 02/14/2017
Uncover insights rapidly from petabytes of data in Azure Data Lake Store with SQL Data Warehouse PolyBase support
Most common patterns using Azure Data Lake Store (ADLS) involve customers ingesting and storing raw...
Author: Sachin C Sheth Date: 02/06/2017
Distributed Deep Learning on HDInsight with Caffe on Spark
Introduction: Deep learning is impacting everything from healthcare to transportation to...
Author: Xiaoyong Zhu (MSFT) Date: 02/02/2017
U-SQL Deprecation Update: Migration of Data Source Credentials and Removal of CREATE CREDENTIAL, ALTER CREDENTIAL and DROP CREDENTIAL
Back in October, we announced that we simplified the U-SQL Credentials by merging the password...
Author: MRys Date: 01/24/2017
U-SQL Deprecation notice: PARTITION BY BUCKET will be removed
Hi all, in the upcoming refresh, we are removing the deprecated syntax PARTITION BY BUCKET and will...
Author: MRys Date: 01/23/2017
Introducing: Microsoft Azure Data Lake Tools for Visual Studio Code
Welcome to the Microsoft Azure Data Lake Tools preview for Visual Studio Code, an extension for...
Author: JennyJiang Date: 01/20/2017
Microsoft Azure Data Lake Tools for Visual Studio Code
Welcome to the Microsoft Azure Data Lake Tools preview for Visual Studio Code, an extension for...
Author: JennyJiang Date: 01/20/2017
HDInsight tools for IntelliJ & Eclipse December Updates
We are pleased to announce the December updates of HDInsight Tools for IntelliJ & Eclipse. The...
Author: JennyJiang Date: 01/20/2017
Spark Job Submission on HDInsight 101
This article is part two of the Spark Debugging 101 series we initiated a few weeks ago. Here we...
Author: Bharath Venkatesh Date: 01/06/2017
Cornell Lab of Ornithology Improves Machine Learning Workflow with Azure HDInsight
For the last 14 years, the Cornell Lab of Ornithology has been collecting millions of bird...
Author: Rashim Gupta Date: 12/28/2016
Introducing: Interactive Hive cluster using LLAP (Live Long and Process)
Earlier in the Fall, we announced the public preview of Hive LLAP (Live Long and Process) in the...
Author: Rashim Gupta Date: 12/28/2016
Spark Debugging on HDInsight 101
Apache Spark is an open source processing framework that runs large-scale data analytics...
Author: Abdullah Al Mahmood Date: 12/19/2016
Problems with new File Set (update 2016-12-14 - 16:30 PST)
In the latest push we enabled the new faster file set feature by default. Unfortunately that caused...
Author: MRys Date: 12/14/2016
Introducing Python SDKs for Data Lake Store & Analytics
We are committed to "meeting developers where they are" and part of that means letting developers...
Author: Saveen Reddy Date: 11/28/2016
U-SQL Advanced Analytics: Introducing Cognitive scenarios for Text and Imaging
Yesterday we introduced you to U-SQL Advanced Analytics and showed how Python can be used with...
Author: Saveen Reddy Date: 11/22/2016
U-SQL Advanced Analytics: Introducing Python Extensions for U-SQL
Last week at Microsoft's Connect 2016 conference, we announced the General Availability of Azure...
Author: Saveen Reddy Date: 11/22/2016
Apache HBase/Phoenix - Tips, Tricks & Best Practices in HDInsight
We will keep this page updated with HDInsight HBase/Phoenix related commonly asked questions. You...
Author: AshishThapliyal Date: 11/19/2016
Azure Data Lake Store is now generally available
Today we announced general availability of Azure Data Lake services including Azure Data Lake Store...
Author: Amit R. Kulkarni Date: 11/17/2016
Preview: Azure Data Lake Tools for Visual Studio Code
We are pleased to announce the Public Preview of the Azure Data Lake (ADL) Tools for VSCode. The...
Author: Saveen Reddy Date: 11/17/2016
Executing Spark SQL Queries using dotnet ODBC driver
Introduction: HDInsight provides numerous ways of executing Spark applications on your cluster. This...
Author: Bharath Venkatesh Date: 10/26/2016
OozieBot: Automated Oozie Workflow and Coordinator Generation
Introducing OozieBot - a tool to help customers automate Oozie job creation. Learn how to use...
Author: Bharath Venkatesh Date: 10/20/2016
Azure Data Lake U-SQL October 2016 Updates: Deprecations turn into errors, sampling is live, sharing catalog objects across ADLA accounts, outputting headers and more!
We seem to be just cranking out new stuff :). Here are the October 2016 Updates for Azure Data Lake...
Author: MRys Date: 10/17/2016