Big Data Support
This is the team blog for the Big Data Analytics & NoSQL Support team at Microsoft. We support HDInsight, which is Hadoop running on Azure, as well as other big data analytics features.
Rerunning many slices and activities in Azure Data Factory
Today someone asked me how to run all the data slices in their data factory on-demand in an ad-hoc...
Date: 08/31/2016
Capture Microsoft Azure Stream Analytics logs
Microsoft Azure Stream Analytics makes building real-time solutions very easy. Developers can build...
Date: 08/24/2016
HDFS gets full in Azure HDInsight with many Hive temporary files
Sometimes when Hive is using temporary files, and a VM is restarted in an HDInsight cluster in...
Date: 08/15/2016
How to Find and Kill a running Yarn Application Master in HDInsight with and without SSH access
Today we faced a challenge in HDInsight not knowing the SSH user password to terminal into the...
Date: 06/11/2016
How to Lock a Resource Group to prevent accidental deletion of resources like HDInsight
Did you know it is possible to prevent accidental deletion of resources in Azure? This could apply...
Date: 05/16/2016
HDInsight Name Node can stay in Safe mode after a Scale Down
This week we worked on an HDInsight cluster where the Name Node had gone into Safe mode and didn't...
Date: 03/16/2016
HDInsight Hive Metastore fails when the database name has dashes or hyphens
Working in Azure HDInsight support today, we see a failure when trying to run a Hive query on a...
Date: 02/24/2016
How to call a Azure Machine Learning Web Service from NodeJS
Azure Machine Learning allows data scientists and developers to embed predictive analytics into...
Date: 02/18/2016
Encoding 101 - Exporting from SQL Server into flat files, to create a Hive external table
Today in Microsoft Big Data Support we faced the issue of how to correctly move Unicode data from...
Date: 02/05/2016
Encoding the Hive query file in Azure HDInsight
Today at Microsoft we were using Azure Data Factory to run Hive Activities in Azure HDInsight on a...
Date: 02/05/2016
Incremental data load from Azure Table Storage to Azure SQL using Azure Data Factory
Azure Data Factory is a cloud-based data integration service. The service not only helps to move...
Date: 01/23/2016
How to allow Spark to access Microsoft SQL Server
Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight...
Date: 10/22/2015
Using Azure SDK for Python
Python is a great scripting tool with a large user base. In a recent support case I needed a way to...
Date: 10/02/2015
A KMeans example for Spark MLlib on HDInsight
Today we will take a look at Spark's MLlib module, its built-in machine learning library...
Date: 09/24/2015
Dealing with RequestRateTooLarge errors in Azure DocumentDB and testing performance
In Azure DocumentDB support, one of the most common errors we have seen as reported by our customers...
Date: 09/02/2015
How to configure Hortonworks HDP to access Azure Windows Storage
Recently I was asked how to configure a Hortonworks HDP 2.3 cluster to access Azure Windows Storage....
Date: 09/01/2015
Troubleshooting Oozie or other Hadoop errors with DEBUG logging
In troubleshooting Hadoop issues, we often need to review the logging of a specific Hadoop...
Date: 08/21/2015
Some things to consider for your Spark on HDInsight workload
When it comes time to provision your Spark cluster on HDInsight, we all want our workloads to execute...
Date: 08/19/2015
How to Access HDInsight Linux Web UI's using SSH Dynamic Tunneling
Scenario: One of the most important features of Azure HDInsight Linux (currently in preview) is the...
Date: 08/12/2015
Why is my spark application running out of disk space?
In your Zeppelin notebook you have Scala code that loads Parquet data from two folders that is...
Date: 08/12/2015
Using cross/outer apply in Azure Stream Analytics
Recently I got involved in working with a problem where JSON data events contain an array of values....
Date: 08/05/2015
Azure Data Factory JSON Changes in July 2015
Azure Data Factory factories are designed with a series of fairly simple JSON documents and uploaded...
Date: 07/21/2015
Spark on Azure HDInsight is available
Spark on Azure HDInsight (public preview) is now available! The following components are included as...
Date: 07/14/2015
How to access Hive using JDBC on HDInsight
While following up on a customer question recently on this topic, I realized that we have seen the...
Date: 06/09/2015
How to install Splunk on HDINSIGHT with a custom action script
Recently I worked with a customer that wanted to use Splunk Enterprise and Splunk Forwarder to...
Date: 06/01/2015
Why are the Hadoop services disabled on my HDInsight cluster
I came across this question while working with a few customers recently and thought I would share a...
Date: 05/31/2015
Understanding HDInsight Custom Node VM Sizes
With the 02/18/2015 update to HDInsight and Azure PowerShell 0.8.14 we introduced a lot more...
Date: 05/11/2015
Azure PowerShell 0.8.14 Released, fixes problems with pipelining HDInsight configuration cmdlets
We recently pushed out the 0.8.14 release of Azure PowerShell. This release includes some updates to...
Date: 02/16/2015
Problems When Using a Shared Default Storage Container with Multiple HDInsight Clusters
We have seen several cases come in to Microsoft Support that ended up being caused by having...
Date: 02/12/2015
Some Commonly Used Yarn Memory Settings
We were recently working on an out of memory issue that was occurring with certain workloads on...
Date: 11/11/2014
How to use parameter substitution with Pig Latin and PowerShell
When running Pig in a production environment, you'll likely have one or more Pig Latin scripts that...
Date: 08/12/2014
HDInsight: - Creating, Deploying and Executing Pig UDF
In my experience as a developer, I always look for how customization (writing my own processing) can be...
Date: 07/07/2014
How to use a Custom JSON Serde with Microsoft Azure HDInsight
I had a recent need to parse JSON files using Hive. There were a couple of options that I could use....
Date: 06/18/2014
Some Frequently Asked Questions on Microsoft Azure HDInsight
We have seen some common questions on HDInsight when interacting with customers and partners. On...
Date: 05/22/2014
HDInsight News - New Videos to watch - HDInsight Provisioning demonstrations
Check out these two recent video demos regarding HDInsight provisioning. These videos complement the...
Date: 05/09/2014
HDInsight: - backup and restore hive table
Introduction: My name is Sudhir Rawat and I work on the Microsoft HDInsight support team. In this...
Date: 05/01/2014
Sliding Window Data Partitioning on Microsoft Azure HDInsight
HCatalog is a table and storage management layer for Hadoop that enables users with different data...
Date: 04/23/2014
Querying HDInsight Job Status with WebHCat via Native PowerShell or Node.js
One of the great things about HDInsight is that under the covers, it has the same capabilities as...
Date: 04/22/2014
Customizing HDInsight Cluster provisioning
In my last blog, I discussed how we can specify Hadoop configurations for a job on an HDInsight...
Date: 04/15/2014
Using Apache Flume with HDInsight
Gregory Suarez – 03/18/2014 (This blog posting assumes some basic knowledge of Apache Flume)...
Date: 03/18/2014