shanyu
How does Spark determine partitions for an RDD?
The most fundamental data structure in Spark is called RDD (Resilient Distributed Dataset). An RDD...
Date: 05/08/2018
Understanding and Using HDInsight Spark Streaming
There are plenty of blogs and materials out there talking about Spark Streaming. Most of them focus...
Date: 09/18/2015
Performance Tuning for HDInsight Storm and Microsoft Azure EventHubs
Apache Storm is a popular real time data processing framework. Microsoft Azure HDInsight provides a...
Date: 05/14/2015
HDInsight Storm Topology Submission Via VNet
To submit a Storm topology to an HDInsight cluster, a user can RDP to the headnode...
Date: 10/28/2014
Hadoop Yarn memory settings in HDInsight
(Edit: thanks to Mostafa for the valuable feedback; I updated this post with an explanation about the...
Date: 07/31/2014