Work in the Apache Hadoop ecosystem on HDInsight from a Windows PC

Learn about the development and management options available on a Windows PC for working in the Apache Hadoop ecosystem on HDInsight.

HDInsight is based on Apache Hadoop and Hadoop components, open-source technologies developed on Linux. HDInsight versions 3.4 and higher use the Ubuntu Linux distribution as the underlying OS for the cluster. However, you can still work with HDInsight from a Windows client or Windows development environment.

Use PowerShell for deployment and management tasks

Azure PowerShell is a scripting environment that you can use to control and automate deployment and management tasks in HDInsight from Windows.

Examples of tasks you can do with PowerShell:

To get the latest version, follow the steps to install and configure Azure PowerShell.

Utilities you can run in a browser

The following utilities have a web UI that runs in a browser:

Before you go to the following examples, install and try Data Lake Tools for Visual Studio.

Visual Studio and the .NET SDK

You can use Visual Studio with the .NET SDK to manage clusters and develop big data applications. You can use other IDEs for the following tasks, but examples are shown in Visual Studio.

Examples of tasks you can do with the .NET SDK in Visual Studio:

IntelliJ IDEA and Eclipse IDE for Spark clusters

Both IntelliJ IDEA and the Eclipse IDE can be used to:

  • Develop and submit a Scala Spark application on an HDInsight Spark cluster.
  • Access Spark cluster resources.
  • Develop and run a Scala Spark application locally.

These articles show how:

Notebooks on Spark for data scientists

Apache Spark clusters in HDInsight include Apache Zeppelin notebooks and kernels that can be used with Jupyter Notebooks.

Run Linux-based tools and technologies on Windows

If you come across a situation where you must use a tool or technology that is only available on Linux, consider the following options:

  • Bash on Ubuntu on Windows 10 provides a Linux subsystem on Windows. Bash allows you to directly run Linux utilities without having to maintain a dedicated Linux installation. See Windows Subsystem for Linux Installation Guide for Windows 10 for installation steps. Other Unix shells work as well.
  • Docker for Windows provides access to many Linux-based tools, and can be run directly from Windows. For example, you can use Docker to run the Beeline client for Hive directly from Windows. You can also use Docker to run a local Jupyter Notebook and remotely connect to Spark on HDInsight. To set it up, see Get started with Docker for Windows.
  • MobaXterm allows you to graphically browse the cluster file system over an SSH connection.
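
As an illustration of the Docker option, here is a minimal Python sketch that assembles a command line for running Beeline in a container against a cluster's HiveServer2 endpoint. It assumes the commonly documented HDInsight endpoint form (HTTPS on port 443 with the /hive2 HTTP path). The cluster name, user name, and Docker image name are hypothetical placeholders; substitute any image that has beeline on its PATH. The sketch only builds the argument list and does not run anything.

```python
from typing import List


def hive_jdbc_url(cluster_name: str, database: str = "default") -> str:
    """Build a JDBC URL for HiveServer2 on an HDInsight cluster.

    Assumes the public HTTPS gateway on port 443 and the /hive2 HTTP path.
    """
    host = f"{cluster_name}.azurehdinsight.net"
    return (
        f"jdbc:hive2://{host}:443/{database};"
        "ssl=true;transportMode=http;httpPath=/hive2"
    )


def docker_beeline_command(cluster_name: str, user: str, image: str) -> List[str]:
    """Return the argv for running Beeline in a throwaway container.

    `image` is a placeholder for any image that ships the Beeline client.
    Password handling is left out on purpose; supply it in whatever way
    your Beeline version and security policy require.
    """
    return [
        "docker", "run", "--rm", "-it",  # remove the container when Beeline exits
        image,
        "beeline",
        "-u", hive_jdbc_url(cluster_name),
        "-n", user,
    ]


if __name__ == "__main__":
    # "mycluster", "admin", and "example/beeline" are illustrative values only.
    print(" ".join(docker_beeline_command("mycluster", "admin", "example/beeline")))
```

Keeping the command as a list rather than a single string avoids quoting problems with the semicolons in the JDBC URL when the list is later handed to a process launcher.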

Cross-platform tools

The Azure command-line interface (CLI) is Microsoft's cross-platform command-line experience for managing Azure resources. For more information, see Azure Command-Line Interface (CLI).
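
Because the Azure CLI behaves the same on Windows, Linux, and macOS, it is easy to drive from a script. The following Python sketch lists HDInsight cluster names by calling az hdinsight list. It assumes the Azure CLI is installed and that you have already signed in with az login; the resource group name in the usage line is a hypothetical placeholder.

```python
import json
import shutil
import subprocess
from typing import List, Optional


def list_clusters_args(resource_group: Optional[str] = None) -> List[str]:
    """Arguments for `az` that list HDInsight clusters as JSON."""
    args = ["hdinsight", "list", "--output", "json"]
    if resource_group:
        args += ["--resource-group", resource_group]
    return args


def list_cluster_names(resource_group: Optional[str] = None) -> List[str]:
    """Run the Azure CLI and return the names of the clusters it reports."""
    # Resolve the full path first: on Windows the CLI is installed as az.cmd,
    # which a bare "az" in an argument list would not find.
    az_path = shutil.which("az")
    if az_path is None:
        raise RuntimeError("Azure CLI (az) was not found on PATH")
    result = subprocess.run(
        [az_path] + list_clusters_args(resource_group),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return [cluster["name"] for cluster in json.loads(result.stdout)]


if __name__ == "__main__":
    # "my-resource-group" is an illustrative value only.
    for name in list_cluster_names("my-resource-group"):
        print(name)
```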

Next steps

If you're new to working with Linux-based clusters, see the following articles: