Building custom clients


From: Developing big data solutions on Microsoft Azure HDInsight

You can use existing applications, utilities, and third-party tools to manage clusters and storage accounts, and to upload data to your cluster storage. Many of these are listed in Appendix A - Tools and technologies reference. However, building your own custom tools and utilities, even if they are only scripts that you run on demand or as part of a scheduled process, can minimize operator errors and provide a standardized process.

Creating or adopting automated mechanisms can also help to make a solution more secure because you can assign specific permissions to each operation or tool, control access to data by allowing it to be read only through a specific tool, and hide sensitive configuration settings (such as keys and credentials) from users.

Automating data upload

You can upload data to the storage accounts that HDInsight uses either before or after you provision a cluster. Although many tools are available for uploading data files to Azure storage, big data processing scenarios commonly implement custom data loading code in a client application or utility. In some cases this code takes the form of a script or simple command-line utility that simplifies the upload of data for a repeatable data processing task. In other cases, the code is used to integrate big data processing into a business application or solution. Techniques for automating data upload are described in Custom data upload clients.
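As an illustration of the kind of simple, repeatable upload utility described above, the following Python sketch walks a local folder and hands each file to an upload function together with its destination path in storage. It is a minimal sketch under stated assumptions, not a definitive implementation: the names `plan_uploads`, `upload_folder`, and the `upload_file` callback are hypothetical, and the callback stands in for whatever storage client library you actually use.

```python
import os
from typing import Callable, List, Tuple


def plan_uploads(local_dir: str, dest_prefix: str) -> List[Tuple[str, str]]:
    """Pair every file under local_dir with a destination path in storage."""
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir)
            # Storage paths use forward slashes regardless of the local OS.
            blob_path = dest_prefix.rstrip("/") + "/" + relative.replace(os.sep, "/")
            plan.append((local_path, blob_path))
    # Sort by destination so every run uploads in the same order.
    return sorted(plan, key=lambda pair: pair[1])


def upload_folder(
    local_dir: str,
    dest_prefix: str,
    upload_file: Callable[[str, str], None],
) -> List[str]:
    """Upload every file under local_dir and return the destination paths.

    upload_file is a hypothetical callback (local path, destination path);
    in a real utility it would wrap a call to your storage client library.
    """
    uploaded = []
    for local_path, blob_path in plan_uploads(local_dir, dest_prefix):
        upload_file(local_path, blob_path)
        uploaded.append(blob_path)
    return uploaded
```

Keeping the storage call behind a callback means that the folder-walking logic can be tested without a storage account, and that credentials can stay inside the function that implements the callback rather than being exposed to the operator who runs the utility.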

Automating cluster management

Before you can use HDInsight to process the data you have uploaded, you must provision an HDInsight cluster. In environments where big data processing is a constant activity, you might choose to do this once and leave the cluster running. However, if data is processed only periodically, you can reduce operational costs by provisioning the cluster only when you need it and deleting it when each batch of data processing tasks is complete. Techniques for automating cluster management are described in Custom cluster management clients.
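The provision, process, and delete pattern described above can be sketched in Python as a context manager that guarantees the cluster is deleted even when a job fails. This is a minimal sketch under stated assumptions: `ClusterManager`, `transient_cluster`, `run_batch`, and `RecordingManager` are hypothetical names introduced for illustration, and a real client would implement `provision` and `delete` by calling a cluster management API.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List, Protocol, Tuple


class ClusterManager(Protocol):
    """Hypothetical interface; a real client would wrap a management API."""

    def provision(self, name: str) -> None: ...

    def delete(self, name: str) -> None: ...


@contextmanager
def transient_cluster(manager: ClusterManager, name: str) -> Iterator[str]:
    """Provision a cluster on entry and always delete it on exit."""
    manager.provision(name)
    try:
        yield name
    finally:
        # Delete even if a job raised, so an unused cluster never keeps running.
        manager.delete(name)


def run_batch(
    manager: ClusterManager,
    name: str,
    jobs: Iterable[Callable[[str], None]],
) -> int:
    """Run each job against a short-lived cluster; return how many completed."""
    completed = 0
    with transient_cluster(manager, name) as cluster:
        for job in jobs:
            job(cluster)
            completed += 1
    return completed


class RecordingManager:
    """In-memory stand-in that only records the order of calls."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def provision(self, name: str) -> None:
        self.events.append(("provision", name))

    def delete(self, name: str) -> None:
        self.events.append(("delete", name))
```

For example, calling `run_batch(RecordingManager(), "batch-cluster", jobs)` records one provision event before the jobs run and one delete event afterward, whether or not every job succeeds.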
