Data Platform - Tools of the Trade
Introduction
When establishing a Data Platform based on Microsoft Azure, several tools are useful. This article lists and briefly explains some recommended tools that make daily operations more efficient.
| Data Platform definition |
|---|
| A Big Data Platform is a type of IT solution that combines the features and capabilities of several Big Data applications and utilities within a single solution. It is an enterprise-class IT platform that enables organizations to develop, deploy, operate and manage a Big Data infrastructure/environment. (Source: Techopedia) The tools presented in this article, however, are not limited to Big Data or enterprise-class Data Platforms; in this context, they apply to collecting, integrating and managing all types of data. |
Tools
This list of tools is based on experience architecting, designing, establishing and operating an enterprise-class Data Platform. It is not meant to be comprehensive, and each tool is only briefly introduced. The tools are listed in no particular order. Data Platforms based on technologies other than Microsoft Azure are not covered.
Azure Portal
The Azure Portal is the main entry point to the Data Platform. It provides an overview of all subscriptions, resource groups and resources in the enterprise, and it can be used to create, update, move, remove, monitor and secure resources. An official Azure app is also available for Android and iOS.
More details: https://portal.azure.com
PowerShell
Use PowerShell to create and run scripts for automation. In Big Data solutions, manual, repetitive tasks do not scale unless they are scripted. Scripting also allows reuse and consistency across subscriptions and environments. PowerShell can automate a wide range of tasks, including deployment, resource creation, collecting statistics and assigning user roles.
More details: https://msdn.microsoft.com/en-us/powershell/
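As a minimal illustration of how scripting gives consistency across environments, the sketch below derives one resource group definition per environment from a single naming rule and renders each as a call to `New-AzResourceGroup`, a cmdlet from the current Az PowerShell module. It is written in Python for illustration only; the `rg-<platform>-<env>` naming convention, the tag keys and the environment list are hypothetical assumptions, not a recommended standard.

```python
from typing import Dict, List


def resource_group_plan(platform: str, environments: List[str], location: str) -> List[Dict]:
    """Derive one consistently named and tagged resource group per environment.

    The 'rg-<platform>-<env>' convention and the tag keys are hypothetical examples.
    """
    return [
        {
            "Name": f"rg-{platform}-{env}".lower(),
            "Location": location,
            "Tag": {"platform": platform, "environment": env},
        }
        for env in environments
    ]


def to_powershell(entry: Dict) -> str:
    """Render one plan entry as a New-AzResourceGroup call, with tags as a PowerShell hashtable."""
    tags = "; ".join(f'{key}="{value}"' for key, value in entry["Tag"].items())
    return (
        f'New-AzResourceGroup -Name "{entry["Name"]}" '
        f'-Location "{entry["Location"]}" -Tag @{{{tags}}}'
    )


if __name__ == "__main__":
    # Hypothetical platform name, environments and region.
    for entry in resource_group_plan("DataPlatform", ["dev", "test", "prod"], "westeurope"):
        print(to_powershell(entry))
```

Running the printed lines in a PowerShell session with the Az module loaded and signed in (for example after `Connect-AzAccount`) would create the groups. The point of the design is that one rule produces the same shape in every environment, instead of each environment being set up by hand.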
Storage Explorer
Storage Explorer is an application from Microsoft for working with Azure storage data on Windows, macOS and Linux. It can be used to get an overview of storage resources, view/upload/download/copy/paste/rename/delete data, retrieve folder statistics and manage SAS keys.
More details: http://storageexplorer.com/
Azure Cosmos DB Data Migration Tool
The Azure Cosmos DB Data Migration Tool is used to migrate data to Azure Cosmos DB. It is an open source solution that imports data into Azure Cosmos DB from a variety of sources, including MongoDB, SQL Server and JSON/CSV files. The tool is available both as a graphical tool and as a command-line tool.
More details: https://azure.microsoft.com/en-us/updates/documentdb-data-migration-tool/
Visual Studio [Code]
Visual Studio is an integrated development environment from Microsoft, used to develop custom software components. Visual Studio Code is a free, open source code editor from Microsoft, available for Windows, macOS and Linux. Although PowerShell can automate many tasks, it does not match the capabilities of full software development in Visual Studio. Extensions can be added for additional functionality; some of the most relevant here are Data Lake Tools, Cloud Explorer and PowerShell Tools. Team Foundation Server and GitHub are two of the code repositories that can be used.
More details: https://www.visualstudio.com/ | https://code.visualstudio.com/
Service Bus Explorer
The Service Bus Explorer is a tool used to manage and test the entities contained in an Azure Service Bus namespace. It offers extensive functionality for cloud-based messaging, including Queues and Topics.
More details: https://code.msdn.microsoft.com/windowsapps/Service-Bus-Explorer-f2abca5a
SQL Server Management Studio
SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure, from SQL Server to Azure SQL Database. SSMS provides tools to configure, monitor and administer SQL instances wherever they are deployed. It can also be used to deploy, monitor and upgrade data-tier components, such as databases and data warehouses, and to build queries and scripts.
More details: /en-us/sql/ssms/download-sql-server-management-studio-ssms
AdlCopy
AdlCopy is a command-line tool that copies data from Azure Storage Blob containers into Azure Data Lake Store. The key of the source storage account must be supplied as a parameter.
More details: https://www.microsoft.com/en-us/download/details.aspx?id=50358
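To make the key requirement concrete, the sketch below assembles an AdlCopy argument list. It assumes the option names `/Source`, `/Dest` and `/SourceKey` as used in Microsoft's AdlCopy examples, and a `swebhdfs://<account>.azuredatalakestore.net/` destination; the function name, account names, folder and key are hypothetical placeholders. Check the options against the AdlCopy version you install.

```python
from typing import List


def adlcopy_command(source_url: str, adls_account: str, dest_folder: str, source_key: str) -> List[str]:
    """Build a hypothetical AdlCopy argument list copying a Blob path into Data Lake Store.

    The source storage account key is mandatory, so fail early if it is missing.
    """
    if not source_key:
        raise ValueError("AdlCopy needs the source storage account key (/SourceKey)")
    # Data Lake Store destinations are addressed with the swebhdfs:// scheme.
    folder = dest_folder.strip("/")
    dest = f"swebhdfs://{adls_account}.azuredatalakestore.net/" + (folder + "/" if folder else "")
    return ["AdlCopy", "/Source", source_url, "/Dest", dest, "/SourceKey", source_key]


if __name__ == "__main__":
    # Hypothetical accounts, paths and key; avoid hard-coding a real key in scripts.
    cmd = adlcopy_command(
        "https://mystorage.blob.core.windows.net/raw/sales/",
        "mydatalakestore",
        "landing/sales",
        "<storage-account-key>",
    )
    print(" ".join(cmd))
```

On a machine where AdlCopy is installed, the list could be passed to `subprocess.run(cmd, check=True)`. In practice the key would be read from a protected source such as an environment variable rather than written into the script.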
AzCopy
AzCopy is a command-line tool for copying data to and from Azure Blob, File and Table storage using simple commands, designed for high performance. It can be used to copy data from one object to another within a storage account, or between storage accounts.
More details: /en-us/azure/storage/storage-use-azcopy
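The sketch below builds the argument list for a copy between two storage accounts. It assumes the current AzCopy v10 syntax (`azcopy copy <source> <destination> --recursive`) with SAS-authorized Blob URLs; older AzCopy releases, contemporary with this article, used a different slash-prefixed option syntax. The helper names, account names, containers and SAS tokens are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def blob_url(account: str, container: str, path: str = "", sas: Optional[str] = None) -> str:
    """Build a Blob storage URL; a SAS token, if given, becomes the query string."""
    url = f"https://{account}.blob.core.windows.net/{container}"
    if path:
        url += "/" + path.lstrip("/")
    if sas:
        url += "?" + sas.lstrip("?")
    return url


def azcopy_copy_command(source: str, destination: str, recursive: bool = True) -> List[str]:
    """Return the argument list for an AzCopy v10 copy between two URLs (assumed syntax)."""
    cmd = ["azcopy", "copy", source, destination]
    if recursive:
        cmd.append("--recursive")
    return cmd


if __name__ == "__main__":
    # Hypothetical accounts, containers and SAS tokens, for illustration only.
    src = blob_url("sourceaccount", "raw", "sales/2017", sas="sv=...&sig=...")
    dst = blob_url("targetaccount", "staging", "sales/2017", sas="sv=...&sig=...")
    # shlex.join quotes the URLs, since '?' and '&' are special in a shell.
    print(shlex.join(azcopy_copy_command(src, dst)))
```

Keeping the command as a list, rather than one string, lets it be passed directly to `subprocess.run` without shell quoting issues; copying within a single account only requires both URLs to point at the same account.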
Data Catalog Publishing App
The Azure Data Catalog is used to browse and discover enterprise data assets using a regular web browser. The Data Catalog Publishing App, available as a separate application, is used to register and publish new data assets as an alternative to the web-based option.
More details: https://azure.microsoft.com/en-us/services/data-catalog/
Web Platform Installer
The Web Platform Installer is a tool that simplifies getting the latest components of the Microsoft Web Platform. Its built-in Windows Web Application Gallery also makes it easy to install and run the most popular free web applications, for content management and more.
More details: https://www.microsoft.com/web/downloads/platform.aspx
Gateways
When migrating data from On-Premises sources to Azure, a software gateway is required. Several versions are available, depending on usage. Three are covered here:
- Data Management Gateway - enabling cloud access for On-Premises data sources
- Power BI Gateway - enabling up-to-date dashboards and reports with On-Premises data sources
- Logic Apps Gateway - enabling connectivity between Logic Apps and On-Premises data sources
More details: https://www.microsoft.com/en-us/download/details.aspx?id=39717 | https://powerbi.microsoft.com/en-us/gateway/ | /en-us/azure/logic-apps/logic-apps-gateway-install
These software gateways must be installed on gateway servers, preferably on separate servers. Remote Desktop Connection Manager is highly recommended for managing and connecting to these servers (and others).
Conclusion
Operating and managing a Data Platform in Microsoft Azure becomes a lot easier with the right tools. This article has given an overview of some very useful tools and where to find more details about them.
See Also
The TechNet Wiki itself is another important source of Cortana Intelligence Suite-related articles. The best entry point is Cortana Intelligence Suite Resources on the TechNet Wiki.