Scenario: Joins in Apache Hive lead to an OutOfMemory error in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when using Interactive Query components in Azure HDInsight clusters.

Issue

When running sufficiently large joins in Apache Hive, the query fails with the following error:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Cause

By default, Apache Hive performs a join by loading the entire contents of a table into memory, so that the join can be completed without a Map/Reduce step. If the Hive table is too large to fit into memory, the query can fail.

Resolution

Prevent Hive from loading tables into memory on joins (instead performing a Map/Reduce step) by setting the following Hive configuration value:

hive.auto.convert.join=false
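
As a minimal sketch, the setting can be applied for a single session from a Hive client before the join is run. The table and column names below (sales, customers, customer_id, amount) are hypothetical and only illustrate the pattern; they do not come from this article.

```sql
-- Hypothetical tables: sales and customers, both too large to hold in memory.

-- Display the current value of the setting for this session.
SET hive.auto.convert.join;

-- Disable automatic conversion to an in-memory join for this session only.
SET hive.auto.convert.join=false;

-- Run the join; Hive now plans it with a Map/Reduce (shuffle) step
-- instead of loading one table into memory.
SELECT c.customer_id, SUM(s.amount) AS total_amount
FROM sales s
JOIN customers c
  ON s.customer_id = c.customer_id
GROUP BY c.customer_id;
```

A SET statement issued this way affects only the current session. To apply the value to every query, it would have to be changed in the cluster's Hive configuration instead.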

Next steps

If setting this value didn't resolve your issue, visit one of the following channels for more support:

  • Get answers from Azure experts through Azure Community Support.

  • Connect with @AzureSupport - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub. For more detailed information, please review How to create an Azure support request. Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the Azure Support Plans.