Getting Up and Running with Piggybank on HDInsight

It’s been a fun couple of weeks launching HDInsight, and I’m getting back into some more technical blogging. There are a few easy topics off the bat that customers have asked about. The first one involves Piggybank, a user-contributed collection of useful Pig user-defined functions (UDFs).

This assumes your machine is set up with:

  • HDInsight (available through the Web Platform Installer, WebPI)
  • Java build tools (Ant and Ivy on your command path)

Next, let’s build Piggybank. Start by grabbing the Pig source code and checking out the 0.9 branch.

You should now have a pig directory. Change into it and run ant to build Pig.

Next, navigate to .\contrib\piggybank\java and run ant again. This produces piggybank.jar.

Next, open your HDInsight console window and type pig. This brings up Grunt, the interactive Pig shell.

You can now use the following in your script:

REGISTER C:\Your\Path\To\piggybank.jar ;

foo = FOREACH entry GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(item_name);
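
To put those two lines in context, here is a minimal sketch of a complete script. It is only an illustration under a few assumptions: the input file items.txt, its single tab-separated item_name column, and the PB_UPPER alias are all made up for this example, and the jar path is still a placeholder for wherever your build put piggybank.jar.

```pig
-- Register the jar produced by the Piggybank build (placeholder path).
REGISTER C:\Your\Path\To\piggybank.jar;

-- Hypothetical short alias so the fully qualified class name is written only once.
DEFINE PB_UPPER org.apache.pig.piggybank.evaluation.string.UPPER();

-- Hypothetical input: a tab-separated file with one item name per line.
entry = LOAD 'items.txt' USING PigStorage('\t') AS (item_name:chararray);

-- Apply the Piggybank UDF to every row.
foo = FOREACH entry GENERATE PB_UPPER(item_name);

-- Print the result to the console.
DUMP foo;
```

The DEFINE line is optional. It just saves typing the full package name each time you call the function, which adds up quickly once you use more than one Piggybank UDF in a script.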

 

You can now take advantage of all of the functions in Piggybank. If you’re interested in contributing your own, the Apache Pig documentation describes how.