Hadoop Streaming in F# and MapReduce (summary)

Having written several posts on Hadoop Streaming recently, I thought it would be useful to summarize them in a single post. The main objective of these posts was to put together a codebase that lets F# developers write Map/Reduce libraries through a simple API.

The full code posting can be found here: https://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850

The idea was to provide reusable code so that one only needs to implement the Map and Reduce functions, using the following function prototypes:

For Text Streaming:

Map : string -> (string * obj) option

Reduce : string -> seq<string> -> obj option
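As an illustration of the text streaming prototypes, here is a minimal sketch of a Map/Reduce pair. It is not taken from the posted codebase: the tab-separated "key, integer value" input format and the summing logic are assumptions chosen only to show how the two signatures fit together.

```fsharp
open System

// Hypothetical mapper: assumes each input line looks like "key<TAB>integer".
// Lines that do not match are skipped by returning None.
let Map (line: string) : (string * obj) option =
    match line.Split('\t') with
    | [| key; value |] ->
        match Int32.TryParse value with
        | true, n -> Some (key, box n)
        | _ -> None
    | _ -> None

// Hypothetical reducer: sums the values emitted for one key.
// Values arrive as strings, so they are parsed again here;
// unparsable values are ignored, and None is returned if nothing is left.
let Reduce (key: string) (values: seq<string>) : obj option =
    let parsed =
        values
        |> Seq.choose (fun v ->
            match Int32.TryParse v with
            | true, n -> Some n
            | _ -> None)
        |> Seq.toList
    match parsed with
    | [] -> None
    | ns -> Some (box (List.sum ns))
```

Returning an option from both functions lets the mapper drop malformed lines and the reducer suppress keys for which it has nothing to output.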

For Binary Streaming:

Map : WordprocessingDocument -> seq<string * obj>
Map : PdfReader -> seq<string * obj>

Reduce : string -> seq<string> -> obj option

For XML Streaming:

Map : XElement -> seq<(string * string) * obj>

Reduce : string * string -> seq<string> -> obj option
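To show how the XML streaming prototypes might be used, here is a minimal sketch with a composite key. The `order` element and its `country`, `product` and `quantity` attributes are hypothetical and not part of the posted codebase. On older versions of F# Interactive a reference to System.Xml.Linq may need to be added first.

```fsharp
open System
open System.Xml.Linq

// Hypothetical mapper: assumes elements such as
//   <order country="UK" product="tea" quantity="3" />
// and emits ((country, product), quantity). Elements missing any
// attribute, or with a non-integer quantity, produce no output.
let Map (element: XElement) : seq<(string * string) * obj> =
    seq {
        let attr (name: string) = element.Attribute(XName.Get name)
        let country = attr "country"
        let product = attr "product"
        let quantity = attr "quantity"
        if not (isNull country) && not (isNull product) && not (isNull quantity) then
            match Int32.TryParse quantity.Value with
            | true, n -> yield (country.Value, product.Value), box n
            | _ -> ()
    }

// Hypothetical reducer: sums the quantities for one (country, product) key.
let Reduce (key: string * string) (values: seq<string>) : obj option =
    let parsed =
        values
        |> Seq.choose (fun v ->
            match Int32.TryParse v with
            | true, n -> Some n
            | _ -> None)
        |> Seq.toList
    match parsed with
    | [] -> None
    | ns -> Some (box (List.sum ns))
```

Because the mapper returns a sequence rather than an option, a single element can yield zero, one or many key/value pairs.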

Here is the full list of posts:

Hadoop Streaming and F# MapReduce

Using Hadoop on Azure JS Console for Data Visualizations

MapReduce Tester: A Quick Word

Hadoop Binary Streaming and F# MapReduce

Hadoop Binary Streaming and PDF File Inclusion

Hadoop Streaming and Reporting

Hadoop Streaming and Windows Azure Blob Storage

Hadoop XML Streaming and F# MapReduce

Look out for more Hadoop posts in the coming months.