To end the week I decided to make a minor change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission”.
I have been doing some work on creating a co-occurrence matrix for item recommendations. I planned to map the process to one or more MapReduce jobs, but then ran into the question of how to output the vector data from the reducer. In the current framework the reducer outputs the key/value data in a string format. This works fine for simple data, but for a vector it quickly becomes problematic.
To resolve this I have enabled a parameter called “outputFormat”. The default output is the usual string format, which can also be requested explicitly with the parameter value “Text”. Additionally, a parameter value of “Binary” is supported:
MSDN.Hadoop.Submission.Console.exe
-input "mobile/data" -output "mobile/querytimes"
-mapper "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryReducer, MSDN.Hadoop.MapReduceFSharp"
-outputFormat Binary
-file "C:\Projects\Release\MSDN.Hadoop.MapReduceFSharp.dll"
When the output format is specified as binary, the reducer value is output as a binary serialized version of the data, represented as a Base64 string. When reading the reduced output, one can then easily deserialize this value back into a .Net type:
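open System
open System.IO
open System.Runtime.Serialization.Formatters.Binary

// Decode the Base64 reducer value and deserialize it back into a .Net object (returned as obj)
let Deserialize (value:string) =
    let bytes = Convert.FromBase64String(value)
    use stream = new MemoryStream(bytes)
    let formatter = new BinaryFormatter()
    formatter.Deserialize(stream)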
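For completeness, here is a rough sketch of the opposite direction and a round trip, to show what a Base64-encoded binary value looks like end to end. The names serializeToBase64 and deserializeFromBase64 are hypothetical helpers for illustration, not part of the framework, and the sketch assumes a runtime on which BinaryFormatter is available:

```fsharp
open System
open System.IO
open System.Runtime.Serialization.Formatters.Binary

// Hypothetical helper (not a framework API): binary-serialize a value
// and encode the resulting bytes as a Base64 string.
let serializeToBase64 (value: obj) : string =
    use stream = new MemoryStream()
    let formatter = new BinaryFormatter()
    formatter.Serialize(stream, value)
    Convert.ToBase64String(stream.ToArray())

// Same logic as the Deserialize function above, repeated here so the
// sketch is self-contained.
let deserializeFromBase64 (value: string) : obj =
    let bytes = Convert.FromBase64String(value)
    use stream = new MemoryStream(bytes)
    let formatter = new BinaryFormatter()
    formatter.Deserialize(stream)

// Round trip: a small vector goes to a Base64 string and back.
let vector = [| 1.0; 0.0; 3.5 |]
let encoded = serializeToBase64 (box vector)
// The deserialized value comes back as obj, so it is downcast to the expected type.
let decoded = deserializeFromBase64 encoded :?> float[]
```

The downcast with :?> is needed because the formatter returns obj; it will throw at runtime if the stored value is not of the expected type.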
Hopefully one will find this a lot simpler than performing string manipulations.