Use MapReduce in Apache Hadoop on HDInsight

Article
01/10/2024

Learn how to run MapReduce jobs on HDInsight clusters.

Example data

HDInsight provides various example data sets, which are stored in the /example/data and /HdiSamples directories. These directories are in the default storage for your cluster. In this document, we use the /example/data/gutenberg/davinci.txt file. This file contains the notebooks of Leonardo da Vinci.

Example MapReduce

An example MapReduce word count application is included with your HDInsight cluster. This example is located at /example/jars/hadoop-mapreduce-examples.jar on the default storage for your cluster.

The following Java code is the source of the MapReduce application contained in the hadoop-mapreduce-examples.jar file:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into whitespace-separated tokens
    // and emits a (word, 1) pair for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums all counts emitted
    // for the same word and writes (word, total).
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains is <in> <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated Job(Configuration, String) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Exit with 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
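
To make the data flow of the word count job easier to follow, here is a minimal sketch that mimics the map, shuffle, and reduce phases in plain Java, without any Hadoop dependency. The class name LocalWordCount, its method names, and the two sample sentences are hypothetical and exist only for illustration; they are not part of hadoop-mapreduce-examples.jar, and a real job distributes these phases across the cluster rather than running them in one process.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the word count data flow.
public class LocalWordCount {

    // Map phase: emit one (word, 1) pair per whitespace-separated token,
    // the same tokenization the TokenizerMapper applies to each line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum every count that was emitted for a single word.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map on every line, groups the pairs by word (the "shuffle"),
    // then reduces each group to a total.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch.
        List<String> lines = Arrays.asList(
                "the quick brown fox",
                "the lazy dog and the fox");
        System.out.println(countWords(lines));
    }
}
```

Like the Hadoop example, this sketch splits only on whitespace, so punctuation and letter case are preserved and "Fox" and "fox," would be counted as different words.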

For instructions on writing your own MapReduce applications, see Develop Java MapReduce applications for HDInsight.

Run the MapReduce

HDInsight can run MapReduce jobs by using various methods. Use the following table to decide which method is right for you, then follow the link for a walkthrough.

Use this...	...to do this	...from this client operating system
SSH	Use the Hadoop command through SSH	Linux, Unix, Mac OS X, or Windows
Curl	Submit the job remotely by using REST	Linux, Unix, Mac OS X, or Windows
Windows PowerShell	Submit the job remotely by using Windows PowerShell	Windows

Next steps

To learn more about working with data in HDInsight, see the following documents:
