Several submit operation modes of MR program in Hadoop

Local mode operation

1: Run the main method directly in Eclipse on Windows; the job will be submitted to the local executor, LocalJobRunner, for execution.

      ----Input and output data can be placed in the local path (c:/wc/srcdata/)

      ----Input and output data can also be placed in hdfs (hdfs://centosReall-131:9000/wc/srcdata)

2: Run the main method directly in Eclipse on Linux without adding any YARN-related configuration files; the job will likewise be submitted to LocalJobRunner.

      ----Input and output data can be placed in the local path (/home/hadoop/wc/srcdata/)

      ----Input and output data can also be placed in hdfs (hdfs://centosReall-131:9000/wc/srcdata)
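The two local-mode variants above differ only in the input/output URIs handed to the job. A minimal driver sketch is shown below; the class name WCRunner matches the post, but the WCMapper/WCReducer classes and all paths are illustrative assumptions, not taken from the original project:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {
    public static void main(String[] args) throws Exception {
        // With no mapreduce.framework.name configured, the job falls back
        // to LocalJobRunner no matter where Eclipse is running.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);    // assumed mapper class
        job.setReducerClass(WCReducer.class);  // assumed reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // A local filesystem path works here; an hdfs:// URI such as
        // hdfs://centosReall-131:9000/wc/srcdata works the same way.
        FileInputFormat.setInputPaths(job, new Path("/home/hadoop/wc/srcdata/"));
        FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/wc/output/"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because only the URI scheme changes, the same driver serves both the local-path and HDFS-path cases.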

Cluster mode operation

1: Package the project into a jar, upload it to the server, and submit it with the hadoop command: hadoop jar wc.jar cn.intsmaze.hadoop.mr.wordcount.WCRunner

After the program is written, it must be packaged into a jar and placed on the Hadoop cluster to run; here the jar is named wc.jar.

First upload the jar to a directory on the Linux machine, then from that directory submit it to the Hadoop cluster, specifying which program to run:

hadoop jar wc.jar cn.intsmaze.hadoop.mr.WCRunner (specify the fully qualified name of the driver class). At this point the program executes on the cluster.
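The packaging-and-submission steps above can be sketched as shell commands; the bin/ classes directory, the scp destination, and the hostname are illustrative assumptions:

```shell
# Package the compiled classes into wc.jar (in Eclipse: Export -> JAR file;
# shown here with the jar tool, assuming classes were compiled into bin/)
jar -cvf wc.jar -C bin/ .

# Upload the jar to the Linux server, e.g. with scp
scp wc.jar hadoop@centosReall-131:/home/hadoop/

# On the server, submit the job, giving the fully qualified driver class
hadoop jar wc.jar cn.intsmaze.hadoop.mr.WCRunner
```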

2: The main method can also be run directly in Eclipse on Linux and submitted to the cluster, but the following measures must be taken:

      ----Add mapred-site.xml and yarn-site.xml to the project's src directory (for how to modify these two files, see the build log in the hdfs folder from when HDFS was set up)
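For reference, the minimal content of those two files points MapReduce at YARN and names the ResourceManager host. The properties below are standard Hadoop configuration keys; the hostname is taken from the URIs earlier in this post and should be checked against your own cluster:

```xml
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>centosReall-131</value>
  </property>
</configuration>
```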

      ----Package the project into a jar (wc.jar), and set a configuration parameter in the main method:

Configuration conf = new Configuration();
conf.set("mapreduce.job.jar","wc.jar");

The jar should be placed in the directory from which the program is run, so that the relative path "wc.jar" resolves correctly.
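Put together, the driver for this submission mode starts as follows. Only the two Configuration lines are from the original post; the surrounding Job setup is a sketch:

```java
Configuration conf = new Configuration();
// Tell the framework which jar holds the job's classes; the path is
// relative to the working directory the program is launched from.
conf.set("mapreduce.job.jar", "wc.jar");
Job job = Job.getInstance(conf, "wordcount");
// ... the rest of the job setup (mapper, reducer, input/output paths) ...
```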

3: Run the main method directly in Eclipse on Windows and submit it to the cluster; however, because the platforms are incompatible, many settings have to be changed (this is troublesome and can be skipped):

----Store a decompressed copy of the Hadoop installation package on Windows

----Replace the lib and bin directories with files recompiled for your Windows version

----Then configure the system environment variables HADOOP_HOME and PATH

----Modify the source code of the YarnRunner class