HBase MapReduce Examples
HBase MapReduce Read Example:
The following is an example of using HBase as a MapReduce source in read-only manner. Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. There job would be defined as follows...
Configuration config = HBaseConfiguration.create(); Job job = new Job(config, "ExampleRead"); job.setJarByClass(MyReadJob.class); // class that contains mapper Scan scan = new Scan(); scan.setCaching(500); // 1 is the default in Scan, which
will be bad for MapReduce jobs scan.setCacheBlocks(false); // don't set to true for MR jobs // set other scan attrs ... TableMapReduceUtil.initTableMapperJob( tableName, // input HBase table name scan, // Scan instance to control CF and attribute selection MyMapper.class, // mapper null, // mapper output key null, // mapper output value job); job.setOutputFormatClass(NullOutputFormat.class); // because we aren't
emitting anything from mapper boolean b = job.waitForCompletion(true); if (!b) { throw new IOException("error with job!"); }
public static class MyMapper extends TableMapper<Text, Text> { public void map(ImmutableBytesWritable row, Result value, Context context)
throws InterruptedException, IOException { // process data for the row from the Result instance. } }
HBase MapReduce Read/Write Example
The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.
Configuration config = HBaseConfiguration.create(); Job job = new Job(config,"ExampleReadWrite"); job.setJarByClass(MyReadWriteJob.class); // class that contains mapper Scan scan = new Scan(); scan.setCaching(500); // 1 is the default in Scan, which will be bad
for MapReduce jobs scan.setCacheBlocks(false); // don't set to true for MR jobs // set other scan attrs TableMapReduceUtil.initTableMapperJob( sourceTable, // input table scan, // Scan instance to control CF and attribute selection MyMapper.class, // mapper class null, // mapper output key null, // mapper output value job); TableMapReduceUtil.initTableReducerJob( targetTable, // output table null, // reducer class job); job.setNumReduceTasks(0); boolean b = job.waitForCompletion(true); if (!b) { throw new IOException("error with job!"); }
An explanation is required of what
TableMapReduceUtil
is doing, especially with the reducer. TableOutputFormat is being used as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as well as setting the reducer output key to ImmutableBytesWritable
and reducer value to Writable
. These could be set by the programmer on the job and conf, but TableMapReduceUtil
tries to make things easier.
The following is the example mapper, which will create a
Put
and matching the input Result
and emit it. Note: this is what the CopyTable utility does.public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> { public void map(ImmutableBytesWritable row, Result value, Context context) throws
IOException, InterruptedException { // this example is just copying the data from the source table... context.write(row, resultToPut(row,value)); } private static Put resultToPut(ImmutableBytesWritable key, Result result)
throws IOException { Put put = new Put(key.get()); for (KeyValue kv : result.raw()) { put.add(kv); } return put; } }
There isn't actually a reducer step, so
TableOutputFormat
takes care of sending the Put
to the target table.
Get Hands-on Training @ BigDataTraining.IN
Hadoop & Big Data Training | Development | Consulting | Projects
http://www.bigdatatraining.in/hadoop-training-chennai/
http://www.hadooptrainingchennai.in/hadoop-training-in-chennai/
http://www.bigdatatraining.in/
email : info@bigdatatraining.in
Call:
+91 9789968765
044 - 42645495
Visit Us:
#67, 2nd Floor, Gandhi Nagar 1st Main Road, Adyar, Chennai - 20
[Opp to Adyar Lifestyle Super Market]
+91 9789968765
044 - 42645495
Visit Us:
#67, 2nd Floor, Gandhi Nagar 1st Main Road, Adyar, Chennai - 20
[Opp to Adyar Lifestyle Super Market]
No comments:
Post a Comment