Friday 5 July 2013

HBase MapReduce Examples - Training


        HBase MapReduce Read Example:

The following is an example of using HBase as a MapReduce source in a read-only manner. Specifically, there is a Mapper instance but no Reducer, and nothing is emitted from the Mapper. The job would be defined as follows...
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from the mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

 

public static class MyMapper extends TableMapper<Text, Text> {

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws InterruptedException, IOException {
    // process data for the row from the Result instance.
  }
}
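The map() body above is left as a comment. As a minimal sketch of what processing a Result can look like — assuming a hypothetical column family "cf" with a qualifier "name", which are placeholder names and not columns from the original example — the method might extract a single cell and bump a counter:

```java
public void map(ImmutableBytesWritable row, Result value, Context context)
    throws InterruptedException, IOException {
  // "cf" and "name" are illustrative column names, not part of the example schema
  byte[] b = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"));
  if (b != null) {
    String name = Bytes.toString(b);
    // do something with the extracted value, e.g. track how often it appears
    context.getCounter("example", "rowsWithName").increment(1);
  }
}
```

Since nothing is written to the Context as key/value output, NullOutputFormat (set above) remains the right output format for this job.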
    

       HBase MapReduce Read/Write Example

The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,      // input table
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper class
  null,             // mapper output key
  null,             // mapper output value
  job);
TableMapReduceUtil.initTableReducerJob(
  targetTable,      // output table
  null,             // reducer class
  job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
    
It is worth explaining what TableMapReduceUtil is doing here, especially with the reducer. TableOutputFormat is being used as the output format class, several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), and the reducer output key is set to ImmutableBytesWritable and the reducer output value to Writable. These could be set by the programmer on the job and conf, but TableMapReduceUtil tries to make things easier.
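Roughly, the initTableReducerJob call above amounts to doing something like the following by hand. This is a sketch for illustration, not the actual TableMapReduceUtil source, and it glosses over details such as shipping dependency jars with the job:

```java
// what initTableReducerJob(targetTable, null, job) is doing, approximately
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, targetTable);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);
// if a reducer class had been passed instead of null, it would also call
// job.setReducerClass(...)
```

Seeing it spelled out makes clear why the mapper's output key/value types in this example must be ImmutableBytesWritable and Put: with no reducer, the mapper writes directly through TableOutputFormat.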
The following is the example mapper, which creates a Put matching the input Result and emits it. Note: this is what the CopyTable utility does.
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // this example is just copying the data from the source table...
    context.write(row, resultToPut(row, value));
  }

  private static Put resultToPut(ImmutableBytesWritable key, Result result)
      throws IOException {
    Put put = new Put(key.get());
    for (KeyValue kv : result.raw()) {
      put.add(kv);
    }
    return put;
  }
}
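A side note: in HBase 0.96 and later, the KeyValue-based calls used above are superseded by the Cell API. Assuming a newer client, an equivalent helper would look like this:

```java
// Cell-API version of resultToPut, for HBase 0.96+ clients
private static Put resultToPut(ImmutableBytesWritable key, Result result)
    throws IOException {
  Put put = new Put(key.get());
  for (Cell cell : result.rawCells()) {  // rawCells() replaces raw()
    put.add(cell);                       // Put#add(Cell) replaces add(KeyValue)
  }
  return put;
}
```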
    
Because job.setNumReduceTasks(0) is set, there isn't actually a reduce step; TableOutputFormat takes care of sending each Put emitted by the mapper to the target table.
