Sunday, July 22, 2012

Embedding Pig Scripts in Java Applications

You are probably familiar with running Pig scripts from the command line. But what if you want to run Apache Pig as part of your Java application?

Pig Modes


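Pig has two execution modes, local and mapreduce, described at http://pig.apache.org/docs/r0.7.0/setup.html#Run+Modes. Local mode runs on a single machine against the local file system; mapreduce mode needs access to a Hadoop cluster and HDFS.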


PigServer

PigServer (org.apache.pig.PigServer) is the main class for embedding Apache Pig in your Java applications.


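import java.io.InputStream;
import java.util.List;
import java.util.Map;

import org.apache.pig.PigServer;

=================
PigServer pigServer = null;
String mode = "local";                 // "local" or "mapreduce"
String pigScriptName = "myscript.pig"; // hypothetical name of a script on the classpath
Map<String, String> params = null;     // optional: parameters referenced in the script as $key
List<String> paramFiles = null;        // optional: names of parameter files

try {
    pigServer = new PigServer(mode);
    pigServer.setBatchOn();  // collect the statements and run them together in executeBatch()
    pigServer.debugOn();
    // Load the script from the classpath; getResourceAsStream returns null if it is missing
    InputStream is = getClass().getClassLoader().getResourceAsStream(pigScriptName);
    if (is == null) {
        throw new IllegalArgumentException("Pig script not found on classpath: " + pigScriptName);
    }
    if (params != null) {
        pigServer.registerScript(is, params);
    } else if (paramFiles != null) {
        pigServer.registerScript(is, paramFiles);
    } else {
        pigServer.registerScript(is);
    }
    pigServer.executeBatch();
} catch (Exception e) {
    throw new RuntimeException(e);
} finally {
    if (pigServer != null) {
        pigServer.shutdown();  // release the resources held by the PigServer
    }
}
==========================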
Note: the variable mode can be "local" or "mapreduce".
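To switch between the two modes without recompiling, one option is to read the mode from a JVM system property and validate it before it reaches new PigServer(mode). This is a minimal sketch: the helper class PigModes and the property name pig.embedded.mode are hypothetical, chosen for this example only, and are not part of Pig.

```java
import java.util.Locale;

// Hypothetical helper (not part of Pig): picks the execution mode string
// that is later passed to new PigServer(mode).
final class PigModes {
    // Hypothetical system property name, chosen for this example only.
    static final String MODE_PROPERTY = "pig.embedded.mode";

    private PigModes() {}

    // Normalizes and validates a mode string; defaults to "local" when unset.
    static String resolve(String requested) {
        if (requested == null || requested.trim().isEmpty()) {
            return "local";
        }
        String mode = requested.trim().toLowerCase(Locale.ROOT);
        if (!mode.equals("local") && !mode.equals("mapreduce")) {
            throw new IllegalArgumentException(
                "Unsupported Pig mode: " + requested + " (expected \"local\" or \"mapreduce\")");
        }
        return mode;
    }

    // Reads the mode from the JVM, e.g. java -Dpig.embedded.mode=mapreduce ...
    static String fromSystemProperty() {
        return resolve(System.getProperty(MODE_PROPERTY));
    }
}
```

With this helper, the mode line in the snippet above could become String mode = PigModes.fromSystemProperty(); and the application could be started with -Dpig.embedded.mode=mapreduce to target a cluster.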

When registering your Pig script, you can pass PigServer one of two optional arguments:
  • params: a key/value map whose entries can be referenced in your Pig script as $key.
  • paramFiles: a list of file names whose contents define the parameters.
If you need neither, register the script without any parameters, as the final else branch does.
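Here is a sketch of how the two optional arguments might be prepared. It assumes hypothetical parameter names input and output and a hypothetical helper class PigParams. The file it writes follows Pig's parameter-file layout of one "name = value" line per parameter, with full-line comments starting with #; values containing spaces or special characters may need quoting, which this sketch does not handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Pig) that prepares the optional
// arguments accepted by PigServer.registerScript(...).
final class PigParams {
    private PigParams() {}

    // Builds the key/value map; a script refers to each key as $key.
    // The names "input" and "output" are hypothetical.
    static Map<String, String> buildParams(String input, String output) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("input", input);
        params.put("output", output);
        return params;
    }

    // Writes the same parameters to a temporary file in dir, one
    // "name = value" line each (no quoting of special characters),
    // and returns a single-element list usable as paramFiles.
    static List<String> writeParamFile(Map<String, String> params, Path dir) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("# generated parameter file");
        for (Map.Entry<String, String> e : params.entrySet()) {
            lines.add(e.getKey() + " = " + e.getValue());
        }
        Path file = Files.createTempFile(dir, "pig-params", ".txt");
        Files.write(file, lines, StandardCharsets.UTF_8);
        return Collections.singletonList(file.toString());
    }
}
```

A script would then refer to the values as $input and $output (for example, A = LOAD '$input';), and you would pass either the map to registerScript(is, params) or the returned list to registerScript(is, paramFiles).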

Do not forget to bookmark this blog. :)

All the best!

Reference: http://everythingbigdata.blogspot.com/2012/07/apache-pig-tips.html

5 comments:

  1. Hey Anil,

    I tried running a basic Apache PIG script using the above but I'm seeing the exception (https://gist.github.com/4129687) caused by the following:

    [java] Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.UnsupportedOperationException: getJobTrackerAddrs is not supported
    [java] at org.apache.hadoop.fs.FileSystem.getJobTrackerAddrs(FileSystem.java:1796)

    Any help? Thanks in advance

    Suresh

    Replies
    1. I'm using Hadoop 1.1.0 with Pig 0.10.0.

    2. Suresh, I have not run this in a while, but it is definitely an incompatibility between your Hadoop version and your Pig version.

      Options:
      a) Try an earlier version of either Hadoop or Pig. Check which Hadoop versions are officially supported by the Pig version you are using.
      b) Check your classpath settings. An incompatible Hadoop library may be sneaking in.
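For option b), a quick way to see which Hadoop and Pig jars the JVM actually picked up is to scan the java.class.path system property. This is a hypothetical diagnostic (the ClasspathCheck class is not part of Pig or Hadoop), and it only sees the JVM's launch classpath, so jars loaded through a custom class loader will not appear.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper: lists classpath entries whose file names
// contain a keyword such as "hadoop" or "pig", so that duplicate or
// mismatched versions are easier to spot.
final class ClasspathCheck {
    private ClasspathCheck() {}

    // Returns the entries of classpath whose file name contains keyword
    // (case-insensitive).
    static List<String> entriesMatching(String classpath, String keyword) {
        List<String> matches = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (String entry : classpath.split(File.pathSeparator)) {
            String name = new File(entry).getName().toLowerCase(Locale.ROOT);
            if (name.contains(needle)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    // Prints the matching entries of the running JVM's launch classpath.
    static void printJars(String keyword) {
        String cp = System.getProperty("java.class.path", "");
        for (String entry : entriesMatching(cp, keyword)) {
            System.out.println(keyword + ": " + entry);
        }
    }
}
```

Calling ClasspathCheck.printJars("hadoop") at startup and seeing two different hadoop-core versions, for instance, would point to the kind of conflict described above.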

  2. Hi, I tried running the same code, but got an error saying "cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml".

    I searched online and found a suggestion to set the PIG_CLASSPATH environment variable, but I am still unable to resolve the issue.
    Can you please help?

    Replies
    1. You can run this with just the Pig and Hadoop jars on the classpath. That is how I did it in an IDE.
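Since the error says Pig could not find hadoop-site.xml or core-site.xml on the classpath, you can check from Java whether either file is visible before creating the PigServer. A minimal sketch follows; the HadoopConfCheck class is hypothetical and not part of Pig or Hadoop.

```java
import java.net.URL;

// Hypothetical diagnostic helper: reports whether the Hadoop configuration
// files named in the error message are visible as classpath resources.
final class HadoopConfCheck {
    private HadoopConfCheck() {}

    static final String[] CONFIG_FILES = {"hadoop-site.xml", "core-site.xml"};

    // Returns true (and prints the location) as soon as one of the files
    // can be found through the given class loader; false if none is found.
    static boolean hasHadoopConfig(ClassLoader loader) {
        for (String name : CONFIG_FILES) {
            URL url = loader.getResource(name);
            if (url != null) {
                System.out.println("Found " + name + " at " + url);
                return true;
            }
        }
        return false;
    }
}
```

For example, HadoopConfCheck.hasHadoopConfig(Thread.currentThread().getContextClassLoader()) returning false means neither file is on the classpath; the typical fix is to add the directory containing core-site.xml (your Hadoop conf directory) to the classpath.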
