Everything Big Data! Hadoop,HBase,NoSQL,MongoDB,HDFS,MapReduce etc.: Embedding Pig Scripts in Java Applications

Sunday, July 22, 2012

Embedding Pig Scripts in Java Applications

You are familiar with running Pig scripts from the command line. But what if you want to run Apache Pig as part of your Java application?

Pig Modes

Pig has two run modes, local and mapreduce, as described in http://pig.apache.org/docs/r0.7.0/setup.html#Run+Modes

PigServer

This is the main class for embedding Apache Pig in your Java applications.

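import java.io.InputStream;
import java.util.List;
import java.util.Map;

import org.apache.pig.PigServer;

=================
// Execution mode for Pig.
String mode = "local";
// Classpath-relative name of the Pig script; set this before running.
String pigScriptName = null;
// Optional key/value parameters for the script.
Map<String,String> params = null;
// Optional names of files that contain parameters.
List<String> paramFiles = null;

PigServer pigServer = null;
try {
    pigServer = new PigServer(mode);
    // Queue the statements and run them together in executeBatch().
    pigServer.setBatchOn();
    pigServer.debugOn();
    // Load the script from the classpath (this snippet lives in an instance method).
    InputStream is = getClass().getClassLoader().getResourceAsStream(pigScriptName);
    if (is == null) {
        throw new IllegalArgumentException("Pig script not found on classpath: " + pigScriptName);
    }
    if (params != null) {
        pigServer.registerScript(is, params);
    } else if (paramFiles != null) {
        pigServer.registerScript(is, paramFiles);
    } else {
        pigServer.registerScript(is);
    }
    pigServer.executeBatch();
} catch (Exception e) {
    throw new RuntimeException(e);
} finally {
    // Always release the resources held by the PigServer.
    if (pigServer != null) {
        pigServer.shutdown();
    }
}
==========================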
Note: the variable mode can be "local" or "mapreduce".
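
Since the mode is just a string, it may help to validate it before constructing the PigServer. The following is only a sketch of one way to do that: the class PigModes, its resolve method, and the system property name myapp.pig.mode are made up for this example and are not part of Pig. It falls back to "local" when nothing is configured, which is my own assumption about a sensible default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Pig: normalizes and validates the mode string.
final class PigModes {

    // The two modes named in this post.
    static final List<String> SUPPORTED = Arrays.asList("local", "mapreduce");

    // Returns a normalized mode, defaulting to "local" when none is given.
    static String resolve(String requested) {
        if (requested == null || requested.trim().isEmpty()) {
            return "local";
        }
        String mode = requested.trim().toLowerCase(Locale.ROOT);
        if (!SUPPORTED.contains(mode)) {
            throw new IllegalArgumentException(
                    "Unsupported Pig mode: " + requested + " (expected one of " + SUPPORTED + ")");
        }
        return mode;
    }

    public static void main(String[] args) {
        // myapp.pig.mode is an assumed property name, e.g. -Dmyapp.pig.mode=mapreduce
        String mode = resolve(System.getProperty("myapp.pig.mode"));
        System.out.println("Pig mode: " + mode);
        // new PigServer(mode) would follow here.
    }
}
```

The value returned by resolve can then be passed to new PigServer(mode) as in the code above.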

The registerScript method of PigServer can take one of two additional arguments when you register your Pig script.

Params: a key/value map whose entries can be referenced in your Pig script as $key.
ParamFiles: a list of names of files that contain the parameters.

You can also register the script with the PigServer without providing any parameters.
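
To make the two options more concrete, here is a minimal sketch in plain Java that prepares both inputs. The class PigParamsExample, the keys input and output, and the paths are hypothetical. The file layout written here (one key=value pair per line) is my understanding of what Pig expects in a parameter file, so please check it against the documentation for your Pig version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig: prepares arguments for registerScript.
final class PigParamsExample {

    // Builds the key/value map; a script would refer to these as $input and $output.
    static Map<String, String> buildParams(String input, String output) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("input", input);
        params.put("output", output);
        return params;
    }

    // Writes the pairs to a temporary file, one key=value per line (assumed format),
    // and returns the file name wrapped in a list.
    static List<String> writeParamFile(Map<String, String> params) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            lines.add(entry.getKey() + "=" + entry.getValue());
        }
        Path file = Files.createTempFile("pig-params", ".txt");
        Files.write(file, lines, StandardCharsets.UTF_8);
        return Collections.singletonList(file.toString());
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = buildParams("/data/in", "/data/out");
        List<String> paramFiles = writeParamFile(params);
        System.out.println(params);
        System.out.println(paramFiles);
    }
}
```

Either result can then be handed to pigServer.registerScript(is, params) or pigServer.registerScript(is, paramFiles). A script using these values might contain statements such as A = LOAD '$input'; and STORE A INTO '$output';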

Do not forget to bookmark this blog. :)

All the best!

Reference: http://everythingbigdata.blogspot.com/2012/07/apache-pig-tips.html

5 comments:

Suresh Saggar, November 23, 2012 at 12:57 AM
Hey Anil,

I tried running a basic Apache Pig script using the above, but I'm seeing an exception (https://gist.github.com/4129687) caused by the following:

[java] Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.UnsupportedOperationException: getJobTrackerAddrs is not supported
[java] at org.apache.hadoop.fs.FileSystem.getJobTrackerAddrs(FileSystem.java:1796)

Any help? Thanks in advance

Suresh
Deepali Dhingra, August 29, 2013 at 11:09 PM
Hi, I tried running the same code, but got an error saying "cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml".

I searched online and found advice to set PIG_CLASSPATH as an environment variable, but I am still unable to resolve the error.
Can you please help?