You are familiar with running Pig scripts from the command line. But what if you want to run Apache Pig from within your Java application?
Pig Modes
Pig has two run modes, local and mapreduce, as described at http://pig.apache.org/docs/r0.7.0/setup.html#Run+Modes
PigServer
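This is the main class for embedding Apache Pig in your Java applications.
=================
import java.io.InputStream;
import java.util.List;
import java.util.Map;

import org.apache.pig.PigServer;

PigServer pigServer = null;
String mode = "local";               // "local" or "mapreduce"
String pigScriptName = null;         // set this to the classpath location of your Pig script
Map<String,String> params = null;    // optional: key/value parameters
List<String> paramFiles = null;      // optional: names of parameter files
try {
    pigServer = new PigServer(mode);
    pigServer.setBatchOn();          // queue statements until executeBatch() is called
    pigServer.debugOn();
    // Load the script from the classpath; the stream is null if it is not found
    InputStream is = getClass().getClassLoader().getResourceAsStream(pigScriptName);
    if (is == null) {
        throw new IllegalArgumentException("Pig script not found on classpath: " + pigScriptName);
    }
    if (params != null) {
        pigServer.registerScript(is, params);
    } else if (paramFiles != null) {
        pigServer.registerScript(is, paramFiles);
    } else {
        pigServer.registerScript(is);
    }
    pigServer.executeBatch();
} catch (Exception e) {
    throw new RuntimeException(e);
} finally {
    if (pigServer != null) {
        pigServer.shutdown();
    }
}
==========================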
Note: the variable mode can be "local" or "mapreduce".
PigServer can take one of two additional arguments when registering your Pig script:
- Params: a key/value map whose entries can be referenced in your Pig script as $key.
- ParamFiles: a list of names of files that contain the parameters.
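To make the $key idea concrete, here is a minimal sketch in plain Java of what the substitution amounts to. It is an illustration only, not Pig's own implementation: the class name ParamSubstitutionSketch, the method substitute, and the sample script and paths are all made up for this example, and real Pig parameter handling has more rules than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only (hypothetical helper): replaces $key placeholders with values from a map. */
class ParamSubstitutionSketch {

    // A placeholder is a '$' followed by an identifier, e.g. $input
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String script, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value = params.get(key);
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("input", "data/visits.txt");      // sample values, not from the post
        params.put("output", "out/visit_counts");

        String script = "A = LOAD '$input';\nSTORE A INTO '$output';";
        System.out.println(substitute(script, params));
    }
}
```

With PigServer you would not do this yourself; you would hand the same map to registerScript(is, params) and let Pig resolve the placeholders.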
Do not forget to bookmark this blog. :)
All the best!
Reference: http://everythingbigdata.blogspot.com/2012/07/apache-pig-tips.html
Hey Anil,
I tried running a basic Apache Pig script using the above, but I'm seeing an exception (https://gist.github.com/4129687) caused by the following:
[java] Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.UnsupportedOperationException: getJobTrackerAddrs is not supported
[java] at org.apache.hadoop.fs.FileSystem.getJobTrackerAddrs(FileSystem.java:1796)
Any help? Thanks in advance
Suresh
I'm using Hadoop 1.1.0 with Pig 0.10.0
Suresh - I have not run this in a while. It is definitely an incompatibility between the Hadoop version and the Pig version.
Options:
a) Try an earlier version of either Hadoop or Pig. Check which Hadoop versions are officially supported by the Pig version you are using.
b) Check your classpath settings. Maybe one of the incompatible Hadoop libraries is sneaking in.
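One way to act on option b) is to ask the JVM which jar a class was actually loaded from. The following is a rough sketch using only standard Java; the class name JarLocatorSketch and the method locate are made up for this example, and the two class names it checks are simply the ones that appear in the post and in the stack trace above.

```java
import java.security.CodeSource;

/** Illustration only (hypothetical helper): reports where a class was loaded from. */
class JarLocatorSketch {

    static String locate(String className) {
        try {
            // 'false' avoids running the class's static initializers
            Class<?> c = Class.forName(className, false,
                    JarLocatorSketch.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source, e.g. a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.pig.PigServer",
            "org.apache.hadoop.fs.FileSystem"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the Hadoop class resolves to a jar from a different Hadoop release than the one you meant to use, that is a likely candidate for the library that is sneaking in.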
Hi, I tried running the same, but got an error saying "cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml".
I tried googling, and it said to add PIG_CLASSPATH as an environment variable, but I am still unable to resolve it.
Can you please help?
You can just run this with Pig and Hadoop jars. That is how I had done it in an IDE.
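For the "cannot find hadoop configurations in classpath" error, a quick check is to see whether the two file names from the message can be found as classpath resources from inside your own application. This is only a sketch: the class name HadoopConfigCheckSketch and the method find are made up, and it assumes that a plain classpath resource lookup is close enough to what Pig does internally.

```java
import java.net.URL;

/** Illustration only (hypothetical helper): looks up a file name on the classpath. */
class HadoopConfigCheckSketch {

    static URL find(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName);   // null when the resource is not found
    }

    public static void main(String[] args) {
        // File names taken from the error message quoted above
        for (String name : new String[] {"core-site.xml", "hadoop-site.xml"}) {
            URL url = find(name);
            System.out.println(name + " -> "
                    + (url == null ? "not found on classpath" : url));
        }
    }
}
```

If neither file is found, the directory that holds your Hadoop configuration is probably not on the classpath your IDE or launcher uses.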