I was going through Alex P's blog posts, and one that caught my attention was about Apache Pig. I have been meaning to play with Pig Latin scripts to simulate MapReduce functionality.
I tried to use Pig to run Alex's Pig script. He gives the input values, the Pig script and the output from Pig, but no information on how to run Pig. That is fine; he just wants readers to go through the Pig manual. :)
Here is what I tried out:
1) Downloaded Apache Pig 0.9.2 (the latest release at the time).
2) Alex's script uses PiggyBank, which lives in Pig's contrib directory. It looks like I have to build Pig first.
=====================
pig_directory $> ant
...
[javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
[javacc] (type "javacc" with no arguments for help)
...
[javacc] File "SimpleCharStream.java" is being rebuilt.
[javacc] Parser generated successfully.
prepare:
[mkdir] Created dir: xxx/pig-0.9.2/src-gen/org/apache/pig/parser
genLexer:
genParser:
genTreeParser:
gen:
compile:
[echo] *** Building Main Sources ***
[echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
[echo] *** If all.warnings property is supplied, compile-sources-all-warnings target will be executed ***
[echo] *** Else, compile-sources (which only warns about deprecations) target will be executed ***
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
compile-sources:
[javac] xxx/pig-0.9.2/build.xml:429: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 667 source files to /home/anil/hadoop/pig/pig-0.9.2/build/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[copy] Copying 1 file to xxx/pig-0.9.2/build/classes/org/apache/pig/tools/grunt
[copy] Copying 1 file to xxx/pig-0.9.2/build/classes/org/apache/pig/tools/grunt
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
compile-sources-all-warnings:
jar:
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
jarWithSvn:
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
[get] To: /home/anil/pig/pig-0.9.2/ivy/ivy-2.2.0.jar
[get] Not modified - so not downloaded
ivy-init-dirs:
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;0.9.3-SNAPSHOT
[ivy:resolve] confs: [buildJar]
[ivy:resolve] found com.sun.jersey#jersey-core;1.8 in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-core;1.0.0 in maven2
[ivy:resolve] found commons-cli#commons-cli;1.2 in maven2
[ivy:resolve] found xmlenc#xmlenc;0.52 in maven2
[ivy:resolve] found commons-httpclient#commons-httpclient;3.0.1 in maven2
[ivy:resolve] found commons-codec#commons-codec;1.4 in maven2
[ivy:resolve] found org.apache.commons#commons-math;2.1 in maven2
[ivy:resolve] found commons-configuration#commons-configuration;1.6 in maven2
[ivy:resolve] found commons-collections#commons-collections;3.2.1 in maven2
[ivy:resolve] found commons-lang#commons-lang;2.4 in maven2
[ivy:resolve] found commons-logging#commons-logging;1.1.1 in maven2
[ivy:resolve] found commons-digester#commons-digester;1.8 in maven2
[ivy:resolve] found commons-beanutils#commons-beanutils;1.7.0 in maven2
[ivy:resolve] found commons-beanutils#commons-beanutils-core;1.8.0 in maven2
[ivy:resolve] found commons-net#commons-net;1.4.1 in maven2
[ivy:resolve] found oro#oro;2.0.8 in maven2
[ivy:resolve] found org.mortbay.jetty#jetty;6.1.26 in maven2
[ivy:resolve] found org.mortbay.jetty#jetty-util;6.1.26 in maven2
[ivy:resolve] found org.mortbay.jetty#servlet-api;2.5-20081211 in maven2
[ivy:resolve] found tomcat#jasper-runtime;5.5.12 in maven2
[ivy:resolve] found tomcat#jasper-compiler;5.5.12 in maven2
[ivy:resolve] found org.mortbay.jetty#jsp-api-2.1;6.1.14 in maven2
[ivy:resolve] found org.mortbay.jetty#servlet-api-2.5;6.1.14 in maven2
[ivy:resolve] found org.mortbay.jetty#jsp-2.1;6.1.14 in maven2
[ivy:resolve] found org.eclipse.jdt#core;3.1.1 in maven2
[ivy:resolve] found ant#ant;1.6.5 in maven2
[ivy:resolve] found commons-el#commons-el;1.0 in maven2
[ivy:resolve] found net.java.dev.jets3t#jets3t;0.7.1 in maven2
[ivy:resolve] found net.sf.kosmosfs#kfs;0.3 in maven2
[ivy:resolve] found hsqldb#hsqldb;1.8.0.10 in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-test;1.0.0 in maven2
[ivy:resolve] found org.apache.ftpserver#ftplet-api;1.0.0 in maven2
[ivy:resolve] found org.apache.mina#mina-core;2.0.0-M5 in maven2
[ivy:resolve] found org.slf4j#slf4j-api;1.5.2 in maven2
[ivy:resolve] found org.apache.ftpserver#ftpserver-core;1.0.0 in maven2
[ivy:resolve] found org.apache.ftpserver#ftpserver-deprecated;1.0.0-M2 in maven2
[ivy:resolve] found log4j#log4j;1.2.16 in maven2
[ivy:resolve] found org.slf4j#slf4j-log4j12;1.6.1 in maven2
[ivy:resolve] found org.slf4j#slf4j-api;1.6.1 in maven2
[ivy:resolve] found org.apache.avro#avro;1.5.3 in maven2
[ivy:resolve] found com.googlecode.json-simple#json-simple;1.1 in maven2
[ivy:resolve] found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve] found jline#jline;0.9.94 in maven2
[ivy:resolve] found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve] found org.codehaus.jackson#jackson-mapper-asl;1.7.3 in maven2
[ivy:resolve] found org.codehaus.jackson#jackson-core-asl;1.7.3 in maven2
[ivy:resolve] found joda-time#joda-time;1.6 in maven2
[ivy:resolve] found com.google.guava#guava;11.0 in maven2
[ivy:resolve] found org.python#jython;2.5.0 in maven2
[ivy:resolve] found rhino#js;1.7R2 in maven2
[ivy:resolve] found org.antlr#antlr;3.4 in maven2
[ivy:resolve] found org.antlr#antlr-runtime;3.4 in maven2
[ivy:resolve] found org.antlr#stringtemplate;3.2.1 in maven2
[ivy:resolve] found antlr#antlr;2.7.7 in maven2
[ivy:resolve] found org.antlr#ST4;4.0.4 in maven2
[ivy:resolve] found org.apache.zookeeper#zookeeper;3.3.3 in maven2
[ivy:resolve] found org.jboss.netty#netty;3.2.2.Final in maven2
[ivy:resolve] found org.apache.hbase#hbase;0.90.0 in maven2
[ivy:resolve] found org.vafer#jdeb;0.8 in maven2
[ivy:resolve] found junit#junit;4.5 in maven2
[ivy:resolve] found org.apache.hive#hive-exec;0.8.0 in maven2
[ivy:resolve] downloading http://repo2.maven.org/maven2/junit/junit/4.5/junit-4.5.jar ...
[ivy:resolve] ... (194kB)
[ivy:resolve] ... (0kB)
[ivy:resolve] [SUCCESSFUL ] junit#junit;4.5!junit.jar (822ms)
[ivy:resolve] downloading http://repo2.maven.org/maven2/org/apache/hive/hive-exec/0.8.0/hive-exec-0.8.0.jar ...
[ivy:resolve] ... (3372kB)
[ivy:resolve] .. (0kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.hive#hive-exec;0.8.0!hive-exec.jar (2262ms)
[ivy:resolve] :: resolution report :: resolve 9172ms :: artifacts dl 3114ms
[ivy:resolve] :: evicted modules:
[ivy:resolve] junit#junit;3.8.1 by [junit#junit;4.5] in [buildJar]
[ivy:resolve] commons-logging#commons-logging;1.0.3 by [commons-logging#commons-logging;1.1.1] in [buildJar]
[ivy:resolve] commons-codec#commons-codec;1.2 by [commons-codec#commons-codec;1.4] in [buildJar]
[ivy:resolve] commons-logging#commons-logging;1.1 by [commons-logging#commons-logging;1.1.1] in [buildJar]
[ivy:resolve] commons-codec#commons-codec;1.3 by [commons-codec#commons-codec;1.4] in [buildJar]
[ivy:resolve] commons-httpclient#commons-httpclient;3.1 by [commons-httpclient#commons-httpclient;3.0.1] in [buildJar]
[ivy:resolve] org.codehaus.jackson#jackson-mapper-asl;1.0.1 by [org.codehaus.jackson#jackson-mapper-asl;1.7.3] in [buildJar]
[ivy:resolve] org.slf4j#slf4j-api;1.5.2 by [org.slf4j#slf4j-api;1.6.1] in [buildJar]
[ivy:resolve] org.apache.mina#mina-core;2.0.0-M4 by [org.apache.mina#mina-core;2.0.0-M5] in [buildJar]
[ivy:resolve] org.apache.ftpserver#ftplet-api;1.0.0-M2 by [org.apache.ftpserver#ftplet-api;1.0.0] in [buildJar]
[ivy:resolve] org.apache.ftpserver#ftpserver-core;1.0.0-M2 by [org.apache.ftpserver#ftpserver-core;1.0.0] in [buildJar]
[ivy:resolve] org.apache.mina#mina-core;2.0.0-M2 by [org.apache.mina#mina-core;2.0.0-M5] in [buildJar]
[ivy:resolve] commons-cli#commons-cli;1.0 by [commons-cli#commons-cli;1.2] in [buildJar]
[ivy:resolve] org.antlr#antlr-runtime;3.3 by [org.antlr#antlr-runtime;3.4] in [buildJar]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| buildJar | 74 | 2 | 2 | 14 || 61 | 2 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve] confs: [buildJar]
[ivy:retrieve] 3 artifacts copied, 58 already retrieved (3855kB/20ms)
buildJar:
[echo] svnString exported
[jar] Building jar: /home/anil/hadoop/pig/pig-0.9.2/build/pig-0.9.3-SNAPSHOT-core.jar
[jar] Building jar: /home/anil/hadoop/pig/pig-0.9.2/build/pig-0.9.3-SNAPSHOT.jar
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
include-meta:
[copy] Copying 1 file to /home/anil/hadoop/pig/pig-0.9.2
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
jarWithOutSvn:
jar-withouthadoop:
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
jar-withouthadoopWithSvn:
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
[get] To: /home/anil/hadoop/pig/pig-0.9.2/ivy/ivy-2.2.0.jar
[get] Not modified - so not downloaded
ivy-init-dirs:
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;0.9.3-SNAPSHOT
[ivy:resolve] confs: [buildJar]
[ivy:resolve] ...
[ivy:resolve] :: resolution report :: resolve 168ms :: artifacts dl 15ms
[ivy:resolve] :: evicted modules:
[ivy:resolve] ...
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| buildJar | 74 | 0 | 0 | 14 || 61 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve] confs: [buildJar]
[ivy:retrieve] 0 artifacts copied, 61 already retrieved (0kB/9ms)
buildJar-withouthadoop:
[echo] svnString exported
[jar] Building jar: /home/anil/hadoop/pig/pig-0.9.2/build/pig-0.9.3-SNAPSHOT-withouthadoop.jar
[copy] Copying 1 file to /home/anil/hadoop/pig/pig-0.9.2
[taskdef] Could not load definitions from resource net/sf/antcontrib/antcontrib.properties. It could not be found.
jar-withouthadoopWithOutSvn:
jar-all:
BUILD SUCCESSFUL
Total time: 5 minutes 38 seconds
==========================
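Pig built successfully. The repeated antcontrib "Could not load definitions" warnings did not stop the build, and note that the jars come out named 0.9.3-SNAPSHOT even though the source is the 0.9.2 release. This step was needed to build PiggyBank.
Next, go to the PiggyBank directory (contrib/piggybank/java) and run ant there.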
=====================
anil@sadbhav:~/hadoop/pig/pig-0.9.2/contrib/piggybank/java$ ant
Buildfile: /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/build.xml
init:
compile:
[echo] *** Compiling Pig UDFs ***
[javac] /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/build.xml:92: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 153 source files to /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/build/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
jar:
[echo] *** Creating pigudf.jar ***
[jar] Building jar: /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/piggybank.jar
BUILD SUCCESSFUL
Total time: 3 seconds
======================================
3) Next, I created a directory for testing my Pig scripts, next to the pig-0.9.2 directory.
Let us call it "anilpig".
In it I created the following Pig script (distance.pig), which is a direct copy of Alex's:
======================================
REGISTER /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/piggybank.jar;
define radians org.apache.pig.piggybank.evaluation.math.toRadians();
define sin org.apache.pig.piggybank.evaluation.math.SIN();
define cos org.apache.pig.piggybank.evaluation.math.COS();
define sqrt org.apache.pig.piggybank.evaluation.math.SQRT();
define atan2 org.apache.pig.piggybank.evaluation.math.ATAN2();
geo = load 'haversine.csv' using PigStorage(';') as (id1: long, lat1: double, lon1: double);
geo2 = load 'haversine.csv' using PigStorage(';') as (id2: long, lat2: double, lon2: double);
geoCross = CROSS geo, geo2;
geoDist = FOREACH geoCross GENERATE id1, id2, 6371 * 2 * atan2(sqrt(sin(radians(lat2 - lat1) / 2) * sin(radians(lat2 - lat1) / 2) + cos(radians(lat1)) * cos(radians(lat2)) * sin(radians(lon2 - lon1) / 2) * sin(radians(lon2 - lon1) / 2)), sqrt(1 - (sin(radians(lat2 - lat1) / 2) * sin(radians(lat2 - lat1) / 2) + cos(radians(lat1)) * cos(radians(lat2)) * sin(radians(lon2 - lon1) / 2) * sin(radians(lon2 - lon1) / 2)))) as dist;
dump geoDist;
======================================
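As a cross-check for the numbers Pig prints, here is a minimal Python transcription of the geoDist expression: the haversine formula with an Earth radius of 6371 km, exactly as in the script. The function name `haversine_km` is my own, and it uses Python's `math` module in place of the PiggyBank UDFs, so treat it as a sketch for sanity-checking rather than what Pig executes internally.

```python
import math

EARTH_RADIUS_KM = 6371  # same constant as in distance.pig


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, mirroring the geoDist expression."""
    # Differences converted to radians, like radians(lat2 - lat1) in Pig.
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    # a = sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    # distance = R * 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


if __name__ == "__main__":
    # The two points used in haversine.csv.
    print(haversine_km(48.8583, 2.2945, 48.8738, 2.295))
```

For the two points in haversine.csv this comes out at roughly 1.72 km, and a point paired with itself gives 0.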
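Please do not forget to update the REGISTER path to piggybank.jar in distance.pig so it matches your own build location.
I also created the following haversine.csv file in the same directory (semicolon-separated id;latitude;longitude, matching PigStorage(';') and the LOAD schema):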
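A jar is an ordinary zip archive, so before running the script it can be worth confirming that the piggybank.jar you REGISTER really contains the classes named in the define statements. A small Python sketch, assuming only that each class is stored under its package path as `Name.class`; the script name `check_udfs.py` in the usage line is hypothetical.

```python
import sys
import zipfile

# Class names copied from the "define" statements in distance.pig.
REQUIRED_UDFS = [
    "org.apache.pig.piggybank.evaluation.math.toRadians",
    "org.apache.pig.piggybank.evaluation.math.SIN",
    "org.apache.pig.piggybank.evaluation.math.COS",
    "org.apache.pig.piggybank.evaluation.math.SQRT",
    "org.apache.pig.piggybank.evaluation.math.ATAN2",
]


def missing_udfs(jar_path, class_names=REQUIRED_UDFS):
    """Return the class names that have no .class entry in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    # a.b.c.Name is stored in the archive as a/b/c/Name.class
    return [name for name in class_names
            if name.replace(".", "/") + ".class" not in entries]


if __name__ == "__main__":
    # Check every jar path given on the command line (none means no-op).
    for jar in sys.argv[1:]:
        missing = missing_udfs(jar)
        if missing:
            print(f"{jar}: missing {', '.join(missing)}")
        else:
            print(f"{jar}: all UDF classes present")
```

Usage: `python check_udfs.py /home/anil/hadoop/pig/pig-0.9.2/contrib/piggybank/java/piggybank.jar`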
===============
1;48.8583;2.2945
2;48.8738;2.295
================
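To make explicit what the LOAD and CROSS steps hand to FOREACH, here is a plain-Python sketch of them. The helper names are my own, and the parser simply raises on malformed lines rather than imitating PigStorage. Because geo and geo2 are the same two rows, CROSS yields 2 × 2 = 4 tuples, including each point paired with itself.

```python
import itertools


def load_pigstorage(lines, delimiter=";"):
    """Parse id;lat;lon lines into (long, double, double)-style tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        id_, lat, lon = line.split(delimiter)
        rows.append((int(id_), float(lat), float(lon)))
    return rows


def cross(left, right):
    """Concatenate every left tuple with every right tuple, like CROSS."""
    return [l + r for l, r in itertools.product(left, right)]


if __name__ == "__main__":
    csv_text = "1;48.8583;2.2945\n2;48.8738;2.295\n"
    geo = load_pigstorage(csv_text.splitlines())
    geo2 = load_pigstorage(csv_text.splitlines())
    # Four 6-field tuples: (id1, lat1, lon1, id2, lat2, lon2)
    for row in cross(geo, geo2):
        print(row)
```

So dump geoDist should show four (id1, id2, dist) rows, two of them with distance 0. Pig does not guarantee the order of CROSS output, so the rows may come out in a different order than this sketch prints them.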
4) Now let us run Pig in local mode (-x local) to see whether the values match what Alex quotes in his blog post.
==================
~/hadoop/pig/anilpig$ ../pig-0.9.2/bin/pig -x local distance.pig
which: no hadoop in (/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/bin:/usr/sbin:/usr/java/jdk1.6.0_30/bin:/opt/apache-maven-3.0.2/bin:/home/anil/.local/bin:/home/anil/bin:/usr/bin:/usr/sbin:/usr/java/jdk1.6.0_30/bin:/opt/apache-maven-3.0.2/bin)
2012-02-19 12:05:13,316 [main] INFO org.apache.pig.Main - Logging error messages to: /home/anil/hadoop/pig/anilpig/pig_1329674713314.log
2012-02-19 12:05:13,409 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-02-19 12:05:13,911 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 10 time(s).
2012-02-19 12:05:13,916 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: CROSS
2012-02-19 12:05:14,051 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-02-19 12:05:14,083 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
2012-02-19 12:05:14,090 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-02-19 12:05:14,090 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-02-19 12:05:14,108 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-02-19 12:05:14,114 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-02-19 12:05:14,133 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-02-19 12:05:14,148 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=66
2012-02-19 12:05:14,148 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-02-19 12:05:14,224 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-02-19 12:05:14,234 [Thread-2] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-02-19 12:05:14,239 [Thread-2] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2012-02-19 12:05:14,315 [Thread-2] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-19 12:05:14,315 [Thread-2] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-02-19 12:05:14,323 [Thread-2] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-02-19 12:05:14,329 [Thread-2] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-19 12:05:14,329 [Thread-2] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-02-19 12:05:14,329 [Thread-2] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-02-19 12:05:14,562 [Thread-3] INFO org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2012-02-19 12:05:14,564 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@313816e0
2012-02-19 12:05:14,578 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-02-19 12:05:14,600 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-02-19 12:05:14,600 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-02-19 12:05:14,636 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Created input record counter: Input records from _0_haversine.csv
2012-02-19 12:05:14,638 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2012-02-19 12:05:14,643 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2012-02-19 12:05:14,645 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2012-02-19 12:05:14,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-02-19 12:05:14,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-02-19 12:05:17,545 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:17,546 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2012-02-19 12:05:17,549 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@36d83365
2012-02-19 12:05:17,551 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-02-19 12:05:17,572 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-02-19 12:05:17,572 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-02-19 12:05:17,591 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Created input record counter: Input records from _1_haversine.csv
2012-02-19 12:05:17,592 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2012-02-19 12:05:17,593 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2012-02-19 12:05:17,594 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2012-02-19 12:05:20,547 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,548 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000001_0' done.
2012-02-19 12:05:20,560 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2f2e43f1
2012-02-19 12:05:20,560 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,564 [Thread-3] INFO org.apache.hadoop.mapred.Merger - Merging 2 sorted segments
2012-02-19 12:05:20,568 [Thread-3] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 2 segments left of total size: 160 bytes
2012-02-19 12:05:20,568 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,623 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
2012-02-19 12:05:20,623 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,624 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task attempt_local_0001_r_000000_0 is allowed to commit now
2012-02-19 12:05:20,625 [Thread-3] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/temp371866094/tmp-1622554263
2012-02-19 12:05:23,558 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2012-02-19 12:05:23,558 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_r_000000_0' done.
2012-02-19 12:05:24,730 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0001
2012-02-19 12:05:24,732 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-02-19 12:05:24,732 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2012-02-19 12:05:24,734 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.0 0.9.3-SNAPSHOT anil 2012-02-19 12:05:14 2012-02-19 12:05:24 CROSS
Success!
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 geo,geo2,geoCross,geoDist file:/tmp/temp371866094/tmp-1622554263,
Input(s):
Successfully read records from: "file:///home/anil/hadoop/pig/anilpig/haversine.csv"
Successfully read records from: "file:///home/anil/hadoop/pig/anilpig/haversine.csv"
Output(s):
Successfully stored records in: "file:/tmp/temp371866094/tmp-1622554263"
Job DAG:
job_local_0001
2012-02-19 12:05:24,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-02-19 12:05:24,739 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-19 12:05:24,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1,1,0.0)
(1,2,1.7239093620868347)
(2,1,1.7239093620868347)
(2,2,0.0)
===================
Pig has kicked off a MapReduce job (job_local_0001) in the background, running in local mode.
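To see what that job consisted of, you can pull the task attempt IDs out of the log. Here is a minimal Python sketch. It assumes the attempt ID format seen in this log (attempt_local_0001_m_000000_0, where m or r marks a map or reduce task), and count_tasks is a hypothetical helper name. On the lines copied from this run it finds two map tasks, one for each load of haversine.csv, and one reduce task.

```python
import re
from collections import defaultdict

# Matches Hadoop attempt IDs such as attempt_local_0001_m_000000_0:
# attempt_<prefix>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"attempt_[A-Za-z0-9]+_\d+_([mr])_(\d+)_\d+")

def count_tasks(log_lines):
    """Hypothetical helper: count distinct map ('m') and reduce ('r') tasks in log lines."""
    tasks = defaultdict(set)
    for line in log_lines:
        for kind, task_no in ATTEMPT_RE.findall(line):
            tasks[kind].add(task_no)  # the same task shows up on several lines
    return {kind: len(ids) for kind, ids in tasks.items()}

# A few lines copied from the run above.
LOG = """\
2012-02-19 12:05:14,645 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2012-02-19 12:05:17,546 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2012-02-19 12:05:17,594 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2012-02-19 12:05:20,623 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
"""

counts = count_tasks(LOG.splitlines())
print(counts)  # {'m': 2, 'r': 1}
```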
How long did this script take?
Let us look at the timestamps of the first and last log entries.
-------------------------
2012-02-19 12:05:13,316 [main] INFO org.apache.pig.Main - Logging error
2012-02-19 12:05:24,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
-------------------------
About 11 seconds (12:05:13.316 to 12:05:24.739).
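The subtraction is easy to script if you want to time other runs the same way. A minimal Python sketch, assuming every log line starts with a timestamp in the "2012-02-19 12:05:13,316" form shown above (milliseconds after the comma). log_timestamp is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp format at the start of each Pig/Hadoop log line,
# e.g. "2012-02-19 12:05:13,316"; %f accepts the 3-digit milliseconds.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def log_timestamp(line):
    """Hypothetical helper: parse the date and time fields that start a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, TS_FORMAT)

first = "2012-02-19 12:05:13,316 [main] INFO org.apache.pig.Main - Logging error"
last = ("2012-02-19 12:05:24,739 [main] INFO org.apache.pig.backend.hadoop."
        "executionengine.util.MapRedUtil - Total input paths to process : 1")

elapsed = (log_timestamp(last) - log_timestamp(first)).total_seconds()
print(f"{elapsed:.3f} s")  # 11.423 s
```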
Pig also prints its own statistics for the run:
---------------
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.0 0.9.3-SNAPSHOT anil 2012-02-19 12:05:14 2012-02-19 12:05:24 CROSS
---------------------
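About 10 seconds. This figure covers only the MapReduce job itself (StartedAt to FinishedAt, at one-second resolution), so it leaves out the time Pig spent starting up and planning the script before the job was launched.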
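You can read the same numbers from the stats row programmatically. A minimal Python sketch that reads StartedAt and FinishedAt from the row as copied above. It assumes the fields are whitespace-separated as shown; the real output may use tabs, which str.split() also handles. Each date and time takes two tokens. parse_stats_row is a hypothetical helper name.

```python
from datetime import datetime

# Data row of Pig's "Script Statistics" block, copied from the run.
# Header: HadoopVersion PigVersion UserId StartedAt FinishedAt Features
ROW = "1.0.0 0.9.3-SNAPSHOT anil 2012-02-19 12:05:14 2012-02-19 12:05:24 CROSS"

def parse_stats_row(row):
    """Hypothetical helper: map the whitespace-split row onto the header fields."""
    t = row.split()
    return {
        "HadoopVersion": t[0],
        "PigVersion": t[1],
        "UserId": t[2],
        # StartedAt and FinishedAt are each a date token plus a time token
        "StartedAt": datetime.strptime(t[3] + " " + t[4], "%Y-%m-%d %H:%M:%S"),
        "FinishedAt": datetime.strptime(t[5] + " " + t[6], "%Y-%m-%d %H:%M:%S"),
        "Features": " ".join(t[7:]),  # whatever remains on the row
    }

stats = parse_stats_row(ROW)
job_seconds = (stats["FinishedAt"] - stats["StartedAt"]).total_seconds()
print(job_seconds)  # 10.0
```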
As you can see, the distance between points 1 and 2 (about 1.724 km, in both directions) matches the value Alex quotes. So I have successfully tested the Haversine script from Alex P. The next step is to play with the script further and try out more of Pig's functionality.
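To double-check the 1.724 figure outside Pig, you can port the geoDist expression to plain Python. This is a sketch, not Alex's code. haversine_km is a hypothetical name, and Python's math.radians, sin, cos, sqrt and atan2 stand in for the piggybank toRadians, SIN, COS, SQRT and ATAN2 UDFs. Java and Python may use different math libraries, so the last few digits can differ. Compare with a small tolerance rather than exact equality.

```python
import math

EARTH_RADIUS_KM = 6371  # same constant as in the Pig script (Earth radius in km)

def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance, written like the geoDist expression.

    a = sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    d = 2 * R * atan2(sqrt(a), sqrt(1 - a))
    """
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2)
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) * math.sin(dlon / 2))
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# The two rows of haversine.csv (id;lat;lon): 1;48.8583;2.2945 and 2;48.8738;2.295
d = haversine_km(48.8583, 2.2945, 48.8738, 2.295)
print(round(d, 4))  # 1.7239
```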
Additional Details:
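CROSS computes the cross product of two or more relations. In this script both geo and geo2 load the same two-row haversine.csv, so CROSS produces 2 × 2 = 4 pairs, including each point paired with itself. That is why the output has two rows with a distance of 0.0.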
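In plain Python terms, the CROSS step in the script behaves like itertools.product followed by concatenating each pair into one flat tuple. This is a minimal illustration of the semantics under that analogy, not how Pig executes CROSS. Pig makes no promise about output order; the order printed here is simply the one itertools.product produces.

```python
from itertools import product

# Both relations load the same two-row haversine.csv, shown as (id, lat, lon).
geo = [(1, 48.8583, 2.2945), (2, 48.8738, 2.295)]
geo2 = list(geo)

# Pair every tuple of geo with every tuple of geo2 and flatten each pair
# into one tuple (id1, lat1, lon1, id2, lat2, lon2), like geoCross.
geo_cross = [left + right for left, right in product(geo, geo2)]

print(len(geo_cross))                      # 4
print([(t[0], t[3]) for t in geo_cross])   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The output size is the product of the input sizes, so CROSS grows quickly. For example, two inputs of 1,000 rows each would already give 1,000,000 pairs.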
2012-02-19 12:05:14,578 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-02-19 12:05:14,600 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-02-19 12:05:14,600 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-02-19 12:05:14,636 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Created input record counter: Input records from _0_haversine.csv
2012-02-19 12:05:14,638 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2012-02-19 12:05:14,643 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2012-02-19 12:05:14,645 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2012-02-19 12:05:14,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-02-19 12:05:14,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-02-19 12:05:17,545 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:17,546 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2012-02-19 12:05:17,549 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@36d83365
2012-02-19 12:05:17,551 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-02-19 12:05:17,572 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-02-19 12:05:17,572 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-02-19 12:05:17,591 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Created input record counter: Input records from _1_haversine.csv
2012-02-19 12:05:17,592 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2012-02-19 12:05:17,593 [Thread-3] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2012-02-19 12:05:17,594 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2012-02-19 12:05:20,547 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,548 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000001_0' done.
2012-02-19 12:05:20,560 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2f2e43f1
2012-02-19 12:05:20,560 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,564 [Thread-3] INFO org.apache.hadoop.mapred.Merger - Merging 2 sorted segments
2012-02-19 12:05:20,568 [Thread-3] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 2 segments left of total size: 160 bytes
2012-02-19 12:05:20,568 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,623 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
2012-02-19 12:05:20,623 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-02-19 12:05:20,624 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task attempt_local_0001_r_000000_0 is allowed to commit now
2012-02-19 12:05:20,625 [Thread-3] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/temp371866094/tmp-1622554263
2012-02-19 12:05:23,558 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2012-02-19 12:05:23,558 [Thread-3] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_r_000000_0' done.
2012-02-19 12:05:24,730 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0001
2012-02-19 12:05:24,732 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-02-19 12:05:24,732 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2012-02-19 12:05:24,734 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.0 0.9.3-SNAPSHOT anil 2012-02-19 12:05:14 2012-02-19 12:05:24 CROSS
Success!
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 geo,geo2,geoCross,geoDist file:/tmp/temp371866094/tmp-1622554263,
Input(s):
Successfully read records from: "file:///home/anil/hadoop/pig/anilpig/haversine.csv"
Successfully read records from: "file:///home/anil/hadoop/pig/anilpig/haversine.csv"
Output(s):
Successfully stored records in: "file:/tmp/temp371866094/tmp-1622554263"
Job DAG:
job_local_0001
2012-02-19 12:05:24,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-02-19 12:05:24,739 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-19 12:05:24,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1,1,0.0)
(1,2,1.7239093620868347)
(2,1,1.7239093620868347)
(2,2,0.0)
===================
Pig has kicked off a MapReduce job in the background (run by Hadoop's LocalJobRunner, since this is local mode).
How much time did this script take?
Let us look at the first and the last log entries.
-------------------------
2012-02-19 12:05:13,316 [main] INFO org.apache.pig.Main - Logging error
2012-02-19 12:05:24,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
-------------------------
About 11 seconds.
The run also reports its own statistics:
---------------
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.0 0.9.3-SNAPSHOT anil 2012-02-19 12:05:14 2012-02-19 12:05:24 CROSS
---------------------
About 10 seconds, measured from StartedAt to FinishedAt.
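Both figures can be recomputed from the quoted timestamps. The Python sketch below is plain arithmetic on values copied from the log; the roughly one-second gap between the two spans is, I assume, Pig starting up and compiling the script before the job's StartedAt time.

```python
from datetime import datetime

# First and last log timestamps (log4j style: comma before the milliseconds).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
first_log = datetime.strptime("2012-02-19 12:05:13,316", LOG_FORMAT)
last_log = datetime.strptime("2012-02-19 12:05:24,739", LOG_FORMAT)

# StartedAt / FinishedAt from the Script Statistics row (whole seconds only).
STATS_FORMAT = "%Y-%m-%d %H:%M:%S"
started_at = datetime.strptime("2012-02-19 12:05:14", STATS_FORMAT)
finished_at = datetime.strptime("2012-02-19 12:05:24", STATS_FORMAT)

# Subtracting datetimes gives a timedelta; total_seconds() turns it into a float.
log_elapsed = (last_log - first_log).total_seconds()
stats_elapsed = (finished_at - started_at).total_seconds()

print(f"log span:   {log_elapsed:.3f} s")
print(f"stats span: {stats_elapsed:.0f} s")
```

This prints a log span of 11.423 s and a stats span of 10 s.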
As you can see, the distance between the two points (about 1.724) matches what Alex quotes, so I have successfully tested Alex P's Haversine script. The next step is to play with the script further and try out more of Pig's functionality.
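To double-check the 1.724 figure independently of Pig, here is a plain Python sketch of the haversine formula. The Earth radius of 6371 km is my assumption (it is the usual mean radius and it reproduces the Pig output); this is a cross-check, not Alex's implementation.

```python
import math

# Assumed mean Earth radius in km; it reproduces the 1.7239 value Pig printed.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    # a = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlambda/2)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # d = 2 * R * asin(sqrt(a))
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    d = haversine_km(48.8583, 2.2945, 48.8738, 2.295)
    print(f"{d:.4f} km")
```

The distance is symmetric and a point's distance to itself is 0.0, which matches the (1,2)/(2,1) and (1,1)/(2,2) rows of the Pig output.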
Additional Details:
CROSS is described in the Pig documentation: it computes the cross product of two or more relations.
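As a rough analogy (not how Pig actually executes it), CROSS on two relations behaves like Python's itertools.product: with two rows on each side it yields 2 × 2 = 4 pairs, which is why the output above has four tuples. The job stats show geo and geo2 both reading haversine.csv, so this sketch crosses the list with itself. Pig does not promise the output order; the order here is simply what itertools.product produces.

```python
from itertools import product

# The two rows of haversine.csv as (id, lat, lon) tuples.
points = [(1, 48.8583, 2.2945), (2, 48.8738, 2.295)]

# Pair every tuple of the first relation with every tuple of the second,
# keeping only the two ids of each pair.
pairs = [(a[0], b[0]) for a, b in product(points, points)]
print(pairs)
```

This prints [(1, 1), (1, 2), (2, 1), (2, 2)].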
References
http://fierydata.com/2012/05/11/hadoop-fundamentals-an-introduction-to-pig-2/