If you use the CAS SSO solution, you can have it generate login logs. I have a text file called "CASLoginLog.txt", which is a snippet of the login trail for 3 days in March 2012. I have changed the usernames and many other details, so your file may look a bit different. :)
Step: Generate a CAS SSO Login Trail
===================================
Date Action Username Service Ticket
28.3.2012 2:28:01 SERVICE_TICKET_CREATED user1 https://myurl ST-13133--org-sso
28.3.2012 2:27:30 SERVICE_TICKET_CREATED user2 https://myurl/url ST-13046--j-sso
28.3.2012 2:27:17 TICKET_GRANTING_TICKET_DESTROYED TGT-3380--j-sso
28.3.2012 2:27:17 SERVICE_TICKET_CREATED user3 https://c/thread/197282?tstart=0 ST-13045-j-sso
28.3.2012 2:27:16 TICKET_GRANTING_TICKET_CREATED firstlion TGT-3567--j-sso
28.3.2012 2:26:30 SERVICE_TICKET_CREATED user4 https://issues.j.org/secure/D.jspa ST-13044--j-sso
27.3.2012 23:12:37 SERVICE_TICKET_CREATED user2 https://c/thread/151832?start=15&tstart=0 ST-13048--j-sso
27.3.2012 22:51:51 SERVICE_TICKET_CREATED user5 https://c/login.jspa ST-13038--j-sso
27.3.2012 22:51:50 TICKET_GRANTING_TICKET_CREATED user5 TGT-3527--j-sso
27.3.2012 22:51:49 TICKET_GRANTING_TICKET_CREATED user5 TGT-3526--j-sso
26.3.2012 14:17:27 SERVICE_TICKET_CREATED user1 https://c/message/725882?tstart=0 ST-11709--j-sso
26.3.2012 13:02:51 TICKET_GRANTING_TICKET_CREATED user1 TGT-3223--j-sso
=======================================
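The columns are separated by single spaces, so the date and the time end up in two separate fields. As a rough illustration of how a space-delimited split maps each row onto six columns, here is a small Python sketch. It is not part of the Pig workflow; `FIELDS` and `split_log_line` are hypothetical names, padding short rows with None is this sketch's own choice, and it assumes the file is laid out exactly as shown above.

```python
# Hypothetical column names, matching the order of the values in each log row.
FIELDS = ["ticketDate", "ticketTime", "action", "username", "service", "ticket"]

def split_log_line(line):
    """Map space-separated tokens onto the six column names, padding with None."""
    tokens = line.strip().split(" ")
    padded = tokens + [None] * (len(FIELDS) - len(tokens))
    return dict(zip(FIELDS, padded))

full = split_log_line(
    "28.3.2012 2:28:01 SERVICE_TICKET_CREATED user1 https://myurl ST-13133--org-sso"
)
short = split_log_line(
    "28.3.2012 2:27:17 TICKET_GRANTING_TICKET_DESTROYED TGT-3380--j-sso"
)

print(full["username"], full["ticket"])    # user1 ST-13133--org-sso
print(short["username"], short["ticket"])  # TGT-3380--j-sso None
```

Rows without a username or a service URL shift the ticket ID into an earlier column. That is harmless for this exercise, because only the SERVICE_TICKET_CREATED rows are counted and those carry all six values.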
So let us figure out how many times, over these 3 days, each user was issued a service ticket, that is, how many "SERVICE_TICKET_CREATED" actions were logged per user.
I am going to use Apache Pig to generate the output.
Step: Code a Pig Script
My Pig script is called CASLog.pig.
=====================================
-- Load the space-delimited log; the date and the time land in separate fields.
file = LOAD 'CASLoginLog.txt' USING PigStorage(' ') AS (ticketDate: chararray, ticketTime: chararray, action: chararray, username: chararray, service: chararray, ticket: chararray);
-- Trim stray whitespace from the fields used below.
trimmedfile = FOREACH file GENERATE TRIM(ticketDate) AS ticketDate, TRIM(action) AS action, TRIM(username) AS username, TRIM(ticket) AS ticket;
-- Keep only the service-ticket events, then count them per user.
selectedrows = FILTER trimmedfile BY action == 'SERVICE_TICKET_CREATED';
usersgroup = GROUP selectedrows BY username;
counts = FOREACH usersgroup GENERATE group AS username, COUNT(selectedrows) AS num_tickets;
-- Write username=count pairs into the 'result' directory.
STORE counts INTO 'result' USING PigStorage('=');
==========================================
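To sanity-check what the script should produce, the same filter, group and count steps can be sketched in a few lines of plain Python over the sample log. This is only a cross-check under the assumption that the file looks exactly like the snippet above; `LOG` and `count_service_tickets` are hypothetical names and none of this is needed to run Pig.

```python
from collections import Counter

# The sample log from above, embedded so the sketch is self-contained.
LOG = """\
28.3.2012 2:28:01 SERVICE_TICKET_CREATED user1 https://myurl ST-13133--org-sso
28.3.2012 2:27:30 SERVICE_TICKET_CREATED user2 https://myurl/url ST-13046--j-sso
28.3.2012 2:27:17 TICKET_GRANTING_TICKET_DESTROYED TGT-3380--j-sso
28.3.2012 2:27:17 SERVICE_TICKET_CREATED user3 https://c/thread/197282?tstart=0 ST-13045-j-sso
28.3.2012 2:27:16 TICKET_GRANTING_TICKET_CREATED firstlion TGT-3567--j-sso
28.3.2012 2:26:30 SERVICE_TICKET_CREATED user4 https://issues.j.org/secure/D.jspa ST-13044--j-sso
27.3.2012 23:12:37 SERVICE_TICKET_CREATED user2 https://c/thread/151832?start=15&tstart=0 ST-13048--j-sso
27.3.2012 22:51:51 SERVICE_TICKET_CREATED user5 https://c/login.jspa ST-13038--j-sso
27.3.2012 22:51:50 TICKET_GRANTING_TICKET_CREATED user5 TGT-3527--j-sso
27.3.2012 22:51:49 TICKET_GRANTING_TICKET_CREATED user5 TGT-3526--j-sso
26.3.2012 14:17:27 SERVICE_TICKET_CREATED user1 https://c/message/725882?tstart=0 ST-11709--j-sso
26.3.2012 13:02:51 TICKET_GRANTING_TICKET_CREATED user1 TGT-3223--j-sso
"""

def count_service_tickets(text):
    """Count SERVICE_TICKET_CREATED rows per username (third and fourth fields)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split(" ")
        if len(fields) >= 4 and fields[2] == "SERVICE_TICKET_CREATED":
            counts[fields[3].strip()] += 1
    return counts

counts = count_service_tickets(LOG)
for user in sorted(counts):
    print(f"{user}={counts[user]}")
```

The filter on the third field plays the role of the FILTER statement, and the Counter stands in for GROUP followed by COUNT.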
Step: Execute Apache Pig
Now let me run Pig on this.
===========================================
$ sh ../pig-0.9.2/bin/pig -x local CASLog.pig
....
Input(s):
Successfully read records from: "file:///hadoop/pig/anilpig/CASLoginLog.txt"
Output(s):
Successfully stored records in: "file:///hadoop/pig/anilpig/result"
Job DAG:
job_local_0001
2012-03-30 16:56:09,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
============================================
Pig does the MapReduce magic under the covers and stores the end result in a directory called "result", as specified by the STORE statement at the end of the Pig script.
Step: View the Results
========================
$ vi result/part-r-00000
user1=2
user2=2
user3=1
user4=1
user5=1
========================
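Because the output is plain "username=count" text, it is easy to pick up in other tools. A minimal Python sketch for reading it back follows; `parse_counts` is a hypothetical helper, and the sample string simply repeats the output shown above instead of opening the real file.

```python
def parse_counts(text):
    """Turn 'username=count' lines into a dict of integer counts."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last '=' so the count is always the final piece.
        username, _, count = line.rpartition("=")
        result[username] = int(count)
    return result

sample = "user1=2\nuser2=2\nuser3=1\nuser4=1\nuser5=1\n"
totals = parse_counts(sample)

print(totals["user1"])       # 2
print(sum(totals.values()))  # 7 service tickets in total
```

The total of 7 matches the number of SERVICE_TICKET_CREATED rows in the sample log.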
It took me a couple of hours of trial and error to get the script correct and working, but I had to write 0 lines of Apache Hadoop MapReduce Java code.