Monday, December 3, 2012

Impressions on Cloudera Impala

Today at 6pm I attended a meetup arranged by the Chicago Big Data group, titled "Cloudera Impala". We were fortunate to have Marcel Kornacker, Lead Architect of Cloudera Impala, as the presenter. Marcel must have been pleasantly surprised to experience 70-degree Fahrenheit weather in Chicago in December. It was one of those beautiful days, courtesy of "Global Warming".

Speaker Impressions:-

My first impressions of Marcel were as follows: he was unlike those speakers who do pre-talk theatrics such as going around the room shaking hands or speaking loudly. He moved quietly near the presentation area or had quiet conversations. So I deduced that he was a geeky dude who does not seek out conversations in the presentation room. I figured he was probably not a marketing person or technical evangelist who would be shallow on the technical details of the presentation.

The other concern was whether he had a European accent that might be difficult to follow, especially if the talk got very geeky. Then Marcel started speaking. As they say, do not judge a book by its cover: he put the accent concern to rest right away, and I knew I would at least be able to follow what he was going to say. He speaks well and convincingly on a topic where he is the subject matter expert.

Jonathan Seidman, organizer of the Chicago Big Data group, introduced Marcel as an ex-Googler who had worked on the F1 database project. I did not know what F1 was at Google, but it sounded important. That was a good introduction to set the stage for Marcel. If he was employed at Google in a core database tech field, he should definitely know things well. As a presenter, Marcel did a good job discussing the objectives, intricacies, target areas and limitations of Impala. Kudos!

Impala Impressions:-

Let me get back to Impala. Marcel said that the code was written in C++. Bummer. As you know, the Hadoop ecosystem is primarily Java (even though there are bits, pieces and tools that support non-Java code, such as Hadoop Streaming). I guess Marcel knows C++ well, and that is why he chose to write Impala in C++. He mentioned that the interface to Impala for applications will be via ODBC. OK, there is the first roadblock. I write Java code. Now if I want to be excited about Impala, I will need to look at some form of JDBC-to-ODBC bridge or wait for Marcel's team to code up some client utilities. People tinkering with the Hadoop ecosystem may have the same questions and impressions as me.

While Hive exists for Java programmers to do SQL with the Hadoop ecosystem, Marcel is trying to bring C++ into the equation. Here is the catch though. Impala, according to Marcel, performs 3 times better than Hive in certain situations. Wow, this can be a big thing. But alas, we cannot use Impala via Java interfaces. So we are stuck with Hive if we want to use SQL-like interfaces into Hadoop. (Just remember that hives are a bad allergy and no fun. :) We are talking about Apache Hive.)

I am sure there will be takers for Impala. I am not going to be doing any experimentation with it because I do not intend to a) use C++ or ODBC or b) use CDH4. My experiments are with the Apache Hadoop community version, and there are enough goodies to get excited about there. :)

Unlike Hive, Impala does not use MapReduce underneath. It has query plans that get split into fragments and distributed among the nodes in a cluster. There is a component that gathers the results of the plan execution.
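Since I write Java, here is how I picture that fragment-and-gather idea. This is only a minimal sketch under my own assumptions, not Impala's actual code or API: the class and the `runFragment` and `coordinate` methods are hypothetical names I made up, threads in a pool stand in for cluster nodes, and the "query" is just counting rows above a threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSketch {

    // Hypothetical plan fragment: count the rows in one node's local
    // partition whose value exceeds the threshold.
    static long runFragment(int[] localRows, int threshold) {
        long count = 0;
        for (int value : localRows) {
            if (value > threshold) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical coordinator: hands the same fragment to every "node"
    // (a pool thread here, an assumption for illustration) and then
    // gathers the partial counts into one result.
    static long coordinate(List<int[]> partitions, int threshold) throws Exception {
        ExecutorService nodes = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int[] partition : partitions) {
                partials.add(nodes.submit(() -> runFragment(partition, threshold)));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that node's fragment is done
            }
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[]> partitions = Arrays.asList(
                new int[] {1, 5, 9},
                new int[] {7, 2},
                new int[] {8, 8, 3});
        System.out.println(coordinate(partitions, 4)); // prints 5
    }
}
```

The point of the sketch is only the shape: each node works on its own slice of the data and a single component merges the partial answers, with no MapReduce job in between.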

On my way back after the talk, I googled Marcel to learn more about him. I hit on the following article, which gives very good background on Marcel.
http://www.wired.com/wiredenterprise/2012/10/kornacker-cloudera-google/
Basically, Marcel is a details guy with a PhD from the University of California, Berkeley, and is an excellent cook.

Cloudera Impala is in the hands of an excellent chef. Good luck, Marcel!

Other people, such as http://java.sys-con.com/node/2461455, are getting excited about Impala. There is mention of "near real time" without the use of Wind River or an RTOS. :)

