Sunday, January 27, 2013

Apache HBase - a simple tutorial

Apache HBase is a column-oriented database in the Hadoop ecosystem. You can learn more about it on its website at http://hbase.apache.org/


HBase Operations

Step 1: Download HBase

I downloaded hbase-0.94.4 (the file hbase-0.94.4.tar.gz), which was the latest release at the time of writing. You may find a later version.

Step 2: Unzip HBase

$> mkdir hbase
$> gunzip hbase-0.94.4.tar.gz
$> ls
hbase-0.94.4.tar

$> tar xvf hbase-0.94.4.tar

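Now you should have a directory called hbase-0.94.4. (On most systems, tar xzf hbase-0.94.4.tar.gz does the decompression and extraction in a single step.)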

$> cd hbase-0.94.4

Step 3: Start the HBase Daemon

$> cd bin
$> ./hbase-daemon.sh start master
starting master, logging to  .../hbase-0.94.4/bin/../logs/hbase-anil-master-2.local.out
$

Step 4: Enter the HBase Shell

$> ./hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.4, r1428173, Thu Jan  3 06:29:56 UTC 2013

hbase(main):001:0>


Step 5: Create an HBase Table

The table will be called 'blog', with one column family called 'posts' and another called 'images'.


hbase(main):007:0> create 'blog', 'posts', 'images'
0 row(s) in 1.0610 seconds
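A handy mental model: an HBase table maps row keys to cells, and every column is named family:qualifier. The families ('posts' and 'images') are fixed by create, while qualifiers such as title or author simply appear the first time you put to them. The Python sketch below models only that idea in memory; the Table class and NoSuchColumnFamily exception are hypothetical names for illustration, not the HBase client API.

```python
class NoSuchColumnFamily(Exception):
    """Raised when a put names a column family the table was not created with."""


class Table:
    """Toy in-memory model of an HBase table: row key -> {"family:qualifier": value}."""

    def __init__(self, name, *families):
        # Column families are fixed at creation time, as in: create 'blog', 'posts', 'images'
        self.name = name
        self.families = set(families)
        self.rows = {}

    def put(self, row, column, value):
        # Only the family part of "family:qualifier" must already exist;
        # any new qualifier is accepted on the fly
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise NoSuchColumnFamily(
                f"Column family {family} does not exist in table {self.name}")
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        # Return the row's (column, value) pairs sorted by column name
        return sorted(self.rows.get(row, {}).items())


blog = Table("blog", "posts", "images")
blog.put("firstpost", "posts:title", "My HBase Post")
blog.put("firstpost", "images:header", "first.jpg")
print(blog.get("firstpost"))
# [('images:header', 'first.jpg'), ('posts:title', 'My HBase Post')]
```

In this toy model, putting to 'image:header' instead of 'images:header' raises NoSuchColumnFamily, which mirrors the real error shown under Troubleshooting.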


Step 6: Populate the HBase Table

hbase(main):009:0> put 'blog','firstpost','posts:title','My HBase Post'
0 row(s) in 0.0220 seconds

hbase(main):010:0> put 'blog','firstpost','posts:author','Anil'
0 row(s) in 0.0050 seconds

hbase(main):011:0> put 'blog','firstpost','posts:location','Chicago'
0 row(s) in 0.0070 seconds

hbase(main):012:0> put 'blog','firstpost','posts:content','HBase is cool'
0 row(s) in 0.0050 seconds

hbase(main):014:0> put 'blog','firstpost','images:header', 'first.jpg'
0 row(s) in 0.0060 seconds

hbase(main):015:0> put 'blog','firstpost','images:bodyimage', 'second.jpg'
0 row(s) in 0.0040 seconds



INFO ON HBASE CELL INSERTION FORMAT
NOTE: put stores a cell 'value' at the specified table/row/column and,
optionally, timestamp coordinates. To put a cell value into table 't1' at
row 'r1' under column 'c1' marked with the time 'ts1', do:
        hbase> put 't1', 'r1', 'c1', 'value', ts1
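The optional ts1 matters because HBase keeps several timestamped versions of each cell. If you leave the timestamp out, HBase uses the current time in milliseconds (like the timestamp=1359347161523 values in the get output below). A plain get returns the version with the highest timestamp, and each family's VERSIONS setting (shown as VERSIONS => '3' in the table description under Troubleshooting) limits how many versions are kept. The sketch below models one cell under those assumptions; VersionedCell is a hypothetical class, and it trims extra versions immediately, whereas HBase enforces the limit on reads and cleans up storage later.

```python
import time


class VersionedCell:
    """Toy model of one HBase cell: (timestamp_ms, value) pairs, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # sorted by timestamp, highest first

    def put(self, value, ts=None):
        # Without an explicit timestamp, stamp the value with the current time in ms
        if ts is None:
            ts = int(time.time() * 1000)
        self.versions.append((ts, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        # Simplification: drop versions beyond the limit right away
        del self.versions[self.max_versions:]

    def get(self):
        # A plain get returns the version with the highest timestamp
        return self.versions[0][1] if self.versions else None


cell = VersionedCell(max_versions=3)
cell.put("draft 1", ts=100)
cell.put("draft 2", ts=200)
cell.put("old edit", ts=150)  # older timestamp: stored, but not the latest
print(cell.get())  # draft 2
```

Note that ordering is by timestamp, not by write order: the 'old edit' put came last but does not hide 'draft 2'.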



Step 7: Verify the HBase Table Contents

hbase(main):016:0> get 'blog','firstpost'
COLUMN                CELL                                                    
 images:bodyimage     timestamp=1359347351382, value=second.jpg              
 images:header        timestamp=1359347324836, value=first.jpg                
 posts:author         timestamp=1359347197336, value=Anil                    
 posts:content        timestamp=1359347230734, value=HBase is cool            
 posts:location       timestamp=1359347210258, value=Chicago                  
 posts:title          timestamp=1359347161523, value=My HBase Post            
6 row(s) in 0.0350 seconds

hbase(main):017:0>
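The get output lists one line per cell, ordered by column family and then qualifier, which is why the images:* cells appear before the posts:* cells even though the posts were written first. The "6 row(s)" footer counts those six printed lines (cells) for the single row firstpost. The sketch below rebuilds that listing from the cells in the transcript; format_get is a hypothetical helper, and plain string comparison stands in for HBase's byte-wise ordering, which gives the same result for ASCII names like these.

```python
def format_get(cells):
    """Render {column: (timestamp, value)} roughly like the HBase shell's get listing."""
    lines = ["COLUMN                CELL"]
    # Sort by family, then qualifier
    for column in sorted(cells, key=lambda col: col.split(":", 1)):
        ts, value = cells[column]
        lines.append(f" {column:<20} timestamp={ts}, value={value}")
    # The footer counts the cell lines printed above
    lines.append(f"{len(cells)} row(s)")
    return "\n".join(lines)


cells = {
    "posts:title": (1359347161523, "My HBase Post"),
    "posts:author": (1359347197336, "Anil"),
    "posts:location": (1359347210258, "Chicago"),
    "posts:content": (1359347230734, "HBase is cool"),
    "images:header": (1359347324836, "first.jpg"),
    "images:bodyimage": (1359347351382, "second.jpg"),
}
print(format_get(cells))
```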


Cleaning Up

To delete the HBase table you created above, you first need to disable it and then drop it:


hbase(main):005:0> disable 'blog'
0 row(s) in 2.0560 seconds

hbase(main):006:0> drop 'blog'
0 row(s) in 1.0560 seconds

Troubleshooting

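If you mistype a column family name (for example, 'image' instead of 'images'), you will see an error like the one below. Qualifiers can be anything, but the family must already exist in the table.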

ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family image does not exist in region blog,,1359346963541.261ada3f5ada71f241759e6a062dc523. in table {NAME => 'blog', FAMILIES => [{NAME => 'images', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'posts', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
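The message names the family the put asked for (image), and the table description lists the families that do exist (images and posts). A client can catch this before writing by checking the family part of the column name against the families declared at create time and suggesting the closest one. The sketch below does that with Python's difflib; check_column is a hypothetical helper, and the family list is hard-coded from this tutorial rather than read from HBase.

```python
import difflib


def check_column(column, families):
    """Return None if the column's family exists, else a message with a suggested fix."""
    # Everything before the first ':' is the column family
    family = column.split(":", 1)[0]
    if family in families:
        return None
    # Look for a declared family whose name is close to the mistyped one
    guesses = difflib.get_close_matches(family, families, n=1)
    hint = f"; did you mean {guesses[0]!r}?" if guesses else ""
    return f"column family {family!r} does not exist{hint}"


print(check_column("image:header", ["posts", "images"]))
# column family 'image' does not exist; did you mean 'images'?
```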



HBase REST Server

If you want to access HBase over HTTP, you can start the HBase REST server as follows.

Start the RegionServer

$ ./hbase-daemon.sh start regionserver
starting regionserver, logging to hbase-0.94.4/bin/../logs/hbase-anil-regionserver-2.local.out
$

Start HBase REST Server

$ ./hbase-daemon.sh start rest -p 50000

NOTE: You can use any free port; I use 50000 for the REST server. If you omit -p, the REST server listens on port 8080 by default.

Now when I go to http://localhost:50000
I see a list of my HBase tables.

When I go to http://localhost:50000/version
I get version metadata for the REST server.
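You can also call the REST server from code. The sketch below uses only Python's standard library and assumes the server is on localhost:50000 as above. It also assumes that, when asked for application/json, a row comes back as {"Row": [{"key": ..., "Cell": [{"column": ..., "timestamp": ..., "$": ...}]}]} with the row key, column name and value base64-encoded, which is how I understand the HBase REST JSON format. fetch_json, decode_row and BASE_URL are hypothetical names.

```python
import base64
import json
import urllib.request

# Assumption: the REST server was started with: ./hbase-daemon.sh start rest -p 50000
BASE_URL = "http://localhost:50000"


def fetch_json(path):
    """GET a REST resource such as "/version" or "/blog/firstpost", asking for JSON."""
    request = urllib.request.Request(BASE_URL + path, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def _b64_text(value):
    # Row keys, column names and values arrive base64-encoded
    return base64.b64decode(value).decode("utf-8")


def decode_row(payload):
    """Convert a REST row payload into {row_key: {column: value}}."""
    rows = {}
    for row in payload.get("Row", []):
        cells = rows.setdefault(_b64_text(row["key"]), {})
        for cell in row.get("Cell", []):
            # "$" holds the cell value; "timestamp" is ignored in this sketch
            cells[_b64_text(cell["column"])] = _b64_text(cell["$"])
    return rows
```

With the REST server running, fetch_json("/version") should return the version metadata mentioned above, and decode_row(fetch_json("/blog/firstpost")) should give the same six cells as the shell's get. If the server is not running, urlopen raises an error instead.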

Stop HBase REST Server

$ ./hbase-daemon.sh stop rest -p 50000
stopping rest..

Stop HBase Master

$ ./hbase-daemon.sh stop master
stopping master.

