ZooKeeper – Distributed cluster software
I read this article about Apache ZooKeeper at Igvita and was intrigued.
I started looking around for Ruby libraries, but nothing was as mature as I would like. I forked a branch on GitHub and chugged along on getting the JRuby and C versions to work with the same API.
ZooKeeper exposes a super simple API, but with those simple pieces you can build a lot of complex cluster logic.
    zk = ZooKeeper.new("localhost:2181", :watcher => :default) # this will handle events using my built-in event handler
    zk2 = ZooKeeper.new("localhost:2181", :watcher => false) # this one won't receive watch events

    zk.watcher.register("/mypath") do |event, zookeeper_client|
      $stderr.puts("got an event on: #{event.path}")
    end

    zk.exists?("/mypath", :watch => true) # returns nil, but sets up the app to watch for the existence of /mypath
    zk2.create("/mypath", "my data up to 1mb", :mode => :ephemeral)
    # create modes can be any of
    # :persistent_sequential, :ephemeral_sequential, :persistent, :ephemeral

    # now the registered watcher will fire (within a few hundred milliseconds)
    # because we set that node to be :ephemeral, when zk2 closes its connection "/mypath" will go away
    # but watches are one-time firing only, so we need to set it up again
    zk.exists?("/mypath", :watch => true) # returns true

    zk2.close! # or delete or whatever

    # the watcher fires again and
    zk.exists?("/mypath") # returns false
A limited API of create, delete, get, set, and watch lets you do some really advanced things around a cluster.
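The one-time nature of watches is the part that trips people up, so here is a toy, in-memory sketch of that behavior. It does not talk to a real ZooKeeper server and is not how the library is implemented; `TinyStore` and its methods are hypothetical names made up for illustration. It only shows that a watch is consumed when it fires and has to be set again.

```ruby
# Toy in-memory stand-in for a znode store. This is NOT the library's
# implementation; TinyStore and its methods are hypothetical.
class TinyStore
  def initialize
    @nodes = {}                              # path => data
    @watches = Hash.new { |h, k| h[k] = [] } # path => pending one-shot callbacks
  end

  # Returns whether the node exists; optionally leaves a one-shot watch.
  def exists?(path, watch: nil)
    @watches[path] << watch if watch
    @nodes.key?(path)
  end

  def create(path, data)
    @nodes[path] = data
    fire(path, :created)
  end

  def delete(path)
    @nodes.delete(path)
    fire(path, :deleted)
  end

  private

  # Pending watches are removed before they run, so each fires at most once.
  def fire(path, type)
    pending = @watches.delete(path) || []
    pending.each { |callback| callback.call(type, path) }
  end
end

store    = TinyStore.new
events   = []
on_event = lambda { |type, path| events << [type, path] }

store.exists?("/mypath", watch: on_event) # node is missing, watch is armed
store.create("/mypath", "hello")          # the watch fires with :created
store.delete("/mypath")                   # no watch is left, nothing fires
store.exists?("/mypath", watch: on_event) # arm the watch again
store.create("/mypath", "hello again")    # fires with :created a second time
```

The delete in the middle goes unnoticed because the watch had already been used up, which is why the example above calls `exists?` with `:watch => true` a second time.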
Examples
I added some abstractions based on the ZooKeeper recipes.
Locks
    # these 2 clients could be on totally separate boxes, different processes, whatever
    zk = ZooKeeper.new("localhost:2181", :watcher => :default)
    zk2 = ZooKeeper.new("localhost:2181", :watcher => :default)

    lock1 = zk.locker("/mypath")
    lock1.lock # true

    lock2 = zk2.locker("/mypath")
    lock2.lock # false

    lock1.unlock # true
    lock2.lock # true

    # locks are also released on a client close/crash
    lock1.lock # false
    zk2.close!
    lock1.lock # true
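For a feel of how a lock like this can fall out of ephemeral sequential nodes, here is a toy, in-memory sketch of the general idea from the ZooKeeper lock recipe: every contender adds a numbered entry, and whoever holds the lowest number holds the lock. This is an assumption about the approach, not the locker's actual code; `LockDir` and `ToyLock` are hypothetical names, and unlike the full recipe this version gives up immediately instead of waiting.

```ruby
# Toy in-memory model of the ephemeral-sequential lock idea.
# LockDir and ToyLock are hypothetical names used for illustration only.
class LockDir
  def initialize
    @next_seq = 0
    @entries  = {} # sequence number => owner
  end

  # Like creating an :ephemeral_sequential child: returns its sequence number.
  def enqueue(owner)
    seq = @next_seq
    @next_seq += 1
    @entries[seq] = owner
    seq
  end

  def remove(seq)
    @entries.delete(seq)
  end

  # Like a client closing or crashing: all of its ephemeral entries vanish.
  def drop_owner(owner)
    @entries.delete_if { |_seq, entry_owner| entry_owner == owner }
  end

  def lowest
    @entries.keys.min
  end
end

class ToyLock
  def initialize(dir, owner)
    @dir   = dir
    @owner = owner
    @seq   = nil
  end

  # Non-blocking attempt: true only if our entry has the lowest sequence number.
  def lock
    seq = @dir.enqueue(@owner)
    if @dir.lowest == seq
      @seq = seq
      true
    else
      @dir.remove(seq) # give up instead of waiting for our turn
      false
    end
  end

  def unlock
    return false if @seq.nil?
    @dir.remove(@seq)
    @seq = nil
    true
  end
end

dir   = LockDir.new
lock1 = ToyLock.new(dir, :client1)
lock2 = ToyLock.new(dir, :client2)

results = []
results << lock1.lock   # client1 takes the lock
results << lock2.lock   # client2 is refused
results << lock1.unlock # client1 releases
results << lock2.lock   # client2 takes the lock
results << lock1.lock   # client1 is refused
dir.drop_owner(:client2) # client2 "crashes", its entry disappears
results << lock1.lock   # client1 takes the lock again
```

The `drop_owner` call stands in for the session ending: because the entries are modeled on ephemeral nodes, nobody has to clean up after a dead client.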
Message Queues
I also implemented a simple message queue on top of ZooKeeper. However, because a ZooKeeper "children" call returns every child of a node, I wouldn't recommend using this for queues where pending messages will reach into the thousands.
    client1 = ZooKeeper.new("localhost:2181", :watcher => :default)
    client2 = ZooKeeper.new("localhost:2181", :watcher => :default)

    publisher = client1.queue("myqueue")
    receiver = client2.queue("myqueue")

    receiver.subscribe do |title, data|
      # data will be whatever was published, title will be the node name
      # for the message

      $stderr.puts "got a message with: #{data}"

      # having a true state from the block will mark the message as 'answered'
      # sending back a false will requeue

      true
    end
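To see why a deep backlog hurts, here is a toy, in-memory sketch of a queue built the way the sequential-children recipe suggests: each message is a numbered child, and the consumer lists all children to find the oldest one. It is a rough model under that assumption, not the queue class from the library; `ToyQueue`, `publish`, `poll`, and `children_listed` are hypothetical names.

```ruby
# Toy in-memory queue in the shape of the sequential-children recipe.
# ToyQueue is a hypothetical name; it is not the library's queue class.
class ToyQueue
  attr_reader :children_listed

  def initialize
    @next_seq        = 0
    @children        = {} # node name => data
    @children_listed = 0  # running total of child names returned by listings
  end

  # Like creating a :persistent_sequential child under the queue path.
  def publish(data)
    name = format("message-%010d", @next_seq)
    @next_seq += 1
    @children[name] = data
    name
  end

  # Like a "children" call: always returns every pending node name.
  def children
    names = @children.keys
    @children_listed += names.size
    names
  end

  # Hands the oldest message to the block; a truthy result removes it,
  # a falsy result leaves it queued.
  def poll
    title = children.min
    return nil if title.nil?
    handled = yield(title, @children[title])
    @children.delete(title) if handled
    title
  end
end

queue = ToyQueue.new
1000.times { |i| queue.publish("job #{i}") }

received = []
3.times { queue.poll { |_title, data| received << data; true } }
```

After only three messages, `queue.children_listed` is already close to three thousand names, because every poll pays for the whole backlog. Against a real server each of those listings also goes over the network.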
More...
There's a ton you can do with this thing (priority queues, a metadata store, etc.). I think it's a nice addition to the Ruby toolset.
