So a few months ago I started looking into using Cassandra for a project. Not being fluent in Java or Python, which have excellent client libraries for Cassandra (Hector and Pycassa, respectively), I looked into the Perl and Ruby clients.
At the time, none of the native Perl or Ruby clients were nearly as powerful, feature-rich or actively maintained as Hector/Pycassa. I started hacking on the Perl client, but a) realized it was going to need a lot of work and b) I really wanted to use Ruby on Rails as the front end for my application, so I started looking into using Hector with JRuby.
If you’re not familiar with JRuby, basically imagine Ruby running on top of the Java Virtual Machine (JVM). It’s actually a complete reimplementation of Ruby (compatible with both MRI 1.8 and 1.9), and most performance benchmarks have it much faster than most other Ruby implementations, including MRI, which is written in C. But on top of blazing performance (not to mention real threads!), JRuby also allows you to utilize Java classes as if they were native Ruby libraries, thus opening the door to using Hector with Ruby.
Anyways, you’re going to need the latest stable JRuby build, Sun JDK6, the Hector library and its dependencies. Basically install everything per their individual install directions. The only thing to make sure of is that your JRuby install is using the same JDK that Hector is installed under.
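One quick way to see which JVM your JRuby actually picked up is to ask the runtime itself. This is only a minimal sketch: `runtime_description` is a helper name I made up for illustration, and `java.version` and `java.home` are the standard JVM system properties.

```ruby
# Describes the runtime this script is executing on. Under JRuby it also
# reports the JVM version and the directory the JVM is installed in.
# (runtime_description is a hypothetical helper, not part of JRuby or Hector.)
def runtime_description
  if defined?(JRUBY_VERSION)
    require 'java'
    system_class = Java::JavaLang::System
    "JRuby #{JRUBY_VERSION} on Java #{system_class.getProperty('java.version')} " \
      "(#{system_class.getProperty('java.home')})"
  else
    "#{RUBY_ENGINE} #{RUBY_VERSION} (not JRuby, so Java classes are unavailable)"
  end
end

puts runtime_description
```

If the reported Java home is not the JDK you expected, fix that before loading any Hector JARs.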
At the top of your JRuby script:
# imports Java support into JRuby
require 'java'
# change these paths to point to your Java CLASSPATH
Dir["/usr/share/java/*.jar"].each{|jar| $CLASSPATH << jar }
Dir["/usr/share/java/slf4j/*log4j*.jar"].each{|jar| $CLASSPATH << jar }
# imports necessary for Hector/Cassandra
java_import 'java.util.Arrays'
java_import 'java.util.Iterator'
# Imports for Hector
java_import 'me.prettyprint.cassandra.serializers.StringSerializer'
java_import 'me.prettyprint.cassandra.serializers.BytesArraySerializer'
java_import 'me.prettyprint.cassandra.serializers.LongSerializer'
# add any other serializers you might need
java_import 'me.prettyprint.cassandra.service.CassandraHostConfigurator'
java_import 'me.prettyprint.hector.api.Cluster'
java_import 'me.prettyprint.hector.api.Keyspace'
java_import 'me.prettyprint.hector.api.beans.HColumn'
java_import 'me.prettyprint.hector.api.exceptions.HectorException'
java_import 'me.prettyprint.hector.api.factory.HFactory'
java_import 'me.prettyprint.hector.api.mutation.Mutator'
java_import 'me.prettyprint.hector.api.query.ColumnQuery'
java_import 'me.prettyprint.hector.api.query.QueryResult'
java_import 'me.prettyprint.hector.api.ConsistencyLevelPolicy'
java_import 'me.prettyprint.hector.api.HConsistencyLevel'
java_import 'org.apache.cassandra.thrift.ConsistencyLevel'
java_import 'me.prettyprint.cassandra.model.ConfigurableConsistencyLevel'
java_import 'me.prettyprint.cassandra.model.AbstractBasicQuery'
java_import 'me.prettyprint.cassandra.service.OperationType'
java_import 'me.prettyprint.cassandra.service.template.ColumnFamilyTemplate'
# You may need others...
java_import 'org.slf4j.LoggerFactory'
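Since java_import has no wildcard form, the import list gets long. One way to cut down on the repetition is to build the fully qualified names from a package plus a list of short class names. A small sketch, where `qualified_names` is a hypothetical helper of my own and the actual import step assumes the Hector JARs are already on $CLASSPATH as set up above:

```ruby
# Expands short class names into fully qualified names for one package.
# (qualified_names is an illustrative helper, not part of JRuby or Hector.)
def qualified_names(package, class_names)
  class_names.map { |name| "#{package}.#{name}" }
end

hector_api = qualified_names('me.prettyprint.hector.api',
                             %w[Cluster Keyspace ConsistencyLevelPolicy HConsistencyLevel])

# Only attempt the import under JRuby; plain Ruby has no java_import.
if defined?(JRUBY_VERSION)
  require 'java'
  hector_api.each { |name| java_import name }
end
```

You still name every class, but only once per package, which makes it easier to see what you are pulling in.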
At this point, you can use Hector in your JRuby application, using the standard Ruby syntax:
# create some serializers- you'll need these later
@se = StringSerializer.get()
@le = LongSerializer.get()
# You'll want to set your read/write consistency level
cl = ConfigurableConsistencyLevel.new
cl.setDefaultReadConsistencyLevel(HConsistencyLevel::ONE)
cl.setDefaultWriteConsistencyLevel(HConsistencyLevel::QUORUM)
# Set some options for connecting to the cluster
hosts = CassandraHostConfigurator.new
hosts.setHosts('10.0.0.1,10.0.0.2')
hosts.setMaxActive(200)
hosts.setAutoDiscoverHosts(true)
hosts.setAutoDiscoveryDelayInSeconds(30)
hosts.setRetryDownedHostsDelayInSeconds(10)
hosts.setCassandraThriftSocketTimeout(15 * 1000) # 15sec converted to ms
hosts.setUseThriftFramedTransport(true)
# connect to the cluster
@cluster = HFactory.getOrCreateCluster('MyCluster', hosts)
# attach to a keyspace
@keyspace = HFactory.createKeyspace('MyKeyspace', @cluster, cl)
# create a mutator handle for writes (inserts and deletes)
@mutator = HFactory.createMutator(@keyspace, @se)
# insert some data into your CF, in this case rowkey, column names and values are all ASCII strings
cols = [
  HFactory.createColumn("name", "Aaron", @se, @se),
  HFactory.createColumn("userid", "synfinatic", @se, @se)
]
cols.each do |col|
  @mutator.insert('rowkeyname', 'Users', col)
end
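The example above only writes. Reading a single column back goes through a ColumnQuery (one of the classes imported earlier). Here is a rough sketch, assuming the HFactory import plus the @keyspace and @se objects from the setup code; `fetch_column_value` is a name I made up for illustration, and it uses the same string serializer for the row key, column name and value:

```ruby
# Looks up one column and returns its value, or nil if the column is missing.
# Assumes HFactory has already been java_import-ed and that `keyspace` and
# `serializer` were created as in the setup code (e.g. @keyspace and @se).
# (fetch_column_value is a hypothetical helper, not part of Hector.)
def fetch_column_value(keyspace, serializer, column_family, row_key, column_name)
  # Serializers are for the row key, the column name and the column value.
  query = HFactory.createColumnQuery(keyspace, serializer, serializer, serializer)
  query.setColumnFamily(column_family)
  query.setKey(row_key)
  query.setName(column_name)

  column = query.execute.get # Java null arrives in JRuby as nil
  column.nil? ? nil : column.getValue
end
```

Calling `fetch_column_value(@keyspace, @se, 'Users', 'rowkeyname', 'name')` should hand back the value stored for that column, assuming the insert above went through.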
Ok, that was a pretty weak example, but hopefully it gives you an idea how things translate from Java to Ruby.
Some key things:
- For simple, short-running scripts, consider using standard Ruby with the standard Cassandra gem, since JRuby takes a lot longer to start up.
- You can’t just do: java_import 'me.prettyprint.hector.api.*'; you have to list each class individually. :(
- If you want high performance you’ll need to execute multiple queries at a time. Good news is that JRuby doesn’t have a global interpreter lock like MRI does and supports native threads, so queries really can run in parallel.
- Hector is thread-safe, but each thread must use its own mutator handle!
- I haven’t tried using Hector’s Templates because they weren’t documented well enough when I started writing code, so I ended up implementing my own wrapper classes on top of Hector, which basically do the same thing… doh!
- You can use camelCaseMethodNames like in Java or under_scored_method_names as is the norm in Ruby; JRuby accepts either!
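To make the threading points a bit more concrete, here is one way to run jobs on a small pool of threads where each thread builds its own handle. It is just a sketch in plain Ruby: `run_in_threads` and its parameters are names I invented, and with Hector the factory would be something like `lambda { HFactory.createMutator(@keyspace, @se) }`.

```ruby
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

# Runs every job on a pool of `thread_count` threads. `handle_factory` is
# called once per thread, so no handle is ever shared between threads.
# The block receives (handle, job); its return values are collected in
# whatever order the threads finish, not in the order of `jobs`.
# (run_in_threads is a hypothetical helper, not part of JRuby or Hector.)
def run_in_threads(jobs, thread_count, handle_factory, &work)
  pending = Queue.new
  jobs.each { |job| pending << job }
  results = Queue.new

  workers = Array.new(thread_count) do
    Thread.new do
      handle = handle_factory.call # one handle per thread
      loop do
        job = begin
          pending.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << work.call(handle, job)
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Stand-in usage with a dummy handle instead of a real mutator:
doubled = run_in_threads([1, 2, 3, 4], 2, lambda { Object.new }) { |_handle, n| n * 2 }
puts doubled.sort.inspect
```

With a real cluster, the block body would be the `mutator.insert(...)` call, and the thread count is something to tune against your connection pool size.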
Where to go from here? Well, check out the JRuby Wiki and the Hector Wiki, as well as the Cassandra High Performance Cookbook.