This is an example of the deprecated BinaryMemtable bulk-load interface. Inserting data through the binary memtable lets you skip the commit log overhead and the Thrift ack on every insert. The example below uses Hadoop to generate all the data to send to Cassandra and ships it over the Binary Memtable interface. Hadoop ends up creating the same data that would be put into an SSTable if you were using Thrift. With enough Hadoop nodes inserting data, the bottleneck should become the network.

We recommend setting the compaction threshold to 0 while the import is running. After the import, run `nodeprobe -host <host> flush_binary` on every node to flush the remaining data still held in memory to disk, then set the compaction threshold back to its original value.

The example in CassandraBulkLoader.java is a sample Hadoop job that inserts SuperColumns; it can be tweaked to work with normal Columns. Construct the data you want to import as rows delimited by newlines. The mapper groups the data by row key, so the data set arrives at the reducers in column-oriented form; in the reducer you build the ColumnFamilies you want inserted and send them to your nodes (see the sketch below).

For Cassandra 0.6.4 we modified this example to wait for acks from all Cassandra nodes for each row before proceeding to the next. To keep Cassandra similarly busy you can either 1) add more reducer tasks, 2) remove the "wait for acks" block of code, or 3) parallelize the writing of rows to Cassandra, e.g. with an Executor (sketched at the end).

THIS CANNOT RUN ON THE SAME IP ADDRESS AS A CASSANDRA INSTANCE.
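The sketch below shows the map/reduce shape described above, using the old `org.apache.hadoop.mapred` API that CassandraBulkLoader.java is written against. The input line format (`rowkey<TAB>column-data`) and the `sendRow(...)` helper are assumptions for illustration only; see CassandraBulkLoader.java for the real Binary Memtable serialization and messaging code.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: the mapper emits the row key as the Hadoop key so the
// shuffle groups every column for a row into a single reduce() call.
public class BulkLoadSketch
{
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text>
    {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException
        {
            // Assumed input format: "rowkey<TAB>column-data".
            String[] parts = line.toString().split("\t", 2);
            output.collect(new Text(parts[0]), new Text(parts[1]));
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException
        {
            // Build the ColumnFamily (or SuperColumns) for this row from the
            // grouped values, serialize it, and send it to the row's natural
            // endpoints over the Binary Memtable interface.
            // sendRow(...) is a hypothetical stand-in for that code.
            sendRow(key, values);
        }

        private void sendRow(Text key, Iterator<Text> values)
        {
            // See CassandraBulkLoader.java for the real serialization and
            // MessagingService calls.
        }
    }
}
```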
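For option 3, one possible shape is sketched below: a small fixed thread pool keeps several rows in flight per reducer while each write waits for its acks. `writeRowAndWaitForAcks(...)` is a hypothetical stand-in for the "send row, then wait for acks" logic in CassandraBulkLoader.java, and the pool size is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of parallelizing per-row writes with an Executor.
public class ParallelRowWriter
{
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public void submit(final String key, final byte[] serializedRow)
    {
        pool.submit(new Runnable()
        {
            public void run()
            {
                writeRowAndWaitForAcks(key, serializedRow);
            }
        });
    }

    // Block until all queued rows have been written (call from close()).
    public void finish() throws InterruptedException
    {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private void writeRowAndWaitForAcks(String key, byte[] serializedRow)
    {
        // Hypothetical: the real version sends the Binary Memtable message
        // and blocks until every replica has acknowledged it.
    }
}
```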