A Pig LoadFunc that reads all columns from a given ColumnFamily.

Setup:

First, build and start a Cassandra server with the default
configuration*. Then set the PIG_HOME environment variable to the
location of a Pig >= 0.7.0-dev install, and JAVA_HOME to the location
of your Java install. To run using the Hadoop backend, also set
PIG_CONF_DIR to the location of your Hadoop config.
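
As a minimal sketch, the environment might be set up as follows in a
Bourne-style shell. All three paths are hypothetical placeholders and
not taken from this project; substitute the locations on your machine.

```shell
# Hypothetical install locations (assumptions); adjust to your system.
export PIG_HOME="/opt/pig"
export JAVA_HOME="/usr/lib/jvm/default-java"

# Only needed when running on the Hadoop backend.
export PIG_CONF_DIR="/etc/hadoop/conf"
```

Exporting the variables in the same shell session that runs the
commands below makes them visible to those commands.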

Run:

contrib/pig$ ant
contrib/pig$ bin/pig_cassandra

Once the 'grunt>' shell has loaded, try a simple program like the
following, which finds the 50 most frequently occurring column names:

grunt> rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
grunt> cols = FOREACH rows GENERATE flatten($1);
grunt> colnames = FOREACH cols GENERATE $0;
grunt> namegroups = GROUP colnames BY $0;
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
grunt> orderednames = ORDER namecounts BY $0 DESC;
grunt> topnames = LIMIT orderednames 50;
grunt> dump topnames;

*If you want to point Pig at a real cluster, modify the seed
address in storage-conf.xml and re-run the build step.