A Pig LoadFunc that reads all columns from a given ColumnFamily.

Setup:

First build and start a Cassandra server with the default
configuration*, then set the PIG_HOME and JAVA_HOME environment
variables to the locations of a Pig >= 0.7.0 install and your Java
install, respectively. To run on the Hadoop backend, also set
PIG_CONF_DIR to the location of your Hadoop config.
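
As a minimal sketch of the step above, the variables could be exported
in the shell you will build and run from. The paths shown are
placeholders chosen for illustration, not defaults of this project;
substitute the locations of your own installs.

```shell
# Hypothetical install locations; replace with your own paths.
export JAVA_HOME=/usr/lib/jvm/default-java
export PIG_HOME=/opt/pig

# Only needed for the Hadoop backend (assumed config location).
export PIG_CONF_DIR=/etc/hadoop/conf

# Print the values so they can be checked before building.
echo "JAVA_HOME=$JAVA_HOME"
echo "PIG_HOME=$PIG_HOME"
echo "PIG_CONF_DIR=$PIG_CONF_DIR"
```

Leave PIG_CONF_DIR unset if you only intend to run in local mode.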

Run:

contrib/pig$ ant
contrib/pig$ bin/pig_cassandra -x local example-script.pig

This runs the example script against your Cassandra instance in
Pig's local mode (see the Pig docs for more info). It assumes that
Keyspace1/Standard1 exists and contains some data.

If you'd like to get to a 'grunt>' shell prompt, run:

contrib/pig$ bin/pig_cassandra -x local

Once the 'grunt>' shell has loaded, try a simple program like the
following, which finds the 50 most frequently occurring column names:

grunt> rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
grunt> cols = FOREACH rows GENERATE flatten($1);
grunt> colnames = FOREACH cols GENERATE $0;
grunt> namegroups = GROUP colnames BY $0;
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
grunt> orderednames = ORDER namecounts BY $0 DESC;
grunt> topnames = LIMIT orderednames 50;
grunt> dump topnames;

*To point Pig at a real cluster, change the seed address in
storage-conf.xml and re-run the build step (ant).