This directory contains a suite of scripts for placing continuous query and ingest load on Accumulo. The purpose of these scripts is twofold: first, to place continuous load on Accumulo to see if it breaks; second, to collect statistics in order to understand how Accumulo behaves under that load.

To run these scripts, copy all of the .example files and modify them. These scripts rely on pssh. Before running any script, you may need to use pssh to create the log directory on each machine if you want it kept locally. Also, create the table "ci" before running. You can run org.apache.accumulo.server.test.continuous.GenSplits to generate split points for a continuous ingest table. (Example commands for these setup steps appear at the end of this README.)

The following ingest scripts insert data into Accumulo that will form a random graph.

  start-ingest.sh
  stop-ingest.sh

The following query scripts randomly walk the graph created by the ingesters. Each walker produces detailed statistics on query/scan times.

  start-walkers.sh
  stop-walkers.sh

The following scripts start and stop batch walkers.

  start-batchwalkers.sh
  stop-batchwalkers.sh

In addition to placing continuous load, the following scripts start and stop a service that continually collects statistics about Accumulo and HDFS.

  start-stats.sh
  stop-stats.sh

Optionally, start the agitator to periodically kill random servers.

  start-agitator.sh
  stop-agitator.sh

Start all three of these services (ingest, walkers, and stats) and let them run for a few hours. Then run report.pl to generate a simple HTML report containing plots and histograms showing what has transpired.

A map reduce job to verify all data created by continuous ingest can be run with the following command. Before running the command, modify the VERIFY_* variables in continuous-env.sh if needed. Do not run ingest while running this command; doing so will cause erroneous reporting of UNDEFINED nodes, since the map reduce job can scan a reference after it has scanned (and missed) the definition.

  run-verify.sh

Each entry inserted by continuous ingest, except for the first batch of entries, references a previously flushed entry. Since we are referencing flushed entries, they should always exist. The map reduce job checks that all referenced entries exist; if it finds any that do not, it increments the UNDEFINED counter and emits the referenced but undefined node. The map reduce job produces two other counts: REFERENCED and UNREFERENCED. It is expected that both counts are nonzero. REFERENCED counts nodes that are defined and referenced. UNREFERENCED counts nodes that are defined but unreferenced; these are the latest nodes inserted.

To stress Accumulo, run the following script, which starts a map reduce job that reads from and writes to your continuous ingest table. This map reduce job will write out an entry for every entry in the table (except for entries created by the map reduce job itself). Stop ingest before running this map reduce job, and do not run more than one instance of this job concurrently against a table.

  run-moru.sh
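
Example commands follow. First, a minimal sketch of the pssh setup step described above; the host file name (walkers.txt) and the log directory path are assumptions, so substitute the host lists and log directory configured in your continuous-env.sh:

  # create the local log directory on every machine listed in the host file
  pssh -h walkers.txt "mkdir -p /var/log/accumulo/continuous"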
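
Next, a sketch of creating the "ci" table with pre-generated split points. The GenSplits argument shown (a desired number of splits written to stdout) and the shell credentials are assumptions; check the class usage and your instance's credentials before running. The shell's addsplits -sf option reads split points from a file:

  # generate split points for the continuous ingest table
  # (the numeric argument, the number of splits, is an assumption)
  ./bin/accumulo org.apache.accumulo.server.test.continuous.GenSplits 100 > /tmp/ci-splits.txt

  # create the table and apply the splits from the Accumulo shell
  ./bin/accumulo shell -u root -p secret -e "createtable ci"
  ./bin/accumulo shell -u root -p secret -e "addsplits -t ci -sf /tmp/ci-splits.txt"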
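
Finally, a sketch of a complete run, from starting load through producing the report and verifying the data. All script names come from this README; the ordering is one reasonable sequence, not the only one:

  # start continuous load and statistics collection
  ./start-ingest.sh
  ./start-walkers.sh
  ./start-batchwalkers.sh
  ./start-stats.sh
  ./start-agitator.sh        # optional: periodically kills random servers

  # ... let everything run for a few hours ...

  ./stop-agitator.sh
  ./stop-batchwalkers.sh
  ./stop-walkers.sh
  ./stop-ingest.sh
  ./stop-stats.sh

  # generate the HTML report of what transpired
  ./report.pl

  # verify the data; ingest must be stopped before this point
  ./run-verify.sh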