perf tests:
  - GRRR -- indexing MUCH slower now?  branch takes ~9% longer than trunk on the same input:
    trunk:
      Indexer: finished (960889 msec)
      Indexer: net bytes indexed 9635556306
      Indexer: 33.62065752061142 GB/hour plain text
    branch:
      Indexer: finished (1048065 msec)
      Indexer: net bytes indexed 9635556306
      Indexer: 30.824156883707392 GB/hour plain text

  - try *larger* maxItemsInBlock: could give better net perf?  ie less seeking and more scanning
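
The GB/hour numbers in the trunk/branch logs above can be re-derived from the raw msec and byte counts. A minimal sketch, assuming "GB" means 2^30 bytes (that assumption reproduces the logged values); the class and method names here are made up for illustration and are not part of the indexer:

```java
public class Throughput {
  // GB/hour of plain text; ASSUMES 1 GB = 2^30 bytes, which matches the logged figures.
  static double gbPerHour(long netBytes, long elapsedMsec) {
    double gb = netBytes / (1024.0 * 1024.0 * 1024.0);
    double hours = elapsedMsec / 3_600_000.0;
    return gb / hours;
  }

  // How much longer (in percent) the second run took than the first.
  static double pctLonger(long baseMsec, long otherMsec) {
    return 100.0 * ((double) otherMsec / baseMsec - 1.0);
  }

  public static void main(String[] args) {
    long bytes = 9635556306L;   // net bytes indexed, same for both runs
    long trunkMsec = 960889L;   // from the trunk log
    long branchMsec = 1048065L; // from the branch log
    System.out.printf("trunk  %.2f GB/hour%n", gbPerHour(bytes, trunkMsec));
    System.out.printf("branch %.2f GB/hour%n", gbPerHour(bytes, branchMsec));
    System.out.printf("branch takes %.1f%% longer%n", pctLonger(trunkMsec, branchMsec));
  }
}
```

Since the byte count is identical in both runs, the slowdown is just the ratio of the two elapsed times, which comes out to roughly 9%.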

what to do about short terms that "force" a block to mark itself as hasTerms!!??
  - maybe instead of "isLeafBlock" bit we encode "countUntilNextTerm"?  this way, for a block that only has empty string term we can stop scanning quickly?
  - maybe make a "terms block cache" that holds low-prefix LRU term blocks in ram...?
  - maybe a cache holding all short-length terms would be a big perf boost?  it saves having to scan the low-depth blocks.  or maybe a bit noting whether this block contains any terms other than the empty-string suffix; or we separately hold all 'short'/'straggler' terms in a map, enabling the low-depth blocks to then 'lie' and say they have no terms?
  - hmm -- should I do something "special" for prefix terms?  ie short terms like 'a' that force a "fake" block (having only the one term 'a').  if I don't do something special, any time we seek a* we will have to scan this block?
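
One way the "terms block cache" idea above might look: a rough sketch only, using a plain access-ordered LinkedHashMap as the LRU. Everything here is hypothetical (the TermBlockCache name, String prefixes instead of byte prefixes, raw byte[] as the block payload, the maxPrefixLength cutoff for what counts as "low-prefix"); it is not an existing class in the codebase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache for term blocks with short prefixes. Not thread-safe.
public class TermBlockCache {
  private final int maxPrefixLength;
  private final Map<String, byte[]> blocks;

  public TermBlockCache(final int maxBlocks, int maxPrefixLength) {
    this.maxPrefixLength = maxPrefixLength;
    // accessOrder=true makes iteration order least-recently-used first
    this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > maxBlocks; // evict the LRU block once over capacity
      }
    };
  }

  // Only low-depth (short prefix) blocks are cached; deeper blocks are ignored.
  public void put(String prefix, byte[] block) {
    if (prefix.length() <= maxPrefixLength) {
      blocks.put(prefix, block);
    }
  }

  // Returns the cached block, or null on a miss; a hit refreshes its LRU position.
  public byte[] get(String prefix) {
    return blocks.get(prefix);
  }

  public int size() {
    return blocks.size();
  }
}
```

Whether this beats just relying on the OS buffer cache is an open question; the possible win is skipping the decode/scan of the low-depth blocks, not the IO.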

try forcing no hasTerms if depth < 2?

LATER:
  - test if cutting over prefix query to .intersect is faster
  - maybe blocks should NOT store sub-block pointers?  it's redundant w/ the index...
  - hmm: maybe switch PKLookupTask to intersect!?  do we have fast string builder?
  - hmm -- fix DOT when there are multiple outputs!?  oh, maybe not -- it just works?
  - maybe we should provide a "terms dict rewriter" tool?  ie can rewrite terms dict w/ new settings after segment was already created
  - intersect
    - can have an "allow terms out of order" mode... eg w/ the IntersectedTermsEnum?  that could be a HUGE gain
  - would be nice to bake into FST outputs this ability to pack bits (ie multiple outputs) into a single long output... instead of app having to do its own packing
  - maybe: allow more bytes to be spent on index WITHOUT changing the blocking?  ie add next-byte into index, but don't change the term blocks
    - ie, allow the index to "reach in" and index first/2nd/etc. bytes of the prefixes w/in a block?  ie, if block 'foo' has 22 entries, but they all start with either 'a' or 'e' then I can safely store fooa/fooe in the index, pointing to the same block
  - should we re-shuffle the blocks into "depth-first" order...?
  - if entire terms index shares a certain prefix (eg 0000) then optimize this case -- pull out the common prefix, once, so we don't do an arc-by-arc scan for that
    - ooh: for this case, instead of the "empty" block, we should store the 0000 block as the "root"
  - TERMS DICT should store min, max term, common prefix for fast NOT_FOUND case?
  - specialize the "onlyExact" case high up, so we don't sprinkle if's all throughout
  - must remove var gap terms index writer/reader
  - should we "align" our term dict blocks w/ disk blocks!?
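
For the "pack bits into a single long output" item above, this is the kind of app-side packing that would ideally move into the FST outputs. A sketch under assumptions: the layout (block file pointer in the high bits, hasTerms and isLeafBlock flags in the low two bits) and the PackedOutput name are illustrative only, not the actual on-disk format.

```java
// Hypothetical packing of a block file pointer plus two flag bits into one long.
public class PackedOutput {
  static final int FLAG_BITS = 2;
  static final long HAS_TERMS = 0x2L;
  static final long IS_LEAF = 0x1L;

  static long encode(long fp, boolean hasTerms, boolean isLeafBlock) {
    // keep the packed value non-negative: fp must fit in 61 bits
    if (fp < 0 || fp >= (1L << (63 - FLAG_BITS))) {
      throw new IllegalArgumentException("file pointer out of range: " + fp);
    }
    return (fp << FLAG_BITS) | (hasTerms ? HAS_TERMS : 0L) | (isLeafBlock ? IS_LEAF : 0L);
  }

  static long filePointer(long packed) {
    return packed >>> FLAG_BITS;
  }

  static boolean hasTerms(long packed) {
    return (packed & HAS_TERMS) != 0;
  }

  static boolean isLeafBlock(long packed) {
    return (packed & IS_LEAF) != 0;
  }

  public static void main(String[] args) {
    long packed = encode(123456789L, true, false);
    System.out.println(filePointer(packed) + " hasTerms=" + hasTerms(packed)
        + " isLeafBlock=" + isLeafBlock(packed));
  }
}
```

The downside of doing it in the app is exactly what the note complains about: every reader has to know the shift/mask layout, and a layout change means touching every site that decodes the output.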