LUCENE-3642: fix invalid offsets from CharTokenizer, [Edge]NGramFilters, SmartChinese, add sanity check to BaseTokenStreamTestCase