LUCENE-3911: always use the same Unicode block in the realistic case, sometimes use regexp-ish strings to get lots of punctuation, fix off-by-one in randomRegexpIshString