LUCENE-2847: Support all of Unicode, including supplementary code points above the Basic Multilingual Plane, in StandardTokenizer and UAX29URLEmailTokenizer.
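
For context, a supplementary code point (above U+FFFF) is stored in a Java String as two UTF-16 code units, a surrogate pair, so code that walks the text one char at a time can split such a character. The sketch below uses only java.lang and does not touch the Lucene tokenizers; the class and method names are hypothetical and chosen for illustration only.

```java
public class SupplementaryCodePoints {
    // Counts Unicode code points, treating each surrogate pair as one.
    public static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Returns true if s contains at least one code point above U+FFFF.
    public static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+20000 lies outside the Basic Multilingual Plane.
        String s = new String(Character.toChars(0x20000));
        System.out.println(s.length());          // UTF-16 code units: 2
        System.out.println(countCodePoints(s));  // code points: 1
        System.out.println(hasSupplementary(s)); // true
    }
}
```

A tokenizer that handles all of Unicode needs to treat such a pair as a single character rather than as two separate units.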