This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
The returned token's type is set to an element of {@link StandardTokenizerConstants#tokenImage}.
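As a rough illustration of the point about token types, the sketch below shows how a caller might branch on a type string drawn from a `tokenImage`-style array. It does not use the real tokenizer classes: `TokenTypeSketch`, `TOKEN_IMAGE`, `SimpleToken`, `hasKnownType`, and `describe` are hypothetical names, and the array entries are assumed examples rather than the actual contents of `StandardTokenizerConstants#tokenImage`.

```java
import java.util.Arrays;

class TokenTypeSketch {
    // Hypothetical stand-in for StandardTokenizerConstants#tokenImage.
    // The real array is generated from the grammar; these entries are assumed.
    static final String[] TOKEN_IMAGE = {"<EOF>", "<ALPHANUM>", "<EMAIL>", "<HOST>", "<NUM>"};

    // Hypothetical minimal token: its type is always an element of TOKEN_IMAGE.
    static final class SimpleToken {
        final String text;
        final String type;

        SimpleToken(String text, int kind) {
            this.text = text;
            this.type = TOKEN_IMAGE[kind];
        }
    }

    // Returns true when the token's type is one of the TOKEN_IMAGE entries.
    static boolean hasKnownType(SimpleToken token) {
        return Arrays.asList(TOKEN_IMAGE).contains(token.type);
    }

    // Branches on the type string instead of re-inspecting the token text.
    // Any type without its own case falls through to "word".
    static String describe(SimpleToken token) {
        switch (token.type) {
            case "<EMAIL>": return "email";
            case "<HOST>":  return "hostname";
            case "<NUM>":   return "number";
            default:        return "word";
        }
    }

    public static void main(String[] args) {
        SimpleToken token = new SimpleToken("user@example.org", 2);
        System.out.println(token.text + " -> " + token.type + " (" + describe(token) + ")");
    }
}
```

Because the type is a plain string taken from a fixed array, downstream filters can compare against it directly, at the cost of depending on the exact spelling of each entry.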