LUCENE-4220: Remove the buggy JavaCC-based HTML parser in the benchmark module and replace it with NekoHTML