LUCENE-971: Add a doc maker that extracts Wikipedia documents directly from the XML file, without an intermediate one-file-per-document step