svn merge -c 1297274 from trunk to branch-0.23 FIXES HADOOP-8064. Remove unnecessary dependency on w3c.org in document processing (Khiwal Lee via bobby)