Package org.apache.nutch.parse.mspowerpoint

A Microsoft © PowerPoint document parsing plugin.


Class Summary
FilteredStringWriter Writes to optimize ASCII output.
MSPowerPointParser Nutch parser for MS PowerPoint slides (MIME type: application/vnd.ms-powerpoint).
 

Package org.apache.nutch.parse.mspowerpoint Description

A Microsoft © PowerPoint document parsing plugin.

This package relies on Jakarta POI.

The implementation is based on sources found on Google Groups, which are also available at http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04809.html and were written by Hari Shanker and Sudhakar Chavali. Thanks for the basic work!

I changed these classes to also support Unicode content and optimized them for Nutch.
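
This page does not describe how FilteredStringWriter filters its output, so the following is only a minimal sketch of the general idea: a writer that drops unwanted characters while leaving Unicode text intact. The class name SketchFilteringWriter and the rule it applies (discard ISO control characters except tab, newline, and carriage return) are assumptions made for illustration and are not taken from the plugin's source.

```java
import java.io.StringWriter;

/**
 * Hypothetical illustration only; NOT the plugin's FilteredStringWriter.
 * Assumed rule: drop ISO control characters, keep common whitespace and
 * all other characters (including non-ASCII Unicode).
 */
class SketchFilteringWriter extends StringWriter {

    /** Returns true for characters that should reach the buffer. */
    private static boolean keep(char c) {
        return c == '\t' || c == '\n' || c == '\r' || !Character.isISOControl(c);
    }

    @Override
    public void write(int c) {
        // Single-character entry point: all other overloads funnel through here.
        if (keep((char) c)) {
            super.write(c);
        }
    }

    @Override
    public void write(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    @Override
    public void write(String str) {
        write(str, 0, str.length());
    }

    @Override
    public void write(String str, int off, int len) {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }
}
```

Text written through such a writer, for example slide text extracted with POI, would come out via toString() with the control characters removed and accented or other non-ASCII characters preserved.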



Copyright © 2006 The Apache Software Foundation