Apache Mahout: File Format Integrations
There are several importers and exporters for common file formats.
Run each of them with --help to see its options.
Several programs dump the contents of SequenceFiles as text so you can inspect them. Run these with --help to see their options.
Note: any class with a main() method can be used as a job name for bin/mahout.
These classes have no main() method, so they must be called from your own code.
Both of these formats can be read by Gephi, an interactive graph explorer.
There are many file importers which are custom-made for particular algorithms:
For example, the following command extracts the query strings from Solr's HTTP request logs and prepares them for use by Frequent Itemset Mining.
bin/mahout regexconverter \
  --input /Users/grantingersoll/projects/content/lucid/lucidfind/logs \
  --output /tmp/solr/output \
  --regex '(?<=(\?|&)q=).*?(?=&|$)' \
  --overwrite \
  --transformerClass url \
  --formatterClass fpg
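To make the regex in this command a little less opaque, here is a minimal Java sketch of the per-line work it implies: find the value of the q parameter, URL-decode it, and turn it into one transaction of items. This is not Mahout's regexconverter code. The class and method names (QueryExtractor, extractQuery, toTransaction), the sample log line, and the lower-case, de-duplicated whitespace tokenization are all hypothetical. Mahout's own url transformer and fpg formatter may behave differently.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Mahout's implementation.
class QueryExtractor {

    // Same regex as the regexconverter example: the text after "?q=" or "&q=",
    // matched reluctantly up to the next '&' or the end of the line.
    private static final Pattern QUERY = Pattern.compile("(?<=(\\?|&)q=).*?(?=&|$)");

    /** Returns the URL-decoded q value from one log line, or null if the line has none. */
    static String extractQuery(String logLine) {
        Matcher m = QUERY.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        // Decode %XX escapes and '+' (as a space), roughly what a URL-decoding transformer does.
        return URLDecoder.decode(m.group(), StandardCharsets.UTF_8);
    }

    /** Assumed formatting step: lower-cased, whitespace-separated, de-duplicated terms. */
    static List<String> toTransaction(String query) {
        List<String> items = new ArrayList<>();
        for (String term : query.toLowerCase().trim().split("\\s+")) {
            // A transaction is a set of items, so each term is kept only once.
            if (!term.isEmpty() && !items.contains(term)) {
                items.add(term);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical request-log line.
        String line = "GET /solr/select?q=apache+mahout%20clustering&rows=10 HTTP/1.1";
        String query = extractQuery(line);
        System.out.println(query);                // apache mahout clustering
        System.out.println(toTransaction(query)); // [apache, mahout, clustering]
    }
}
```

One consequence of the regex itself: if q is the last parameter on a line like the sample, the reluctant match runs to the end of the line and also captures any trailing text such as " HTTP/1.1".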
See the tutorial and cheat sheet for regular expressions, that marvelously opaque toolkit.