Converting Content
Mahout includes some tools for converting content into formats that Mahout can consume more easily. They should not be mistaken for a full ETL layer, but they are useful for tasks such as converting text files and log files. All of them can be accessed via the $MAHOUT_HOME/bin/mahout command line driver.
The regexconverter tool is useful for converting things like log files from one format to another. For instance, you could convert Solr log files containing query requests to a format consumable by Frequent Itemset Mining.
For example, the following command extracts the queries from logs of HTTP requests to Solr and prepares them for use by Frequent Itemset Mining:
bin/mahout regexconverter --input /Users/grantingersoll/projects/content/lucid/lucidfind/logs --output /tmp/solr/output --regex "(?<=(\?|&)q=).*?(?=&|$)" --overwrite --transformerClass url --formatterClass fpg
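To see what the --regex argument in the command above selects, the pattern can be tried outside Mahout. The following is a minimal Python sketch, not Mahout's own implementation: the function name extract_queries and the sample log line are hypothetical, and the URL-decoding step only assumes that the url transformer decodes the matched text.

```python
import re
from urllib.parse import unquote_plus

# Same pattern as the --regex argument in the command above: match whatever
# follows "?q=" or "&q=" up to the next "&" or the end of the line.
QUERY_PATTERN = re.compile(r"(?<=(\?|&)q=).*?(?=&|$)")


def extract_queries(line):
    """Return the URL-decoded value of every q= parameter in one log line.

    The decoding step is an assumption about what the "url" transformer does;
    it is not taken from the Mahout source.
    """
    # finditer() is used rather than findall() because the pattern contains
    # a capture group, and findall() would return the group, not the match.
    return [unquote_plus(m.group(0)) for m in QUERY_PATTERN.finditer(line)]


# Hypothetical Solr request log line, for illustration only.
sample = (
    "INFO: [collection1] webapp=/solr path=/select "
    "params={start=0&q=apache+mahout&rows=10} status=0 QTime=3"
)

print(extract_queries(sample))  # prints ['apache mahout']
```

Note that when q is the last parameter on a line, the pattern matches through to the end of the line, so any trailing text after the query value is captured as well.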