Mahout Change Log

Release 0.8 - unreleased

MAHOUT-1272: Parallel SGD matrix factorizer for SVDrecommender (Peng Cheng via ssc)
MAHOUT-1271: classify-20newsgroups.sh fails during the seqdirectory step (smarthi)
MAHOUT-1269: Cleanup deprecated Lucene 3.x API calls in lucene2seq utility unit tests (smarthi)
MAHOUT-833: Make conversion to sequence files map-reduce (Josh Patterson, smarthi)
MAHOUT-1268: Wrong output directory for CVB (Mark Wicks via ssc)
MAHOUT-1264: Performance optimizations in RecommenderJob (ssc)
MAHOUT-1262: Cleanup LDA code (ssc)
MAHOUT-1255: Fix for weights in Multinomial sometimes overflowing in BallKMeans (dfilimon)
MAHOUT-1254: Final round of cleanup for StreamingKMeans (dfilimon)
MAHOUT-1263: Serialise/Deserialise Lambda value for OnlineLogisticRegression (Mike Davy via smarthi)
MAHOUT-1258: Another shot at findbugs and checkstyle (ssc)
MAHOUT-1253: Add experiment tools for StreamingKMeans, part 1 (dfilimon)
MAHOUT-884: Matrix Concatenate Utility (Lance Norskog via smarthi)
MAHOUT-1250: Deprecate unused algorithms (ssc)
MAHOUT-1251: Optimize MinHashMapper (ssc)
MAHOUT-1211: Disabled swallowing of IOExceptions in Closeables.close for writers (dfilimon)
MAHOUT-1164: Make ARFF integration generate meta-data in JSON format (Marty Kube via ssc)
MAHOUT-1163: Make random forest classifier meta-data file human readable (Marty Kube via ssc)
MAHOUT-1243: Dictionary file format in Lucene-Mahout integration is not in SequenceFileFormat (ssc)
MAHOUT-974: org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob uses integer as userId and itemId (ssc)
MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Elena Smirnova via smarthi)
MAHOUT-1237: Total cluster cost isn't computed properly (dfilimon)
MAHOUT-1196: LogisticModelParameters uses csv.getTargetCategories() even if csv is not used. (Vineet Krishnan via ssc)
MAHOUT-1224: Add the option of running a StreamingKMeans pass in the Reducer before BallKMeans (dfilimon)
MAHOUT-993: Some vector dumper flags are expecting arguments. (Andrew Look via robinanil)
MAHOUT-1228: Cleanup .gitignore (Stevo Slavic via ssc)
MAHOUT-1047: CVB hangs after completion (Angel Martinez Gonzalez via smarthi)
MAHOUT-1235: ParallelALSFactorizationJob does not use VectorSumCombiner (ssc)
MAHOUT-1230: SparseMatrix.clone() is not deep copy (Maysam Yabandeh via tdunning)
MAHOUT-1232: VectorHelper.topEntries() throws a NPE when number of NonZero elements in vector < maxEntries (smarthi)
MAHOUT-1229: Conf directory content from Mahout distribution archives cannot be unpacked (Stevo Slavic via smarthi)
MAHOUT-1213: SSVD job doesn't clean its temp dir, and fails when seeing it again (smarthi)
MAHOUT-1223: Fixed point skipped in StreamingKMeans when iterating through centroids from a reducer (dfilimon)
MAHOUT-1222: Fix total weight in FastProjectionSearch (dfilimon)
MAHOUT-1219: Remove LSHSearcher from StreamingKMeansTest. It causes it to sometimes fail (dfilimon)
MAHOUT-1221: SparseMatrix.viewRow is sometimes readonly. (Maysam Yabandeh via smarthi)
MAHOUT-1219: Remove LSHSearcher from SearchQualityTest. It causes it to fail, but the failure is not very meaningful (dfilimon)
MAHOUT-1217: Nearest neighbor searchers sometimes fail to remove points: fix in FastProjectionSearch's searchFirst (dfilimon)
MAHOUT-1216: Add locality sensitive hashing and a LocalitySensitiveHash searcher (dfilimon)
MAHOUT-1181: Adding StreamingKMeans MapReduce classes (dfilimon)
MAHOUT-1212: Incorrect classify-20newsgroups.sh file description (Julian Ortega via smarthi)
MAHOUT-1209: DRY out maven-compiler-plugin configuration (Stevo Slavic via smarthi)
MAHOUT-1207: Fix typos in description in parent pom (Stevo Slavic via smarthi)
MAHOUT-1199: Improve javadoc comments of mahout-integration (Angel Martinez Gonzalez via smarthi)
MAHOUT-1162: Adding BallKMeans and StreamingKMeans clustering algorithms (dfilimon)
MAHOUT-1205: ParallelALSFactorizationJob should leverage the distributed cache (ssc)
MAHOUT-1156: Adding nearest neighbor Searchers (dfilimon)
MAHOUT-1202: Speed up Vector operations (dfilimon)
MAHOUT-1155: Make MatrixSlice a Vector (and fix Centroid cloning; MAHOUT-1202) (dfilimon)
MAHOUT-1189: CosineDistanceMeasure doesn't return 0 for two 0 vectors (dfilimon)
MAHOUT-1180: Multinomial throws ConcurrentModificationException when iterating and setting probabilities (dfilimon)
MAHOUT-1192: Speed up Vector Operations (robinanil)
MAHOUT-1191: Cleanup Vector Benchmarks to make them less variable (robinanil)
MAHOUT-1190: SequentialAccessSparseVector function assignment is very slow and other iterator woes (robinanil)
MAHOUT-1188: Inconsistent reference to Lucene versions in code and POM (smarthi)
MAHOUT-1161: Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception (ssc)
MAHOUT-1187: Update Commons Lang to Commons Lang3 (smarthi)
MAHOUT-1184: Another take at pmd, findbugs and checkstyle (ssc)
MAHOUT-1182: Remove useless append (Dave Brosius via tdunning)
MAHOUT-1176: Introduce a changelog file to raise contributors attribution (ssc)
MAHOUT-1108: Allows cluster-reuters.sh example to be executed on a cluster (elmer.garduno via gsingers)
MAHOUT-961: Fix issue in decision forest tree visualizer to properly show stems of tree (Ikumasa Mukai via gsingers)
MAHOUT-944: Create SequenceFiles out of Lucene document storage (no term vectors required) (Frank Scholten, gsingers)
MAHOUT-958: Fix issue with globs in RepresentativePointsDriver (Adam Baron, Vikram Dixit K, ehgjr via gsingers)
MAHOUT-1084: Fixed issue with too many clusters in synthetic control example (liutengfei, gsingers)
MAHOUT-1103: Fixed issue with splitting clusters on Hadoop (Matt Molek, gsingers)
MAHOUT-1126: Filter out bad META-INF files in job packaging (Pat Ferrel, gsingers)
MAHOUT-1211: Change deprecated Closeables.closeQuietly calls (smarthi, gsingers, srowen, dlyubimov)