The OrdinaryLeastSquares regressor in Mahout implements a closed-form solution to Ordinary Least Squares. This is in stark contrast to many “big data machine learning” frameworks, which implement a stochastic approach. From the user's perspective this difference can be reduced to:
Parameter | Description | Default Value |
---|---|---|
'calcCommonStatistics | Calculate common statistics such as the Coefficient of Determination and Mean Square Error | true |
'calcStandardErrors | Calculate the standard errors (and subsequent "t-scores" and "p-values") of the \(\boldsymbol{\beta}\) estimates | true |
'addIntercept | Add an intercept to \(\mathbf{X}\) | true |
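For reference, the closed-form estimate and the statistics named in the table are conventionally written as below. These are the textbook definitions, with \(n\) observations and \(p\) estimated coefficients (counting the intercept when it is added). The symbols \(\mathrm{SSE}\), \(\mathrm{SST}\), \(n\), and \(p\) are introduced here for illustration, and the denominators are an assumption: the exact normalization Mahout applies is not stated on this page.

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
\]

\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - p}
\]

\[
\mathrm{se}(\hat{\beta}_j) = \sqrt{\mathrm{MSE} \, \left[ (\mathbf{X}^\top \mathbf{X})^{-1} \right]_{jj}}, \qquad
t_j = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}
\]

Some texts divide \(\mathrm{SSE}\) by \(n\) rather than \(n - p\) when reporting the Mean Square Error, so compare against the reported summary before relying on either form.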
In this example we disable the “calculate common statistics” parameter, so our summary will NOT contain the coefficient of determination (R-squared) or the Mean Square Error:
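// Assumes a Mahout Spark shell session, where the DRM imports and an implicit
// distributed context are already in scope.
import org.apache.mahout.math.algorithms.regression.OrdinaryLeastSquares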
val drmData = drmParallelize(dense(
  (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios
  (1, 2, 12, 12, 18.042851),   // Cap'n'Crunch
  (1, 1, 12, 13, 22.736446),   // Cocoa Puffs
  (2, 1, 11, 13, 32.207582),   // Froot Loops
  (1, 2, 12, 11, 21.871292),   // Honey Graham Ohs
  (2, 1, 16, 8, 36.187559),    // Wheaties Honey Gold
  (6, 2, 17, 1, 50.764999),    // Cheerios
  (3, 2, 13, 7, 40.400208),    // Clusters
  (3, 3, 13, 4, 45.811716)), numPartitions = 2)
// The first four columns are the features; the last column is the target.
val drmX = drmData(::, 0 until 4)
val drmY = drmData(::, 4 until 5)
val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcCommonStatistics -> false)
println(model.summary)
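To make the closed-form idea concrete, here is a minimal sketch in plain Scala that solves the normal equations for the same cereal data without using Mahout at all. It is not Mahout's implementation: the object name ClosedFormOls and its helpers (withIntercept, solve, fit, predict) are hypothetical, the matrices are ordinary in-memory arrays rather than DRMs, and appending the intercept as the last column is an arbitrary choice made here.

```scala
// Illustrative only: a plain-Scala closed-form OLS fit, not Mahout's code.
object ClosedFormOls {
  // Same cereal data as the example above: four features, then the target.
  val data: Array[Array[Double]] = Array(
    Array(2.0, 2.0, 10.5, 10.0, 29.509541),
    Array(1.0, 2.0, 12.0, 12.0, 18.042851),
    Array(1.0, 1.0, 12.0, 13.0, 22.736446),
    Array(2.0, 1.0, 11.0, 13.0, 32.207582),
    Array(1.0, 2.0, 12.0, 11.0, 21.871292),
    Array(2.0, 1.0, 16.0, 8.0, 36.187559),
    Array(6.0, 2.0, 17.0, 1.0, 50.764999),
    Array(3.0, 2.0, 13.0, 7.0, 40.400208),
    Array(3.0, 3.0, 13.0, 4.0, 45.811716))

  val x: Array[Array[Double]] = data.map(_.take(4)) // feature columns
  val y: Array[Double] = data.map(_(4))             // target column

  /** Appends a constant column of ones (assumed placement: last column). */
  def withIntercept(m: Array[Array[Double]]): Array[Array[Double]] =
    m.map(row => row :+ 1.0)

  /** Solves a * b = c by Gaussian elimination with partial pivoting. */
  def solve(a: Array[Array[Double]], c: Array[Double]): Array[Double] = {
    val n = c.length
    // Work on an augmented copy so the inputs are left untouched.
    val m = Array.tabulate(n, n + 1)((i, j) => if (j < n) a(i)(j) else c(i))
    for (col <- 0 until n) {
      // Swap in the row with the largest pivot for numerical stability.
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (j <- col to n) m(r)(j) -= f * m(col)(j)
      }
    }
    // Back substitution on the upper-triangular system.
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => m(i)(j) * b(j)).sum
      b(i) = (m(i)(n) - s) / m(i)(i)
    }
    b
  }

  /** Returns beta solving (X^T X) beta = X^T y; the intercept is the last entry. */
  def fit(features: Array[Array[Double]], target: Array[Double]): Array[Double] = {
    val d = withIntercept(features)
    val p = d.head.length
    val xtx = Array.tabulate(p, p)((j, k) => d.map(r => r(j) * r(k)).sum)
    val xty = Array.tabulate(p)(j => d.indices.map(i => d(i)(j) * target(i)).sum)
    solve(xtx, xty)
  }

  /** Predicts the target for one feature row using a fitted beta. */
  def predict(beta: Array[Double], row: Array[Double]): Double =
    (row :+ 1.0).zip(beta).map { case (v, b) => v * b }.sum

  def main(args: Array[String]): Unit = {
    val beta = fit(x, y)
    println(beta.mkString("beta = [", ", ", "]"))
  }
}
```

The estimate comes from a single linear solve rather than from repeated gradient steps, which is the distinction drawn at the top of this page. Solving the normal equations directly is adequate for a small, well-conditioned problem like this one; it says nothing about how Mahout distributes or stabilizes the computation.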