Ordinary Least Squares Regression

Mahout-Samsara Algorithms

Linear Algebra

Preprocessors

Regression

Reccomenders

Map Reduce Algorithms (deprecated)

Classification

Clustering

About

The OrinaryLeastSquares regressor in Mahout implements a closed-form solution to Ordinary Least Squares. This is in stark contrast to many “big data machine learning” frameworks which implement a stochastic approach. From the users perspecive this difference can be reduced to:

Stochastic- A series of guesses at a line line of best fit.
Closed Form- A mathimatical approach has been explored, the properties of the parameters are well understood, and problems which arise (and the remedial measures), exist. This is usually the preferred choice of mathematicians/statisticians, but computational limititaions have forced us to resort to SGD.

Parameters

Parameter	Description	Default Value
`'calcCommonStatistics`	Calculate commons statistics such as Coeefficient of Determination and Mean Square Error	`true`
`'calcStandardErrors`	Calculate the standard errors (and subsequent "t-scores" and "p-values") of the \(\boldsymbol{\beta}\) estimates	`true`
`'addIntercept`	Add an intercept to \(\mathbf{X}\)	`true`

Example

In this example we disable the “calculate common statistics” parameters, so our summary will NOT contain the coefficient of determination (R-squared) or Mean Square Error

import org.apache.mahout.math.algorithms.regression.OrdinaryLeastSquares

val drmData = drmParallelize(dense(
      (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
      (1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
      (1, 1, 12,   13, 22.736446),  // Cocoa Puffs
      (2, 1, 11,   13, 32.207582),  // Froot Loops
      (1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
      (2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
      (6, 2, 17,   1,  50.764999),  // Cheerios
      (3, 2, 13,   7,  40.400208),  // Clusters
      (3, 3, 13,   4,  45.811716)), numPartitions = 2)


val drmX = drmData(::, 0 until 4)
val drmY = drmData(::, 4 until 5)

val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcCommonStatistics → false)
println(model.summary)