StandardScaler centers the values of each column on their mean and scales them to unit variance.
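In symbols, here is a minimal statement of the transform for a matrix x with N rows. It assumes the divide-by-N convention for the standard deviation that is described in the next section:

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_{ij} - \mu_j\right)^2}
$$

Each column of z then has mean 0 and (divide-by-N) variance 1.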
Relation to the scale function in R-base
The StandardScaler is the equivalent of the R-base function scale, with one notable tweak. R's scale function (indeed, all of R) computes the standard deviation with a one-degree-of-freedom correction, dividing by N - 1 for N rows. Mahout, like many other statistical packages aimed at larger data sets, does not make this adjustment and divides by N. In large datasets the difference is trivial. When testing the function on small datasets, however, the practitioner may be confused by the discrepancy.
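The size of the discrepancy depends only on the number of rows. The following plain-Scala sketch has no Mahout dependency, and the object and method names (SdConventions, populationSd, sampleSd, ratio) are hypothetical, chosen for illustration. It computes both conventions for one column of values so the two can be compared directly:

```scala
// Plain-Scala sketch (no Mahout dependency) of the two standard-deviation
// conventions: divide by N (Mahout, per the text above) vs divide by N - 1 (R's sd).
object SdConventions {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sum of squared deviations from the mean; shared by both conventions.
  private def sumSq(xs: Seq[Double]): Double = {
    val mu = mean(xs)
    xs.map(x => (x - mu) * (x - mu)).sum
  }

  // Divide-by-N ("population") standard deviation.
  def populationSd(xs: Seq[Double]): Double = math.sqrt(sumSq(xs) / xs.size)

  // Divide-by-(N - 1) ("sample") standard deviation, matching R's sd(); needs N >= 2.
  def sampleSd(xs: Seq[Double]): Double = math.sqrt(sumSq(xs) / (xs.size - 1))

  // populationSd / sampleSd; for any non-constant input this equals sqrt((N - 1) / N).
  def ratio(xs: Seq[Double]): Double = populationSd(xs) / sampleSd(xs)
}
```

For the inputs 1 to N, ratio returns about 0.8165 for N = 3, about 0.9832 for N = 30, and about 0.9998 for N = 3000. This is why the difference only shows up on small test matrices.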
To verify this function against R on an arbitrary matrix x, use the following form in R to “undo” the degrees-of-freedom correction:
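N <- nrow(x)
# scale() divides each centered column by the supplied vector; multiplying R's
# sample sd by sqrt((N - 1) / N) turns it into the divide-by-N standard deviation.
scale(x, scale = apply(x, 2, sd) * sqrt((N - 1) / N))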
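Why that factor works: write s_j for R's sample standard deviation of column j and σ_j for the divide-by-N version. Both are built from the same sum of squared deviations, so they differ only by a constant that depends on N:

$$
s_j = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij}-\mu_j\right)^2}, \qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ij}-\mu_j\right)^2} = \sqrt{\frac{N-1}{N}}\; s_j
$$

For example, the column (1, 2, 3) has squared deviations summing to 2. That gives s = sqrt(2/2) = 1 and σ = sqrt(2/3) ≈ 0.8165, and multiplying R's sd by sqrt((3 - 1)/3) recovers σ.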
StandardScaler takes no parameters at this time.
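import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.algorithms.preprocessing.{StandardScaler, StandardScalerModel}

// Assumes an implicit DistributedContext is in scope (e.g. the Mahout Spark shell).
val A = drmParallelize(dense(
  (1, 1, 5),
  (2, 5, -15),
  (3, 9, -2)), numPartitions = 2)

// Fit the column means and standard deviations, then standardize A.
val scaler: StandardScalerModel = new StandardScaler().fit(A)
val scaledA = scaler.transform(A)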
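To check scaledA by hand, a small local reference computation can help. This is a plain-Scala sketch under the divide-by-N convention described above, not Mahout's implementation. StandardizeReference, exampleA, and exampleScaled are hypothetical names, and exampleA simply repeats the values of A as local rows:

```scala
// Plain-Scala reference for checking standardized output on a small matrix.
// Assumption: population standard deviation (divide by N), as described above.
object StandardizeReference {
  // Hypothetical local copy of the rows of A from the example above.
  val exampleA: Vector[Vector[Double]] = Vector(
    Vector(1.0, 1.0, 5.0),
    Vector(2.0, 5.0, -15.0),
    Vector(3.0, 9.0, -2.0))

  // Center each column on its mean, then divide by its divide-by-N standard deviation.
  // A constant column has standard deviation 0 and yields NaN (0.0 / 0.0) here.
  def standardize(rows: Vector[Vector[Double]]): Vector[Vector[Double]] = {
    val n = rows.size.toDouble
    val nCols = rows.head.size
    val means = Vector.tabulate(nCols)(j => rows.map(_(j)).sum / n)
    val sds = Vector.tabulate(nCols) { j =>
      math.sqrt(rows.map(r => math.pow(r(j) - means(j), 2)).sum / n)
    }
    rows.map(r => Vector.tabulate(nCols)(j => (r(j) - means(j)) / sds(j)))
  }

  lazy val exampleScaled: Vector[Vector[Double]] = standardize(exampleA)
}
```

Under that convention, the first two columns of exampleScaled are both about (-1.2247, 0, 1.2247) and the third is about (1.0861, -1.3275, 0.2414). Comparing these with the contents of scaledA is a quick sanity check.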