Package pyspark :: Package mllib :: Module random :: Class RandomRDDs

Class RandomRDDs

Generator methods for creating RDDs comprised of i.i.d. samples from a given distribution.

Static Methods
uniformRDD(sc, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).
normalRDD(sc, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the standard normal distribution.
poissonRDD(sc, mean, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.
uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD comprised of vectors containing i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).
normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD comprised of vectors containing i.i.d. samples drawn from the standard normal distribution.
poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Poisson distribution with the input mean.
Method Details

uniformRDD(sc, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).

To transform the distribution in the generated RDD from U(0.0, 1.0) to U(a, b), use RandomRDDs.uniformRDD(sc, n, p, seed).map(lambda v: a + (b - a) * v)

>>> x = RandomRDDs.uniformRDD(sc, 100).collect()
>>> len(x)
100
>>> max(x) <= 1.0 and min(x) >= 0.0
True
>>> RandomRDDs.uniformRDD(sc, 100, 4).getNumPartitions()
4
>>> parts = RandomRDDs.uniformRDD(sc, 100, seed=4).getNumPartitions()
>>> parts == sc.defaultParallelism
True
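
The U(0.0, 1.0) to U(a, b) transformation described above can be packaged as a small reusable function. The following is a minimal sketch, not part of the RandomRDDs API: the helper name to_uniform is hypothetical, and the Spark call appears only in a comment because it assumes a live SparkContext named sc.

```python
def to_uniform(a, b):
    """Return a function that maps a U(0.0, 1.0) sample to U(a, b).

    Hypothetical helper for illustration; it is not provided by pyspark.
    """
    def transform(v):
        # Affine rescaling: v = 0.0 maps to a, v = 1.0 maps to b.
        return a + (b - a) * v
    return transform


# With a live SparkContext `sc` (assumed, not created here):
#   rdd = RandomRDDs.uniformRDD(sc, 100, seed=4).map(to_uniform(-1.0, 1.0))

# Local check on the endpoints and midpoint of the unit interval.
f = to_uniform(-1.0, 1.0)
print(f(0.0), f(0.5), f(1.0))  # prints: -1.0 0.0 1.0
```

Because the mapping is affine and increasing when a < b, every sample in [0.0, 1.0] lands in [a, b], and the result stays uniformly distributed.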

normalRDD(sc, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the standard normal distribution.

To transform the distribution in the generated RDD from standard normal to some other normal N(mean, sigma^2), use RandomRDDs.normalRDD(sc, n, p, seed).map(lambda v: mean + sigma * v)

>>> x = RandomRDDs.normalRDD(sc, 1000, seed=1)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - 0.0) < 0.1
True
>>> abs(stats.stdev() - 1.0) < 0.1
True
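
The shift-and-scale step from standard normal to N(mean, sigma^2) can be sketched the same way. This is an illustrative helper only: the name to_normal is hypothetical, and the Spark call is shown in a comment because it assumes a live SparkContext named sc.

```python
def to_normal(mean, sigma):
    """Return a function that maps a standard normal sample to N(mean, sigma^2).

    Hypothetical helper for illustration; it is not provided by pyspark.
    """
    def transform(v):
        # Scale by the standard deviation, then shift by the mean.
        return mean + sigma * v
    return transform


# With a live SparkContext `sc` (assumed, not created here):
#   rdd = RandomRDDs.normalRDD(sc, 1000, seed=1).map(to_normal(10.0, 2.0))

# Local check: 0.0 maps to the mean, +/-1.0 map to one sigma either side.
g = to_normal(10.0, 2.0)
print(g(0.0), g(1.0), g(-1.0))  # prints: 10.0 12.0 8.0
```

Note that the second argument is the standard deviation sigma, not the variance sigma^2.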

poissonRDD(sc, mean, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.

>>> mean = 100.0
>>> x = RandomRDDs.poissonRDD(sc, mean, 1000, seed=1)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> abs(stats.stdev() - sqrt(mean)) < 0.5
True
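
The doctest above compares the sample standard deviation with sqrt(mean) because the variance of a Poisson distribution equals its mean. The sketch below repeats that check locally without Spark. It is a stand-in only: poisson_sample is a hypothetical helper that uses Knuth's multiplication method from the standard library, which is an assumption made for illustration and not a description of how poissonRDD generates its samples. The tolerances are deliberately looser than in the doctest.

```python
import math
import random
import statistics


def poisson_sample(rng, mean):
    """Draw one Poisson(mean) sample using Knuth's multiplication method.

    Hypothetical stand-in for illustration; suitable for moderate means only,
    since exp(-mean) underflows to 0.0 for very large values.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    # Multiply uniform draws until the running product drops to exp(-mean).
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(1)  # fixed seed so the run is repeatable
mean = 100.0
samples = [poisson_sample(rng, mean) for _ in range(1000)]

sample_mean = statistics.mean(samples)
sample_stdev = statistics.pstdev(samples)

# Both statistics should sit close to their theoretical values.
print(abs(sample_mean - mean) < 2.0)
print(abs(sample_stdev - math.sqrt(mean)) < 2.0)
```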

uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.uniformVectorRDD(sc, 10, 10).collect())
>>> mat.shape
(10, 10)
>>> mat.max() <= 1.0 and mat.min() >= 0.0
True
>>> RandomRDDs.uniformVectorRDD(sc, 10, 10, 4).getNumPartitions()
4

normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the standard normal distribution.

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.normalVectorRDD(sc, 100, 100, seed=1).collect())
>>> mat.shape
(100, 100)
>>> abs(mat.mean() - 0.0) < 0.1
True
>>> abs(mat.std() - 1.0) < 0.1
True

poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Poisson distribution with the input mean.

>>> import numpy as np
>>> mean = 100.0
>>> rdd = RandomRDDs.poissonVectorRDD(sc, mean, 100, 100, seed=1)
>>> mat = np.matrix(rdd.collect())
>>> mat.shape
(100, 100)
>>> abs(mat.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> abs(mat.std() - sqrt(mean)) < 0.5
True