public class JavaRDD<T>
extends java.lang.Object
Constructor and Description |
---|
JavaRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> classTag) |
Modifier and Type | Method and Description |
---|---|
JavaRDD<T> |
cache()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
scala.reflect.ClassTag<T> |
classTag() |
JavaRDD<T> |
coalesce(int numPartitions)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
coalesce(int numPartitions,
boolean shuffle)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
distinct()
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
filter(Function<T,java.lang.Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
|
static <T> JavaRDD<T> |
fromRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> evidence$1) |
JavaRDD<T> |
intersection(JavaRDD<T> other)
Return the intersection of this RDD and another one.
|
JavaRDD<T> |
persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
|
JavaRDD<T>[] |
randomSplit(double[] weights)
Randomly splits this RDD with the provided weights.
|
JavaRDD<T>[] |
randomSplit(double[] weights,
long seed)
Randomly splits this RDD with the provided weights.
|
RDD<T> |
rdd() |
JavaRDD<T> |
repartition(int numPartitions)
Return a new RDD that has exactly numPartitions partitions.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction)
Return a sampled subset of this RDD.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction,
long seed)
Return a sampled subset of this RDD.
|
JavaRDD<T> |
setName(java.lang.String name)
Assign a name to this RDD
|
<S> JavaRDD<T> |
sortBy(Function<T,S> f,
boolean ascending,
int numPartitions)
Return this RDD sorted by the given key function.
|
JavaRDD<T> |
subtract(JavaRDD<T> other)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
int numPartitions)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
Partitioner p)
Return an RDD with the elements from
this that are not in other . |
static <T> RDD<T> |
toRDD(JavaRDD<T> rdd) |
java.lang.String |
toString() |
JavaRDD<T> |
union(JavaRDD<T> other)
Return the union of this RDD and another one.
|
JavaRDD<T> |
unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
wrapRDD(RDD<T> rdd) |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitioner, partitions, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public scala.reflect.ClassTag<T> classTag()
public JavaRDD<T> persist(StorageLevel newLevel)
newLevel
- (undocumented)public JavaRDD<T> unpersist()
public JavaRDD<T> unpersist(boolean blocking)
blocking
- Whether to block until all blocks are deleted.public JavaRDD<T> distinct()
public JavaRDD<T> distinct(int numPartitions)
numPartitions
- (undocumented)public JavaRDD<T> filter(Function<T,java.lang.Boolean> f)
f
- (undocumented)public JavaRDD<T> coalesce(int numPartitions)
numPartitions
partitions.numPartitions
- (undocumented)public JavaRDD<T> coalesce(int numPartitions, boolean shuffle)
numPartitions
partitions.numPartitions
- (undocumented)shuffle
- (undocumented)public JavaRDD<T> repartition(int numPartitions)
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce
,
which can avoid performing a shuffle.
numPartitions
- (undocumented)public JavaRDD<T> sample(boolean withReplacement, double fraction)
withReplacement
- can elements be sampled multiple times (replaced when sampled out)fraction
- expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be [0, 1]
with replacement: expected number of times each element is chosen; fraction must be >= 0public JavaRDD<T> sample(boolean withReplacement, double fraction, long seed)
withReplacement
- can elements be sampled multiple times (replaced when sampled out)fraction
- expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be [0, 1]
with replacement: expected number of times each element is chosen; fraction must be >= 0seed
- seed for the random number generatorpublic JavaRDD<T>[] randomSplit(double[] weights)
weights
- weights for splits, will be normalized if they don't sum to 1
public JavaRDD<T>[] randomSplit(double[] weights, long seed)
weights
- weights for splits, will be normalized if they don't sum to 1seed
- random seed
public JavaRDD<T> union(JavaRDD<T> other)
.distinct()
to eliminate them).other
- (undocumented)public JavaRDD<T> intersection(JavaRDD<T> other)
Note that this method performs a shuffle internally.
other
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other)
this
that are not in other
.
Uses this
partitioner/partition size, because even if other
is huge, the resulting
RDD will be <= us.
other
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other, int numPartitions)
this
that are not in other
.other
- (undocumented)numPartitions
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other, Partitioner p)
this
that are not in other
.other
- (undocumented)p
- (undocumented)public java.lang.String toString()
toString
in class java.lang.Object