Shared Apache Spark RDDs
Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to share state in memory across multiple Spark jobs, either within the same application or between different Spark applications.
IgniteRDD is implemented as a view over a distributed Ignite cache, which may be deployed within the process executing the Spark job, on a Spark worker, or in its own cluster.
Depending on the preconfigured deployment mode, the shared state may either exist only for the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
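The choice between the two modes can be sketched as follows. This is a minimal illustration, assuming the Ignite 2.x `IgniteContext` constructor that takes a configuration closure and a boolean standalone flag; the object name, application name, local master, and default `IgniteConfiguration` are placeholders, not values from this page.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object DeploymentModeSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Spark setup; the app name and master are illustrative only.
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("ignite-shared-rdd").setMaster("local[*]"))

    // true  -> standalone mode: connect to an Ignite cluster that runs on its own,
    //          so the cached state can outlive this Spark application.
    // false -> embedded mode: Ignite nodes are started inside the Spark processes,
    //          so the state lives only as long as the application.
    val standalone = true

    // Assumption: a default IgniteConfiguration is enough to reach the cluster.
    val igniteContext = new IgniteContext(
      sparkContext,
      () => new IgniteConfiguration(),
      standalone)

    // ... create shared RDDs with igniteContext.fromCache[K, V](...) here ...

    igniteContext.close()
    sparkContext.stop()
  }
}
```

In a real deployment the configuration closure would typically point at the discovery settings of the target cluster rather than relying on defaults.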
Writing to the shared RDD:

    val sharedRdd = igniteContext.fromCache[Int, Int]("partitioned")

    // Store pairs of integers from 1 to 10000 into the in-memory cache
    // named "partitioned" using 10 parallel store operations.
    sharedRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))

Running SQL over the shared RDD:

    val sharedRdd = igniteContext.fromCache[Int, Int]("partitioned")

    // Select the cached values that are greater than 10 and less than 100.
    val result = sharedRdd.sql(
      "select _val from Integer where _val > ? and _val < ?", 10, 100)
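As a rough sketch of how a second job might consume the same state, the helper below reads the "partitioned" cache back through both the regular RDD API and SQL. The object and method names are hypothetical, and it assumes the cache already holds the integer pairs and that `Integer` is registered as a queryable type in the cache configuration.

```scala
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.sql.DataFrame

object SharedRddReadSketch {
  // Hypothetical helper: reads state that another job stored in "partitioned".
  def readShared(igniteContext: IgniteContext): Unit = {
    val sharedRdd: IgniteRDD[Int, Int] =
      igniteContext.fromCache[Int, Int]("partitioned")

    // IgniteRDD is an RDD of key-value pairs, so standard transformations apply.
    val smallValues: Long =
      sharedRdd.filter { case (_, value) => value < 10 }.count()
    println(s"Pairs with a value below 10: $smallValues")

    // sql() returns a Spark DataFrame that can be processed like any other.
    val result: DataFrame = sharedRdd.sql(
      "select _val from Integer where _val > ? and _val < ?", 10, 100)
    result.show(5)
  }
}
```

Because the data lives in the Ignite cache rather than in the Spark job, the helper does not need a reference to the RDD that originally wrote it, only the cache name.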
IgniteRDD Features
Feature | Description |
---|---|
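Shared Spark RDDs | IgniteRDD shares state in memory across multiple Spark jobs, either within the same application or between different Spark applications. |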
Faster SQL | Spark does not support SQL indexes, while Ignite does. Because of its in-memory indexing capabilities, IgniteRDD can execute SQL queries hundreds of times faster than Spark native RDDs or DataFrames. |
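For SQL queries such as the one over `Integer` to work, the cache has to register its key and value types with Ignite's query engine. The following is a minimal sketch of such a configuration, assuming the cache is named "partitioned" as in the examples; the object and method names are hypothetical.

```scala
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}

object IndexedCacheConfigSketch {
  // Hypothetical helper that builds a configuration for the "partitioned" cache.
  def indexedConfiguration(): IgniteConfiguration = {
    val cacheCfg = new CacheConfiguration[Integer, Integer]("partitioned")

    // Register the key and value types for SQL querying and indexing;
    // the value type's simple name ("Integer") becomes the SQL table name.
    cacheCfg.setIndexedTypes(classOf[Integer], classOf[Integer])

    val igniteCfg = new IgniteConfiguration()
    igniteCfg.setCacheConfiguration(cacheCfg)
    igniteCfg
  }
}
```

A configuration built this way could be supplied as the closure argument of `IgniteContext`, for example `() => IndexedCacheConfigSketch.indexedConfiguration()`.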