Shared Apache Spark RDDs

Apache Ignite provides an implementation of the Spark RDD abstraction that allows state to be easily shared in memory across multiple Spark jobs, either within the same application or between different Spark applications.

IgniteRDD is implemented as a view over a distributed Ignite cache, which may be deployed within the process executing the Spark job, on a Spark worker, or in its own cluster.

Depending on the pre-configured deployment mode, the shared state may either exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
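
An IgniteRDD is obtained from an IgniteContext, which wraps the Spark context together with an Ignite configuration. The sketch below shows one possible setup for the examples that follow; the application name and the use of a default IgniteConfiguration are illustrative assumptions rather than required settings.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.ignite.spark.IgniteContext
    import org.apache.ignite.configuration.IgniteConfiguration

    // Standard Spark setup (the application name is hypothetical).
    val sparkConf = new SparkConf().setAppName("ignite-shared-rdd-example")
    val sparkContext = new SparkContext(sparkConf)

    // Create an IgniteContext on top of the Spark context. The configuration
    // closure is used to start or connect to an Ignite node on the driver and
    // on each executor; a default IgniteConfiguration is used here for brevity.
    val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())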


Code Examples:
    // Obtain a shared RDD backed by the Ignite cache named "partitioned".
    val sharedRdd = igniteContext.fromCache("partitioned")

    // Store pairs of integers from 1 to 10000 into the in-memory cache
    // named "partitioned" using 10 parallel store operations.
    sharedRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))

    // In another job or application, attach to the same cache and run a SQL
    // query against it (_val refers to the cached value).
    val sharedRdd = igniteContext.fromCache("partitioned")

    val result = sharedRdd.sql(
        "select _val from Integer where _val > ? and _val < ?", 10, 100)
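The sql call returns the result as a regular Spark DataFrame (assuming the IgniteRDD API from the ignite-spark module), so it can be consumed with the usual Spark operations, for example:

    // Inspect the query result with standard Spark operations.
    result.show()
    println(result.count())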

IgniteRDD Features

Shared Spark RDDs

IgniteRDD is an implementation of the native Spark RDD and DataFrame APIs that, in addition to all the standard RDD functionality, shares the state of the RDD with other Spark jobs, applications, and workers.

Faster SQL

Spark does not support SQL indexes, while Ignite does. Thanks to Ignite's advanced in-memory indexing capabilities, IgniteRDD can execute SQL queries hundreds of times faster than native Spark RDDs or DataFrames.
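
To take advantage of indexed SQL, the underlying Ignite cache must declare which types are queryable. Below is a minimal configuration sketch, assuming the cache name "partitioned" and the Integer key/value types used in the examples above:

    import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}

    // Register Integer key/value types for SQL querying so that Ignite builds
    // in-memory indexes that IgniteRDD.sql() can use.
    val cacheCfg = new CacheConfiguration[Integer, Integer]("partitioned")
    cacheCfg.setIndexedTypes(classOf[Integer], classOf[Integer])

    val igniteCfg = new IgniteConfiguration()
    igniteCfg.setCacheConfiguration(cacheCfg)

    // The configuration can then be supplied to the IgniteContext, for example:
    // val igniteContext = new IgniteContext(sparkContext, () => igniteCfg)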