Shared Memory Layer for Apache Spark

Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to share state in memory across multiple Spark jobs, either within the same application or between different Spark applications.

Shared RDDs

IgniteRDD is an implementation of the native Spark RDD and DataFrame APIs that, in addition to providing all the standard RDD functionality, shares the state of the RDD with other Spark jobs, applications, and workers.

Depending on the configured deployment mode, the shared state either exists only for the lifetime of a Spark application (embedded mode) or outlives the Spark application (standalone mode), in which case it can be shared across multiple Spark applications.

In-Memory File System

One of the unique capabilities of Ignite is a distributed in-memory file system called the Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but only in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.

IGFS splits the data from each file into separate data blocks and stores them in a distributed in-memory cache. Unlike Hadoop HDFS, however, IGFS does not need a name node; it determines file data locality automatically using a hashing function.
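To illustrate how hashing can replace a name node, the following is a minimal conceptual sketch in Python. It is not the IGFS implementation and does not use any Ignite API: the node names, the tiny block size, and the modulo-based placement are all assumptions chosen for readability. The point it shows is that any client can compute which node owns a block from the file path and block index alone, without consulting a central metadata server.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical values for illustration only; not IGFS defaults.
BLOCK_SIZE = 4
NODES = ["node-a", "node-b", "node-c"]


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def owner_of(path: str, block_index: int, nodes: List[str] = NODES) -> str:
    """Pick the node for a block by hashing the (path, block index) key.

    Simple modulo hashing is an assumption made for this sketch.
    """
    key = f"{path}:{block_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place_file(path: str, data: bytes,
               nodes: List[str] = NODES) -> Dict[int, Tuple[str, bytes]]:
    """Map each block index to the node that stores it and the block contents."""
    return {
        index: (owner_of(path, index, nodes), block)
        for index, block in enumerate(split_into_blocks(data))
    }
```

Because the placement is a pure function of the key, a reader calling owner_of("/data/a.txt", 2) gets the same node that a writer computed earlier. A production system would typically use a scheme that limits data movement when nodes join or leave, which plain modulo hashing does not.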

Configure Shared Memory Layer