Shared Memory Layer for Apache Spark
Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to share in-memory state across multiple Spark jobs, either within the same application or between different Spark applications.
IgniteRDD is an implementation of the native Spark RDD and DataFrame APIs which, in addition to all the standard RDD functionality, also shares the state of the RDD across other Spark jobs, applications, and workers.
Depending on the pre-configured deployment mode, the shared state may either exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
One of the unique capabilities of Ignite is a distributed in-memory file system called Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but only in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.
IGFS splits the data from each file into separate data blocks and stores them in a distributed in-memory cache. However, unlike Hadoop HDFS, IGFS does not need a name node and automatically determines file data locality using a hashing function.
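The idea of locating blocks by hashing instead of asking a name node can be illustrated with a small sketch. This is not IGFS code: the function names, the node names, and the choice of rendezvous (highest-random-weight) hashing are all assumptions made for illustration, and the actual IGFS affinity logic may differ. The point is only that any client holding the same node list can compute where a block lives without a central lookup.

```python
import hashlib
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def _score(node: str, path: str, block_index: int) -> int:
    # Deterministic 64-bit score for a (node, block) pair.
    # hashlib is used instead of hash() so results are stable across processes.
    key = f"{node}|{path}|{block_index}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def node_for_block(path: str, block_index: int, nodes: List[str]) -> str:
    """Rendezvous hashing: the node with the highest score owns the block."""
    return max(nodes, key=lambda n: _score(n, path, block_index))


def locate_blocks(path: str, data: bytes, block_size: int,
                  nodes: List[str]) -> Dict[int, str]:
    """Map every block index of a file to its owning node, with no central lookup."""
    blocks = split_into_blocks(data, block_size)
    return {i: node_for_block(path, i, nodes) for i in range(len(blocks))}


if __name__ == "__main__":
    # Hypothetical cluster and file, used only for illustration.
    cluster = ["node-a", "node-b", "node-c"]
    placement = locate_blocks("/logs/app.log", b"x" * 1000, 256, cluster)
    for index, owner in placement.items():
        print(f"block {index} -> {owner}")
```

Because the placement depends only on the file path, the block index, and the node list, every client computes the same answer independently. With rendezvous hashing, removing a node only moves the blocks that node owned.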