In-Memory File System
One of the unique capabilities of Apache Ignite is a distributed in-memory file system called Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but entirely in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.
IGFS splits the data from each file into separate data blocks and stores them in a distributed in-memory cache. However, unlike Hadoop HDFS, IGFS does not need a name node and automatically determines file data locality using a hashing function.
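The idea of splitting a file into blocks and locating each block by hashing, with no central name node, can be sketched in a few lines of Java. This is a toy model only, not Ignite's implementation: the class `BlockPlacementSketch`, its method names, the key format, and the choice of rendezvous (highest-score) hashing are all assumptions made for illustration. The text above only states that "a hashing function" is used.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of block splitting and hash-based placement (hypothetical, not Ignite code). */
final class BlockPlacementSketch {
    /** Number of fixed-size blocks needed to hold a file of the given length. */
    static long blockCount(long fileLength, int blockSize) {
        return (fileLength + blockSize - 1) / blockSize; // ceiling division
    }

    /** Scrambles the bits of a value so that similar keys get dissimilar scores. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /**
     * Rendezvous hashing (an assumed scheme): every node gets a score for the
     * block key and the highest score wins. Any client can compute the owner
     * locally, so no lookup against a central metadata server is needed.
     */
    static String nodeFor(String filePath, long blockIndex, List<String> nodes) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String node : nodes) {
            long score = mix((filePath + "#" + blockIndex + "@" + node).hashCode());
            if (best == null || score > bestScore) {
                best = node;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        long blocks = blockCount(10L * 1024, 4 * 1024); // 10 KB file, 4 KB blocks
        for (long i = 0; i < blocks; i++) {
            System.out.println("block " + i + " -> " + nodeFor("/data/file.bin", i, nodes));
        }
    }
}
```

Because the owner of a block is a pure function of the block key and the node list, two clients that see the same set of nodes always agree on where a block lives.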
IGFS can be deployed standalone or on top of HDFS, in which case it becomes a transparent caching layer for the files stored in HDFS.
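As a rough mental model of the caching-layer deployment, the following Java sketch places an in-memory map in front of a slower backing store. Everything here is hypothetical: `CachingLayerSketch` is not an Ignite class, a plain `Map` stands in for HDFS, and the write-through behavior is an assumption chosen for simplicity rather than a description of how IGFS propagates writes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy read-through/write-through cache in front of a backing store (hypothetical). */
final class CachingLayerSketch {
    private final Map<String, byte[]> memory = new HashMap<>(); // in-memory tier
    private final Map<String, byte[]> backingStore;             // stand-in for HDFS
    private int backingReads;                                    // reads that missed memory

    CachingLayerSketch(Map<String, byte[]> backingStore) {
        this.backingStore = backingStore;
    }

    /** Serves from memory when possible; otherwise loads from the backing store and caches. */
    byte[] read(String path) {
        byte[] data = memory.get(path);
        if (data == null) {
            backingReads++;
            data = backingStore.get(path);
            if (data != null) {
                memory.put(path, data);
            }
        }
        return data;
    }

    /** Write-through (an assumption): update both tiers so they stay consistent. */
    void write(String path, byte[] data) {
        backingStore.put(path, data);
        memory.put(path, data);
    }

    int backingReads() {
        return backingReads;
    }

    public static void main(String[] args) {
        Map<String, byte[]> hdfsStandIn = new HashMap<>();
        hdfsStandIn.put("/logs/a.txt", "hello".getBytes());
        CachingLayerSketch fs = new CachingLayerSketch(hdfsStandIn);
        fs.read("/logs/a.txt"); // first read goes to the backing store
        fs.read("/logs/a.txt"); // second read is served from memory
        System.out.println("backing reads: " + fs.backingReads());
    }
}
```

The caller uses the same `read` and `write` calls whether or not the data is already in memory, which is the sense in which such a layer is transparent.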
IGFS integrates with the native Apache Hadoop distribution, Cloudera CDH, and Hortonworks HDP.
IGFS can transparently replace the Tachyon file system in Spark deployments. Because IGFS is based on the battle-tested Ignite data grid technology, it exhibits much better read and write performance than Tachyon and is more stable.
See the Hadoop integration documentation if you plan to use IGFS as a Hadoop file system. In that case, working with IGFS is no different from working with HDFS.
Also see the IGFS native API examples available on GitHub.
Ignite File System Features
Feature | Description |
---|---|
On-Heap and Off-Heap | IGFS can store files either on-heap or off-heap. For larger memory spaces, it is critical to use off-heap storage to avoid lengthy JVM garbage collection pauses. |
IGFS as Hadoop FileSystem | IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments. |
Hadoop FileSystem Cache | IGFS can also be deployed as a transparent caching layer on top of HDFS. |
Any Hadoop Distribution | IGFS integrates with the native Apache Hadoop distribution, as well as Cloudera CDH and Hortonworks HDP. |
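
To make the on-heap versus off-heap distinction in the table concrete, the sketch below stores the same block of bytes in a heap-backed buffer and in a direct buffer using only the standard JDK `ByteBuffer` API. It says nothing about how Ignite manages its own off-heap memory; the class `OffHeapSketch` and its methods are hypothetical and serve only to show where the bytes live.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrates heap vs. direct (off-heap) storage with the JDK only (hypothetical helper). */
final class OffHeapSketch {
    /** Copies a block into a heap buffer or a direct buffer and prepares it for reading. */
    static ByteBuffer store(byte[] block, boolean offHeap) {
        ByteBuffer buf = offHeap
            ? ByteBuffer.allocateDirect(block.length) // memory outside the Java heap
            : ByteBuffer.allocate(block.length);      // backed by a byte[] on the heap
        buf.put(block);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Reads the stored block back without disturbing the buffer's own position. */
    static byte[] load(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] block = "file block".getBytes(StandardCharsets.UTF_8);
        ByteBuffer onHeap = store(block, false);
        ByteBuffer offHeap = store(block, true);
        System.out.println("on-heap direct?  " + onHeap.isDirect());
        System.out.println("off-heap direct? " + offHeap.isDirect());
        System.out.println(new String(load(offHeap), StandardCharsets.UTF_8));
    }
}
```

The contents of a direct buffer are not objects on the Java heap, so the garbage collector does not have to scan or move them, which is the general reason off-heap storage helps with pause times on large memory spaces.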