Data Loading and Streaming

Apache Ignite® data loading and streaming capabilities let you ingest both large finite datasets and never-ending streams of data into the cluster in a scalable and fault-tolerant way. The ingestion rate is very high and can easily exceed millions of events per second on a moderately sized cluster.

Apache Ignite integrates with major streaming technologies and frameworks such as Kafka, Camel, Storm or JMS to bring even more advanced streaming capabilities to Ignite-based architectures.

Data Loading

Ignite provides several techniques for initial data loading. For instance, the Ignite streaming APIs are a good choice for clusters with Ignite native persistence enabled, while clusters that persist data in a third-party store can load from it directly through the CacheStore API.

How Ignite Streaming Works:

  1. Clients inject streams of data into Ignite.
  2. Data is automatically partitioned between Ignite data nodes.
  3. Data is concurrently processed across all cluster nodes.
  4. Clients perform concurrent SQL queries on the streamed data.
  5. Clients subscribe to continuous queries as data changes.
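Step 2 above can be pictured with a small plain-Java sketch. It is only an illustration under simplifying assumptions: the class `PartitionSketch`, the modulo hashing, and the round-robin node assignment are hypothetical and are not Ignite's actual affinity function, which is more sophisticated. The point is that every key deterministically maps to a partition, and every partition to a node, so any client can route an entry without coordination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical model of key-to-node partitioning (not Ignite's affinity function). */
public class PartitionSketch {
    /** Maps a key to one of {@code partitions} buckets using its hash code. */
    public static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Assigns a partition to a node by simple round-robin (an assumption of this sketch). */
    public static int nodeFor(int partition, int nodes) {
        return partition % nodes;
    }

    /** Groups keys by the index of the node that would own them. */
    public static Map<Integer, List<Object>> distribute(List<?> keys, int partitions, int nodes) {
        Map<Integer, List<Object>> byNode = new TreeMap<>();
        for (Object key : keys) {
            int node = nodeFor(partitionFor(key, partitions), nodes);
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        // 8 integer keys, 4 partitions, 2 nodes.
        System.out.println(distribute(List.of(0, 1, 2, 3, 4, 5, 6, 7), 4, 2));
        // prints {0=[0, 2, 4, 6], 1=[1, 3, 5, 7]}
    }
}
```

Because the mapping depends only on the key, the same key always lands on the same node, which is what makes collocated processing of streamed data possible.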

Code Examples:

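    // Both snippets assume a started Ignite instance named "ignite".
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.stream.StreamTransformer;

    // Example 1: bulk-load entries. The target cache must exist before streaming.
    ignite.getOrCreateCache("myStreamCache");

    // Get the data streamer reference and stream data.
    try (IgniteDataStreamer<Integer, String> stmr = ignite.dataStreamer("myStreamCache")) {
        // Stream entries; remaining buffered data is flushed when the streamer closes.
        for (int i = 0; i < 100000; i++)
            stmr.addData(i, Integer.toString(i));
    }

    // Example 2: count words as they are streamed. "text" is assumed to be an Iterable<String>.
    CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("wordCountCache");

    IgniteCache<String, Long> stmCache = ignite.getOrCreateCache(cfg);

    try (IgniteDataStreamer<String, Long> stmr = ignite.dataStreamer(stmCache.getName())) {
        // Allow data updates.
        stmr.allowOverwrite(true);

        // Configure data transformation to count instances of the same word.
        stmr.receiver(StreamTransformer.from((e, arg) -> {
            // Get current count.
            Long val = e.getValue();

            // Increment count by 1.
            e.setValue(val == null ? 1L : val + 1);

            return null;
        }));

        // Stream words into the streamer cache.
        for (String word : text)
            stmr.addData(word, 1L);
    }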

GitHub Examples:

Also see continuous queries, word count, and other streaming examples available on GitHub.

Streaming Features

Feature Description
Data Streamers

Data streamers are defined by the IgniteDataStreamer API and are built to inject large, continuous streams of data into Ignite stream caches. Data streamers are scalable and fault-tolerant, and they provide at-least-once delivery semantics for all the data streamed into Ignite.
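At-least-once semantics mean that an entry may, in some failure scenarios, be delivered more than once. The plain-Java sketch below illustrates why that matters for update logic. It does not use the Ignite API: the class `AtLeastOnceSketch` and its methods are hypothetical, and the duplicate delivery is simulated by hand. It shows that an overwrite-style update is unaffected by a repeated entry, while an increment-style update counts it twice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how duplicate deliveries affect two kinds of updates. */
public class AtLeastOnceSketch {
    /** Overwrite-style update: applying the same entry twice leaves the same value. */
    public static void put(Map<String, Long> cache, String key, long val) {
        cache.put(key, val);
    }

    /** Increment-style update: every delivery, including a duplicate, adds to the value. */
    public static void increment(Map<String, Long> cache, String key, long delta) {
        cache.merge(key, delta, Long::sum);
    }

    public static void main(String[] args) {
        // The event "b" appears twice to simulate a redelivery.
        List<String> delivered = List.of("a", "b", "b");

        Map<String, Long> overwritten = new HashMap<>();
        Map<String, Long> counted = new HashMap<>();

        for (String key : delivered) {
            put(overwritten, key, 1L);
            increment(counted, key, 1L);
        }

        System.out.println(overwritten.get("b")); // prints 1
        System.out.println(counted.get("b"));     // prints 2
    }
}
```

If exact counts matter, one common design choice is to make updates idempotent, for example by keying each event with a unique identifier so that a repeated delivery overwrites rather than accumulates.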

Data Loading

Data streamers can be used to load large amounts of data into Ignite caches. They can be used for initial data loading from a 3rd party database or another source.

Collocated Processing

For cases when you need to execute custom logic instead of just adding new data, you can take advantage of the StreamReceiver API.

Stream receivers allow you to react to the streamed data in a collocated fashion, directly on the nodes where it will be cached. You can change the data or apply any custom pre-processing logic to it before putting the data into the cache.
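The receiver idea can be modeled in plain Java without a cluster. In the sketch below, the `Receiver` interface, the `WORD_COUNT` receiver, and the `stream` helper are all hypothetical names loosely modeled on the stream receiver concept, and an ordinary `HashMap` stands in for the cache on the node that owns the data. The receiver sees each incoming batch before anything is stored, so it can normalize keys and keep a running count instead of writing the raw value.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical plain-Java model of a stream receiver; not the Ignite API. */
public class ReceiverSketch {
    /** Stand-in for a receiver: gets the local cache and a batch of streamed entries. */
    public interface Receiver<K, V> {
        void receive(Map<K, V> localCache, Collection<Map.Entry<K, V>> batch);
    }

    /** Lower-cases each word and adds the streamed value to its running count. */
    public static final Receiver<String, Long> WORD_COUNT = (cache, batch) -> {
        for (Map.Entry<String, Long> e : batch)
            cache.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Long::sum);
    };

    /** Feeds words one at a time through the receiver and returns the resulting cache. */
    public static Map<String, Long> stream(List<String> words, Receiver<String, Long> rcv) {
        Map<String, Long> cache = new HashMap<>();
        for (String w : words)
            rcv.receive(cache, List.of(Map.entry(w, 1L)));
        return cache;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = stream(List.of("Ignite", "ignite", "Kafka"), WORD_COUNT);
        System.out.println(counts.get("ignite")); // prints 2
        System.out.println(counts.get("kafka"));  // prints 1
    }
}
```

Running the logic where the data lives avoids moving each entry to the client and back just to update it.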

Continuous Queries

Continuous queries are useful for cases when you want to execute a query and then continue to get notified about the data changes that fall into your query filter.
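The two phases of a continuous query, an initial result set followed by notifications, can be sketched in plain Java. The class `ContinuousQuerySketch` below is a hypothetical single-process model and not the Ignite API: `query` returns the entries that already match the filter and registers a listener, and every later `put` that passes the same filter triggers the listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

/** Hypothetical single-process model of a continuous query; not the Ignite API. */
public class ContinuousQuerySketch {
    private final Map<Integer, String> data = new TreeMap<>();
    private BiPredicate<Integer, String> filter;
    private BiConsumer<Integer, String> listener;

    /** Returns the values that match the filter now and registers the listener for later updates. */
    public List<String> query(BiPredicate<Integer, String> filter, BiConsumer<Integer, String> listener) {
        this.filter = filter;
        this.listener = listener;

        List<String> initial = new ArrayList<>();
        data.forEach((k, v) -> {
            if (filter.test(k, v))
                initial.add(v);
        });
        return initial;
    }

    /** Stores the entry and notifies the listener if the entry passes the registered filter. */
    public void put(int key, String val) {
        data.put(key, val);
        if (filter != null && filter.test(key, val))
            listener.accept(key, val);
    }

    public static void main(String[] args) {
        ContinuousQuerySketch cache = new ContinuousQuerySketch();
        cache.put(1, "a");
        cache.put(20, "b");

        List<String> seen = new ArrayList<>();
        List<String> initial = cache.query((k, v) -> k > 10, (k, v) -> seen.add(v));

        cache.put(30, "c"); // matches the filter, so the listener fires
        cache.put(2, "d");  // does not match, so no notification

        System.out.println(initial); // prints [b]
        System.out.println(seen);    // prints [c]
    }
}
```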

JMS Data Streamer

Ignite JMS Data Streamer consumes messages from JMS brokers and inserts them into Ignite caches.

Apache Flume Sink

IgniteSink is a Flume sink that extracts events from an associated Flume channel and injects them into an Ignite cache.

MQTT Streamer

Ignite MQTT Streamer consumes messages from an MQTT topic and feeds transformed key-value pairs into an Ignite cache.

Twitter Streamer

Ignite Twitter Streamer consumes messages from the Twitter Streaming API and inserts them into an Ignite cache.

Apache Kafka Streamer

Ignite Kafka Data Streamer consumes messages for a given Kafka topic from a Kafka broker and inserts them into an Ignite cache.

Apache Camel Streamer

Ignite Camel Streamer consumes messages from an Apache Camel consumer endpoint and feeds them into an Ignite cache.

Apache Storm Streamer

Ignite Storm Streamer consumes messages from an Apache Storm consumer endpoint and feeds them into an Ignite cache.

Apache Flink Streamer

Ignite Flink Streamer consumes messages from an Apache Flink consumer endpoint and feeds them into an Ignite cache.

Apache RocketMQ Streamer

Ignite RocketMQ Streamer consumes messages from an Apache RocketMQ consumer endpoint and feeds them into an Ignite cache.

ZeroMQ Streamer

Ignite ZeroMQ Streamer consumes messages from a ZeroMQ consumer endpoint and feeds them into an Ignite cache.