How does Blur Work?

Blur leverages multiple open source projects including Hadoop, HDFS, Lucene, Thrift and Zookeeper to create an environment where structured data can be transformed into a sharded index that runs on a distributed (cloud) computing environment.

Using map/reduce jobs on a Hadoop data processing cluster, you can create custom data models and index them into Blur shards that are stored in HDFS. Once in HDFS, Blur tables are enabled, and Blur automatically distributes the index shards to the search cluster.

Through Blur configuration, you define where in the search cluster the Blur shard servers and controller servers live. Once configured, Blur figures out which shards are online and automatically distributes all of the index shards across the active shard servers. If a shard server dies, the other shard servers automatically compensate and identify where the additional load must be shifted. Having multiple controller servers both increases search throughput and provides redundancy at the controller layer.

Blur controllers coordinate all searches of the shard servers. They have a Thrift API that developers use to search and retrieve data from Blur.

How does Blur Work?

Blur Features