June 2013: S4 0.6.0 released, get it here!
S4 0.6.0 focuses on configurability and performance. More information is available in the overview and in the release notes.

S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

motivation

S4 fills the gap between complex proprietary systems and batch-oriented open source computing platforms. We aim to develop a high-performance computing platform that hides the complexity inherent in parallel processing systems from the application programmer.

implementation

The core platform is written in Java. The implementation is modular and pluggable, and S4 applications can be easily and dynamically combined to create more sophisticated stream processing systems.

open source

S4 was initially released by Yahoo! Inc. in October 2010 and has been an Apache Incubator project since September 2011. It is licensed under the Apache 2.0 license.

overview

proven

S4 has been deployed in production systems at Yahoo! to process thousands of search queries per second.

decentralized

All nodes are symmetric with no centralized service and no single point of failure. This greatly simplifies deployments and cluster configuration changes.

scalable

Throughput increases linearly as additional nodes are added to the cluster. There is no predefined limit on the number of nodes that can be supported.

extensible

Applications can easily be written and deployed using a simple API. Building blocks of the platform (message queues and processors, serializer, checkpointing backend) can be replaced by custom implementations.
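
To illustrate what a replaceable building block can look like, here is a minimal sketch in Java. It is not the S4 API: the Serializer interface, the two implementations, and the Node class are hypothetical names invented for this example. It only shows the general idea that a component depends on an interface, so a custom implementation can be swapped in without touching the rest of the code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PluggableSerializerSketch {

    /** Hypothetical extension point; the real S4 serializer interface may differ. */
    interface Serializer {
        byte[] serialize(String event);
        String deserialize(byte[] bytes);
    }

    /** Default implementation: plain UTF-8 bytes. */
    static class Utf8Serializer implements Serializer {
        public byte[] serialize(String event) {
            return event.getBytes(StandardCharsets.UTF_8);
        }
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Custom replacement: Base64-encodes the UTF-8 bytes. */
    static class Base64Serializer implements Serializer {
        public byte[] serialize(String event) {
            return Base64.getEncoder().encode(event.getBytes(StandardCharsets.UTF_8));
        }
        public String deserialize(byte[] bytes) {
            return new String(Base64.getDecoder().decode(bytes), StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical node that only knows the interface, not the implementation. */
    static class Node {
        private final Serializer serializer;

        Node(Serializer serializer) {
            this.serializer = serializer;
        }
        byte[] send(String event) {
            return serializer.serialize(event);
        }
        String receive(byte[] wire) {
            return serializer.deserialize(wire);
        }
    }

    public static void main(String[] args) {
        Node defaultNode = new Node(new Utf8Serializer());
        Node customNode = new Node(new Base64Serializer());

        // Both nodes round-trip the same event; only the wire format differs.
        System.out.println(defaultNode.receive(defaultNode.send("hello")));
        System.out.println(customNode.receive(customNode.send("hello")));
        System.out.println(defaultNode.send("hello").length + " vs "
                + customNode.send("hello").length + " bytes on the wire");
    }
}
```

The same pattern would apply to the other building blocks named above, such as message queues or a checkpointing backend.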

cluster management

S4 hides all cluster management tasks using a communication layer built on top of ZooKeeper, a distributed, open-source coordination service for distributed applications.

fault-tolerance

When a server in the cluster fails, a stand-by server is automatically activated to take over its tasks. Checkpointing and recovery minimize state loss.
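
The effect of checkpointing on state loss can be sketched with a small Java example. This is an illustration under stated assumptions, not S4 code: CheckpointBackend, InMemoryBackend, and CountingProcessor are hypothetical names, the backend is an in-memory map standing in for durable storage, and the checkpoint interval of 10 events is arbitrary. A stand-by instance that starts from the last checkpoint loses only the events processed since that checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CheckpointSketch {

    /** Hypothetical pluggable storage for checkpointed state; not the S4 interface. */
    interface CheckpointBackend {
        void save(String key, long state);
        Optional<Long> load(String key);
    }

    /** In-memory stand-in for a durable store such as a file system. */
    static class InMemoryBackend implements CheckpointBackend {
        private final Map<String, Long> store = new HashMap<>();

        public void save(String key, long state) {
            store.put(key, state);
        }
        public Optional<Long> load(String key) {
            return Optional.ofNullable(store.get(key));
        }
    }

    /** Counts events and checkpoints the count every `interval` events. */
    static class CountingProcessor {
        private final String key;
        private final CheckpointBackend backend;
        private final int interval;
        private long count;

        CountingProcessor(String key, CheckpointBackend backend, int interval) {
            this.key = key;
            this.backend = backend;
            this.interval = interval;
            // Recovery: start from the last checkpoint if one exists, else from zero.
            this.count = backend.load(key).orElse(0L);
        }

        void onEvent() {
            count++;
            if (count % interval == 0) {
                backend.save(key, count);
            }
        }

        long count() {
            return count;
        }
    }

    public static void main(String[] args) {
        CheckpointBackend backend = new InMemoryBackend();

        CountingProcessor primary = new CountingProcessor("query:s4", backend, 10);
        for (int i = 0; i < 25; i++) {
            primary.onEvent();
        }
        System.out.println("before failure: " + primary.count()); // 25

        // The primary fails here; a stand-by takes over using the same backend.
        CountingProcessor standby = new CountingProcessor("query:s4", backend, 10);
        System.out.println("after recovery: " + standby.count()); // 20
    }
}
```

In this sketch the five events after the last checkpoint are lost. A shorter interval reduces that loss at the cost of more frequent writes to the backend.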

disclaimer

Apache S4 is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.