Getting Started

On chasing the Stream Processing Utopia

Over the last 15 years, batch processing frameworks have thrived and ruled over big data processing. But in the age of social computing, it is no longer acceptable to wait for data to land in a data lake before it gets processed. We want our applications to react to new data as soon as it is generated upstream. On a website, members expect their feed to be updated as soon as some relevant activity, news, jobs, etc. appear. We are talking seconds (or minutes). We also want to detect degraded site experience, fraud, security breaches, spam, etc. instantaneously. Even business metrics (written in traditionally batch-oriented languages like Hive/Pig) are now expected to run in real time. The current status quo of real-time data processing (stream processing) is still very far from Utopia.

Kartik Paramasivam, Director of Engineering, presented "Chasing the Stream Utopia" at Strange Loop '18. The talk was inspired by the extensive growth in streaming data at LinkedIn, which reached as many as 5 trillion messages per day in 2018. LinkedIn supports close to 3000 applications in production using Kafka and Samza. He shed further light on Samza's claim to be a state-of-the-art stream processing framework, supporting use cases at LinkedIn, Slack, Uber, Intuit, etc.

His talk described LinkedIn's path in chasing Utopia in the streaming world: running apps at any complexity, any scale, any source, any language, and any environment. He illustrated each of these with actual use cases from LinkedIn using Samza and Kafka in production. He touched on Samza's battle-tested stateful and stateless processing, and also on newer features such as event-time-based processing using the Beam Runner for Samza, and Samza SQL. He also briefly explained running and managing Kafka at scale, covering an array of topics from Kafka cluster management woes to dynamic load balancing using Kafka Cruise Control.

He also covered the tooling ecosystem that supports these apps and the streaming challenges faced at LinkedIn. He concluded with the upcoming releases and features of Samza (Apache Samza 1.0) and Kafka (Apache Kafka 2.0).
