Air Traffic Controller with Samza at LinkedIn
LinkedIn is a professional networking company that offers various services and platforms for job seekers, employers, and sales professionals. With a growing user base and multiple product offerings, it became imperative to streamline communications to members. To put member experience first, LinkedIn developed a new email and notifications platform called Air Traffic Controller (ATC).
ATC is designed to be an intelligent platform that tracks all outgoing communications and delivers the communication through the right channel to the right member at the right time.
Any service that wants to send a notification to members writes its request to a Kafka topic, which ATC then consumes. The ATC platform consists of three components:
Partitioners read incoming communication requests from Kafka and distribute them across Pipeline instances based on a hash of the recipient. They also filter out malformed messages early on.
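To make the routing step concrete, here is a minimal Java sketch (not ATC's code) of hash-based routing by recipient with early filtering. The `CommunicationRequest` record, the `RecipientPartitioner` class, and the validity checks are hypothetical, and since the source does not say which hash function ATC uses, `String.hashCode()` stands in. Keying by recipient means every request for a given member lands on the same Pipeline instance, which is presumably what lets that instance batch and de-duplicate a member's notifications in its local state.

```java
import java.util.Optional;

// Hypothetical request shape: the source does not describe ATC's message schema.
record CommunicationRequest(String recipientId, String payload) {}

// Illustrative sketch of recipient-keyed routing, not ATC's actual Partitioner.
class RecipientPartitioner {
    private final int numPipelineInstances;

    RecipientPartitioner(int numPipelineInstances) {
        if (numPipelineInstances <= 0) {
            throw new IllegalArgumentException("need at least one Pipeline instance");
        }
        this.numPipelineInstances = numPipelineInstances;
    }

    // Deterministic: the same recipient always maps to the same Pipeline instance.
    // Math.floorMod keeps the result in [0, n) even when hashCode() is negative.
    int partitionFor(String recipientId) {
        return Math.floorMod(recipientId.hashCode(), numPipelineInstances);
    }

    // Early filtering: malformed requests yield Optional.empty() and are dropped
    // instead of being forwarded downstream.
    Optional<Integer> route(CommunicationRequest req) {
        if (req == null || req.recipientId() == null || req.recipientId().isBlank()
                || req.payload() == null) {
            return Optional.empty();
        }
        return Optional.of(partitionFor(req.recipientId()));
    }
}
```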
Relevance processors read personalized machine-learning models from Kafka and store them in Samza's state store for later evaluation. They use these models to score incoming requests and determine the right channel for each notification (e.g., drop it, send an email, or send a push notification).
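The scoring step can be sketched as follows, assuming (hypothetically) a per-member logistic-regression model and fixed score thresholds for picking a channel; the source does not describe ATC's actual models or decision rules. A `HashMap` stands in for Samza's RocksDB-backed store so the example runs on its own, and `onModelUpdate` plays the role of applying a model update read from Kafka. `Channel`, `RelevanceSketch`, and both thresholds are illustrative names and values.

```java
import java.util.HashMap;
import java.util.Map;

enum Channel { DROP, EMAIL, PUSH }

// Illustrative sketch only: a per-member logistic model and fixed thresholds are assumptions.
class RelevanceSketch {
    private static final double DROP_BELOW = 0.3;       // hypothetical threshold
    private static final double PUSH_AT_OR_ABOVE = 0.7; // hypothetical threshold

    // Stand-in for Samza's RocksDB-backed local store: memberId -> model weights.
    private final Map<String, double[]> modelStore = new HashMap<>();

    // Called for each model update read from Kafka; the newest weights replace the old ones.
    void onModelUpdate(String memberId, double[] weights) {
        modelStore.put(memberId, weights.clone());
    }

    // Logistic score in (0, 1). With no usable model, fall back to a neutral 0.5.
    double score(String memberId, double[] features) {
        double[] w = modelStore.get(memberId);
        if (w == null || w.length != features.length) {
            return 0.5;
        }
        double z = 0.0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Maps the score to a channel: low scores are dropped, high scores go to push.
    Channel chooseChannel(String memberId, double[] features) {
        double s = score(memberId, features);
        if (s < DROP_BELOW) return Channel.DROP;
        if (s >= PUSH_AT_OR_ABOVE) return Channel.PUSH;
        return Channel.EMAIL;
    }
}
```

Keeping the model in a local store means each score is a local read rather than a remote call, which is the point of stateful processing here.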
ATC Pipeline processors aggregate the output of the Relevance processors and the Partitioners and make the final decision on each notification. They rely heavily on Samza's local state to batch and aggregate notifications, and they control notification frequency: duplicate notifications are merged, and notifications are capped at a certain threshold. The Pipeline also implements a scheduler on top of Samza's local store so that it can schedule messages for later delivery. For example, a push notification sent at midnight may not be helpful, so it can be held until a better time.
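Below is a minimal sketch of the three Pipeline behaviors just described: merging duplicates, capping per member, and deferring delivery. Everything in it is an assumption for illustration: the `Notification` record, the de-duplication key, the cap semantics (distinct notifications per member, with no time-window reset), and a 22:00 to 07:00 quiet period. A `TreeMap` keyed by delivery time stands in for a sorted local store; RocksDB also keeps keys in sorted order, which suits "fetch everything due by now" scans.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical notification shape; dedupKey identifies "the same" notification.
record Notification(String memberId, String dedupKey, LocalDateTime requestedAt) {}

// Illustrative sketch, not ATC's Pipeline. In-memory maps stand in for Samza local stores.
class PipelineSketch {
    private static final LocalTime QUIET_START = LocalTime.of(22, 0); // assumed quiet hours
    private static final LocalTime QUIET_END = LocalTime.of(7, 0);

    private final int cap; // max distinct notifications per member (window reset omitted)
    // memberId -> (dedupKey -> number of requests merged into one notification)
    private final Map<String, Map<String, Integer>> merged = new HashMap<>();
    // deliveryTime -> notifications due at that time, kept in sorted order
    private final TreeMap<LocalDateTime, List<Notification>> schedule = new TreeMap<>();

    PipelineSketch(int cap) {
        this.cap = cap;
    }

    // Returns false only when the notification is dropped by the per-member cap.
    boolean accept(Notification n) {
        Map<String, Integer> byKey = merged.computeIfAbsent(n.memberId(), k -> new HashMap<>());
        if (byKey.containsKey(n.dedupKey())) {
            byKey.merge(n.dedupKey(), 1, Integer::sum); // duplicate: merge instead of resending
            return true;
        }
        if (byKey.size() >= cap) {
            return false; // member already has `cap` distinct notifications
        }
        byKey.put(n.dedupKey(), 1);
        schedule.computeIfAbsent(deliveryTime(n.requestedAt()), t -> new ArrayList<>()).add(n);
        return true;
    }

    // Requests made during quiet hours are deferred to the next QUIET_END.
    static LocalDateTime deliveryTime(LocalDateTime requested) {
        LocalTime t = requested.toLocalTime();
        if (!t.isBefore(QUIET_START)) {
            return requested.toLocalDate().plusDays(1).atTime(QUIET_END);
        }
        if (t.isBefore(QUIET_END)) {
            return requested.toLocalDate().atTime(QUIET_END);
        }
        return requested;
    }

    // Removes and returns every notification whose delivery time is <= now.
    List<Notification> pollDue(LocalDateTime now) {
        List<Notification> due = new ArrayList<>();
        var head = schedule.headMap(now, true);
        head.values().forEach(due::addAll);
        head.clear(); // clearing the view also removes these entries from the schedule
        return due;
    }

    int mergedCount(String memberId, String dedupKey) {
        return merged.getOrDefault(memberId, Map.of()).getOrDefault(dedupKey, 0);
    }
}
```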
ATC uses several of Samza's features:
1. Stateful processing: The ML models in the Relevance module are stored locally in RocksDB and are updated in real time based on user feedback.
2. Async APIs and multi-threading: Samza's multi-threading and Async APIs allow ATC to perform remote calls with high throughput. This helps bring down the 90th-percentile end-to-end latency for push notifications.
3. Host affinity: Samza's incremental checkpointing and host affinity enable ATC to achieve zero downtime during upgrades and instant recovery from failures.
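The benefit of asynchronous processing can be illustrated with a small, self-contained sketch: a processor starts a remote call, returns immediately, and reports completion through a callback, while a semaphore bounds how many calls are in flight. This only mirrors the shape of Samza's `AsyncStreamTask` (whose `processAsync` method signals completion through a `TaskCallback`) and the idea behind its `task.max.concurrency` setting; the `RemoteClient` interface and `AsyncProcessorSketch` class are hypothetical and do not use Samza itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.BiConsumer;

// Hypothetical remote dependency, e.g. a lookup needed before deciding on a notification.
interface RemoteClient {
    CompletableFuture<String> lookup(String memberId);
}

// Illustrative non-blocking processor: start work, return, signal completion via callback.
class AsyncProcessorSketch {
    private final RemoteClient client;
    private final Semaphore inFlight; // bounds outstanding calls, similar in spirit to task.max.concurrency

    AsyncProcessorSketch(RemoteClient client, int maxConcurrency) {
        this.client = client;
        this.inFlight = new Semaphore(maxConcurrency);
    }

    // Issues the remote call without waiting for it; callback receives (result, error).
    void processAsync(String memberId, BiConsumer<String, Throwable> callback) {
        inFlight.acquireUninterruptibly(); // waits only when maxConcurrency calls are outstanding
        CompletableFuture<String> call;
        try {
            call = client.lookup(memberId);
        } catch (RuntimeException e) {
            inFlight.release(); // do not leak the slot if the call fails to start
            callback.accept(null, e);
            return;
        }
        call.whenComplete((result, error) -> {
            inFlight.release(); // free the slot before notifying the caller
            callback.accept(result, error);
        });
    }
}
```

Because a slot is held only while a call is outstanding, up to `maxConcurrency` slow remote calls overlap instead of running one after another, which is how asynchronous processing raises throughput without adding threads per message.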
Key Samza Features: Stateful processing, Async API, Host affinity