March 2015 in the Flink community
March has been a busy month in the Flink community.
Flink runner for Google Cloud Dataflow
A Flink runner for Google Cloud Dataflow was announced. See the blog posts by data Artisans and the Google Cloud Platform Blog. Google Cloud Dataflow programs can be written using an open-source SDK and run on multiple backends, either as a managed service inside Google's infrastructure or on open-source runners, including Apache Flink.
Learn about the internals of Flink
The community has started an effort to better document the internals of Flink. Check out the first articles on the Flink wiki on how Flink manages memory, how tasks in Flink exchange data, type extraction and serialization in Flink, as well as how Flink builds on Akka for distributed coordination.
Also check out the new blog post on how Flink executes joins, which offers several insights into Flink's runtime.
Meetups and talks
Flink's machine learning efforts were presented at the Machine Learning Stockholm meetup group. The regular Berlin Flink meetup featured a talk on the past, present, and future of Flink. The talk is available on YouTube.
In the Flink master
Table API in Scala and Java
The new Table API in Flink is now available in both Java and Scala. Check out the examples here (Java) and here (Scala).
Additions to the Machine Learning library
Flink's Machine Learning library is seeing quite a bit of traction. Recent additions include the CoCoA algorithm for distributed optimization.
Exactly-once delivery guarantees for streaming jobs
Flink streaming jobs now provide exactly-once processing guarantees when coupled with persistent sources (notably Apache Kafka). Flink periodically checkpoints and persists the offsets of the sources, and restarts from those checkpoints when recovering from a failure. This functionality is currently limited: it does not yet handle large state or iterative programs.
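To illustrate the idea of checkpointing source offsets and replaying from them, here is a minimal, self-contained Python sketch. It is not Flink code and does not use any Flink API: the function name `run_with_checkpoints`, the in-memory list standing in for a persistent source, and the choice to snapshot the operator's counts together with the offset are all assumptions made for illustration.

```python
from collections import Counter


def run_with_checkpoints(records, checkpoint_interval, fail_at=None):
    """Count records from a replayable source, recovering once from a simulated failure.

    records: a list standing in for a persistent, replayable source (hypothetical).
    checkpoint_interval: take a checkpoint every this many records.
    fail_at: source offset at which a single failure is simulated, or None.
    """
    # A checkpoint pairs the source offset with a snapshot of the operator state.
    checkpoint = (0, Counter())
    offset, counts = 0, Counter()
    failed = False

    while offset < len(records):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Simulated crash: drop work done since the last checkpoint,
            # then rewind the source to the checkpointed offset.
            offset, counts = checkpoint[0], Counter(checkpoint[1])
            continue

        counts[records[offset]] += 1
        offset += 1

        if offset % checkpoint_interval == 0:
            checkpoint = (offset, Counter(counts))

    return counts


records = ["a", "b", "a", "c", "b", "a", "a"]
with_failure = run_with_checkpoints(records, checkpoint_interval=3, fail_at=5)
without_failure = run_with_checkpoints(records, checkpoint_interval=3)
```

In this sketch the records between the last checkpoint and the failure are read twice, but because the counts are rolled back along with the offset, each record affects the final result exactly once. That is why a replayable source matters: without the ability to rewind to a saved offset, the rolled-back records would be lost.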
Flink on Tez
A new execution environment enables non-iterative Flink jobs to use Tez as an execution backend instead of Flink's own network stack. Learn more here.