Google Summer of Code 2015 Apache Software Foundation

Apache Flink: Asynchronous Iterations and Updates

by Sachin Goel for Apache Software Foundation

Apache Flink provides fast data processing capabilities. However, many of the machine learning algorithms intended for its ML library can only be approximated in a distributed setting, so a well-designed iteration framework is essential. Furthermore, when processing large amounts of data, no resources should be wasted: no node should sit idle, waiting to synchronize with other nodes that are still finishing their work. An asynchronous iteration framework is needed instead.