REEF: Retainable Evaluator Execution Framework
With the introduction of the YARN resource manager, it is now possible for Hadoop clusters to mix and match applications written for multiple computational frameworks. YARN achieves this by providing containers with an extremely low-level API---essentially a working directory and a command line---and expecting computational frameworks such as MapReduce to handle fault tolerance, communication, and the other trappings of scalable computations.
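For concreteness, the sketch below shows what that container-level API looks like from an ApplicationMaster's point of view, using the standard YARN client classes (ContainerLaunchContext, NMClient, Records). It assumes an already-allocated Container, staged local resources, and an environment map; my.framework.Worker is a stand-in for whatever the framework actually runs.

```java
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.util.Records;

public final class ContainerLauncher {

  /** Launches a framework-specific worker process in an allocated YARN container. */
  static void launchWorker(final NMClient nmClient,
                           final Container container,
                           final Map<String, LocalResource> localResources,
                           final Map<String, String> environment) throws Exception {
    final ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    // The "API" YARN exposes to the container: files staged into its working
    // directory, environment variables, and a command line to execute.
    ctx.setLocalResources(localResources);
    ctx.setEnvironment(environment);
    ctx.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java -Xmx256m my.framework.Worker" // hypothetical worker class
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
    nmClient.startContainer(container, ctx);
  }
}
```

Everything above the command line (fault tolerance, coordination, data movement) is left to the framework that issued it.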
REEF is an Apache 2.0-licensed framework that bridges the gap between YARN's low-level container interface and the needs of such frameworks. It does so by providing retainable Evaluators: hardware resources whose lifetimes are decoupled from those of the computational tasks that run on them. This enables high-performance iterative graph-processing and machine-learning algorithms, as well as sessions, in which users temporarily reserve a set of machines, instantiate a computational framework, and then run extremely low-latency ad-hoc queries and jobs atop the reserved machines.
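The following sketch illustrates what "retainable" means on the driver side, using class names from the Apache REEF driver API as best I recall them (CompletedTask, ActiveContext, TaskConfiguration; exact names and wiring may differ by version, and the initial Evaluator request plus handler registration are elided). When a task finishes, the driver keeps the Evaluator's context alive and submits the next iteration's task to it, rather than returning the container to YARN. IterativeDriver and IterationTask are hypothetical names.

```java
import javax.inject.Inject;

import org.apache.reef.driver.context.ActiveContext;
import org.apache.reef.driver.task.CompletedTask;
import org.apache.reef.driver.task.TaskConfiguration;
import org.apache.reef.tang.Configuration;
import org.apache.reef.tang.annotations.Unit;
import org.apache.reef.task.Task;
import org.apache.reef.wake.EventHandler;

/** Hypothetical Task: one iteration of some iterative algorithm. */
final class IterationTask implements Task {
  @Inject
  IterationTask() {
  }

  @Override
  public byte[] call(final byte[] memento) {
    // ... one iteration of the algorithm would run here ...
    return null;
  }
}

/**
 * Driver-side handlers for an iterative job. Evaluator allocation, the first
 * task submission, and handler registration are elided for brevity.
 */
@Unit
final class IterativeDriver {

  private static final int MAX_ITERATIONS = 10;
  private int iteration = 0;

  @Inject
  IterativeDriver() {
  }

  /** On task completion, reuse the same Evaluator instead of releasing its container. */
  final class TaskCompletedHandler implements EventHandler<CompletedTask> {
    @Override
    public void onNext(final CompletedTask completed) {
      final ActiveContext context = completed.getActiveContext();
      if (++iteration < MAX_ITERATIONS) {
        final Configuration taskConf = TaskConfiguration.CONF
            .set(TaskConfiguration.IDENTIFIER, "iteration-" + iteration)
            .set(TaskConfiguration.TASK, IterationTask.class)
            .build();
        context.submitTask(taskConf); // retained Evaluator: no new container request
      } else {
        context.close(); // done: release the Evaluator and its container
      }
    }
  }
}
```

Because the Evaluator survives between tasks, state such as cached graph partitions or model parameters can stay resident in memory across iterations.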
Unlike existing approaches, REEF also aims for composability of jobs across computational models, providing significant performance and usability gains, even with legacy code. Finally, REEF provides a common set of mechanisms required by most scalable computational frameworks, including configuration management, scalable data-movement primitives, fault-handling primitives, and support for advanced scheduling policies such as preemption and elastic job sizing.
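As one concrete instance of these mechanisms, the sketch below shows configuration management in the style of REEF's Tang dependency-injection library, written against the org.apache.reef.tang API as I understand it (class and annotation names may differ across versions); the Port parameter and NetworkService class are hypothetical.

```java
import javax.inject.Inject;

import org.apache.reef.tang.Configuration;
import org.apache.reef.tang.Injector;
import org.apache.reef.tang.JavaConfigurationBuilder;
import org.apache.reef.tang.Tang;
import org.apache.reef.tang.annotations.Name;
import org.apache.reef.tang.annotations.NamedParameter;
import org.apache.reef.tang.annotations.Parameter;

public final class TangConfigExample {

  /** Hypothetical named parameter: the port a service listens on. */
  @NamedParameter(doc = "Server port", default_value = "7001")
  public static final class Port implements Name<Integer> {
  }

  /** Hypothetical service whose dependencies are injected from configuration. */
  public static final class NetworkService {
    private final int port;

    @Inject
    NetworkService(@Parameter(Port.class) final int port) {
      this.port = port;
    }

    int getPort() {
      return port;
    }
  }

  public static void main(final String[] args) throws Exception {
    // Build a checkable configuration object instead of passing loose strings around.
    final JavaConfigurationBuilder cb = Tang.Factory.getTang().newConfigurationBuilder();
    cb.bindNamedParameter(Port.class, "8080");
    final Configuration conf = cb.build();

    // Instantiate the service; constructor parameters are resolved from the configuration.
    final Injector injector = Tang.Factory.getTang().newInjector(conf);
    final NetworkService service = injector.getInstance(NetworkService.class);
    System.out.println("Configured port: " + service.getPort());
  }
}
```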
In addition to providing first-class support for Java-based applications and the Hadoop ecosystem, REEF offers a set of interoperability primitives that allow it to leverage systems written in native code and C#. This talk will cover REEF's core features and present example computational frameworks built on it, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and, of course, MapReduce.