Apache Giraph: Scalable Graph Processing on YARN
Track: Hadoop: Beyond Map-Reduce
Location: Grand Ballroom A
Abstract:
Apache Giraph performs offline batch processing of very large graph datasets on top of a Hadoop cluster. Giraph replaces iterative MapReduce-style solutions with Bulk Synchronous Parallel graph processing using in-memory or disk-based data sets, loosely following the model of Google's Pregel. Robust, efficient, and fast, Giraph is now used in production to process massive graphs at companies like Facebook. Giraph's recent port to a pure YARN platform offers increased performance, fine-grained resource control, and scalability that Giraph atop Hadoop MRv1 cannot match, while paving the way for ports to other platforms like Apache Mesos. Come hear what's on the roadmap for Giraph as we explore the new possibilities YARN offers.
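
For readers new to the Bulk Synchronous Parallel model mentioned above, here is a minimal single-process sketch in Python of a Pregel-style computation that propagates the maximum vertex value through a graph. It is an illustration only and does not use Giraph's API: the function name `bsp_max_value` and the dictionary-based graph layout are assumptions made for this example.

```python
from collections import defaultdict


def bsp_max_value(graph, values):
    """Propagate the largest vertex value along edges, one superstep at a time.

    graph:  dict mapping each vertex to a list of its out-neighbors (hypothetical layout)
    values: dict mapping each vertex to its initial integer value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)  # work on a copy so the caller's dict is untouched

    # Superstep 0: every vertex is active and sends its value to its neighbors.
    inbox = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                # The value improved, so tell the neighbors in the next superstep.
                values[vertex] = best
                for neighbor in graph.get(vertex, []):
                    outbox[neighbor].append(best)
            # Otherwise the vertex stays silent (it "votes to halt").
        # Swapping the buffers plays the role of the global barrier:
        # messages sent in this superstep are only visible in the next one.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    final_values, steps = bsp_max_value(chain, {"a": 3, "b": 6, "c": 2})
    print(final_values, steps)
```

The computation ends when a superstep produces no messages. In a distributed system such as Giraph the vertices are partitioned across workers and the barrier is a real synchronization point, which this sketch does not model.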