Qconn

Apache Giraph: Scalable Graph Processing on YARN

Location: Grand Ballroom A
Time: Wednesday, 5:20pm - 6:10pm
Abstract:

Apache Giraph performs offline batch processing of very large graph datasets on top of a Hadoop cluster. Giraph replaces iterative MapReduce-style solutions with Bulk Synchronous Parallel graph processing using in-memory or disk-based data sets, loosely following the model of Google's Pregel. Robust, efficient, and fast, Giraph is now used in production to process massive graphs at companies like Facebook. Giraph's recent port to a pure YARN platform offers increased performance, fine-grained resource control, and scalability that Giraph atop Hadoop MRv1 cannot match, while paving the way for ports to other platforms like Apache Mesos. Come hear what's on the roadmap for Giraph as we explore the new possibilities YARN offers.

Eli Reisman is an open-source enthusiast and an Apache Giraph committer and PMC member. He has been a frequent contributor to a variety of Apache projects in and around the Hadoop ecosystem since getting involved with Giraph at LinkedIn. While interning at Hortonworks, Eli ported Giraph to run on YARN. Eli is currently a software engineer at Etsy.com, where he works with Hadoop and other open-source big data tools. Eli lives in Seattle, WA, with his wife and a growing menagerie of ever-hungry rat terriers.