Arun.Murthy

Apache Tez: Accelerating Hadoop Query Processing
Track: Hadoop : Beyond Map-Reduce

Location:
Grand Ballroom A

Abstract:
Apache Tez is a general-purpose data processing framework built on top of YARN. Tez aims to provide high performance and efficiency out of the box across the spectrum from low-latency queries to heavyweight batch processing. Query plans produced by high-level languages like Hive and Pig can be translated directly into Tez's dataflow graph description API.

A flexible task construction model makes it easy to add new types of storage and data transfer technologies. A modular execution engine enables advanced optimization strategies to be plugged in at runtime. Early investments in Hive on Tez have shown remarkable improvements in performance. The talk will provide details about the design of Tez, present use cases highlighting its features, and share some initial results obtained by Hive on Tez.

MapReduce has been the workhorse for Hadoop, but its monolithic structure has slowed innovation. YARN separates resource management from application logic and thus enables the creation of Tez, a more flexible and generic data processing framework that benefits the entire Hadoop query ecosystem.