Namit Jain, Facebook
Namit Jain has been with the data-infrastructure group at Facebook for more
than a year. He is one of the early engineers on Hive and one of its
committers, and has presented Hive at a number of conferences, including
Hadoop Summit 2009 and VLDB 2009.
Before that, Namit spent over 10 years at Oracle in the database and
application server groups, where he worked on streaming technologies,
XML, replication, queuing, and related products both inside and outside the database.
Presentation: "Facebook's Petabyte Scale Data Warehouse using Hive and Hadoop"
Time:
Friday 10:10 - 11:10
Location:
Metropolitan Ballroom
Abstract: Hive is an open source, petabyte-scale data warehousing framework built on top of Hadoop that enables scalable analytics on large data sets using SQL and some language extensions. Scalable analysis on large data sets has been core to the work of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced ones such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that scales cost-effectively with the ever-increasing amounts of data generated on Facebook, is critical. Hive fills that need and brings the power of Hadoop to users who are familiar with SQL. It is flexible enough to understand different data formats (including custom formats) and also allows users to embed custom map/reduce logic or functions within a SQL-like query. It is powerful enough to support many different kinds of analytics applications. In this presentation we will talk in more detail about Hive, the motivations behind it, and how it is used at Facebook to analyze and manage 400TB of compressed data (2.5PB uncompressed) in our Hadoop cluster.
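
As a rough illustration of the point about embedding custom map/reduce logic in a SQL-like query, a minimal HiveQL sketch using Hive's TRANSFORM clause might look as follows (the table page_views, its columns, and the script my_mapper.py are hypothetical placeholders, not part of the talk):

    -- ship a user script to the cluster, then stream each row through it
    ADD FILE my_mapper.py;

    SELECT TRANSFORM (user_id, url)
           USING 'python my_mapper.py'
           AS (user_id, domain)
    FROM page_views;

Each input row is piped to my_mapper.py on stdin, and the script's tab-separated output is read back as the columns named in the AS clause, so arbitrary map logic can sit inside an otherwise ordinary query.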