Hien.Luu

Processing Big Data with Apache Pig and Apache Hive
Location:
Seacliff B

Duration:
Full Day

Abstract:
Processing big data and building big data products at a rapid pace requires highly productive data processing technologies. Apache Pig and Apache Hive are designed to let data scientists, data analysts, and engineers be highly productive and iterate quickly when processing data at massive scale. This tutorial provides hands-on experience with Apache Pig and Apache Hive, as well as a glimpse of how they are used at LinkedIn to build big data products.

Here is what you can expect to learn in this tutorial:
- A quick, high-level overview of the Hadoop data processing framework
- How Apache Pig and Apache Hive fit into the Hadoop data processing ecosystem
- An overview of the Apache Pig architecture and data flow language
- An overview of the Apache Hive architecture and query language
- A demonstration of writing and running Apache Pig scripts
- A demonstration of writing and running Apache Hive queries
- A discussion of the strengths and weaknesses of Apache Pig and Apache Hive, and when to use which
- A glimpse of upcoming query processing technologies and productivity tools such as Netflix Lipstick

Target Audience:
Architects and engineers who have an interest in big data topics. Participants should have some high-level knowledge of Hadoop.