Hadoop: Beyond Map-Reduce
Hadoop, the open-source combination of Map-Reduce libraries and the Hadoop Distributed File System (HDFS), has long been an essential tool in enterprises and startups alike. Data scientists use Hadoop to execute statistical and analytical functions on large volumes of data. Data Infrastructure and Search engineers use Hadoop to generate ready-to-load indexes for custom search engines, databases, and NoSQL systems. Data Warehouse engineers and analysts use Hadoop as a key integration point for all data flowing into MPP databases like Teradata and reporting solutions like MicroStrategy. Whatever the use case, a Hadoop installation often serves several groups of users within a business. Beyond Map-Reduce and HDFS, Hadoop acts as an ecosystem for several useful tools and languages, including Apache Pig and Apache Hive. Until recently, all of these higher-level tools were forced to use Hadoop's Map-Reduce framework to do their work, so everything at the higher level boiled down to one or more Map-Reduce jobs. This was not only inefficient, it also limited what framework developers and application developers could do on Hadoop. With the advent of YARN last year, things have changed dramatically. In recent months and weeks, several new frameworks have been introduced to leverage the power of YARN, including Tez, Samza, and REEF. Come learn about these and other exciting changes from the framework developers themselves!