Track: Beyond Hadoop
Location:
- Seacliff C/D
Day of week:
- Wednesday
For many years now, Hadoop, the open-source combination of Map-Reduce libraries and the Hadoop Distributed File System (HDFS), has been the start and end of any conversation involving processing of large volumes of data. That has changed. Last year, we revealed that the Hadoop community was reducing its focus on the Map-Reduce paradigm in favor of a more flexible distributed system management paradigm known as YARN. YARN can support several data processing frameworks running side by side as applications (e.g. Map-Reduce, Storm, Tez) as well as many other types of frameworks, including those that support infrastructure beyond the purview of big data (e.g. web servers). In addition, several new file formats have emerged and HDFS is starting to take a back seat to new file systems based on SSDs and memory! In short, the Big Data world we now live in has expanded beyond the borders of Hadoop -- it now includes several interactive-speed OLAP engines, multiple machine learning platforms, a couple of columnar file formats, and many alternatives for both streaming and graph processing. Increasingly, sentences that begin with Hadoop end with Spark. How are companies leveraging these new Big Data technologies? Come to this track to learn more.
by Matei Zaharia
CTO and founder of Databricks
While early big data systems, such as MapReduce, focused on batch processing, the demands on these systems have grown quickly. Users soon needed to run (1) more interactive ad-hoc queries, (2) sophisticated multi-pass algorithms (e.g. machine learning), and (3) real-time stream processing. The result has been an explosion of specialized systems to tackle these new workloads. Unfortunately, this means more systems to learn, manage, and stitch together into pipelines. Spark is unique in...
by Richard Kasperowski
QCon Open Space Facilitator
Open Space
Join Jeff Magnusen, our speakers, and other attendees as we discuss more flexible distributed system management paradigms like YARN and new file systems based on SSDs and memory! The Big Data world we now live in has expanded beyond the borders of Hadoop--it now includes several interactive-speed OLAP engines, multiple machine learning platforms, a couple of columnar file formats, and many alternatives for both streaming and graph processing. Increasingly, sentences that begin with Hadoop...
by Eugene Mandel
Jawbone Data Science
At Jawbone, the Data Science team correlated step and workout data for hundreds of thousands of UP wearers with publicly available external datasets in order to understand how various factors affect physical activity.
In this talk we will highlight the challenges of combining internal and external datasets: knowing how the data was generated and its limitations, understanding the domain logic and, most importantly, addressing data errors and outliers.
We will also compare two...
by Lin Qiao
Engineering Manager at LinkedIn
Traditionally, a Big Data system is about the sheer volume of the datasets it handles and the large processing power behind it. Nowadays, it also means ingesting and integrating large amounts of data with high velocity and high quality. The first part of the big data problem has been the focus lately, with innovations to tackle these challenges.
In reality, the latter part of the problem often becomes a big pain point before developers get to solve the next problems. With...
by Julien Le Dem
Tech Lead at Twitter, Pig Committer, Co-author of Apache Parquet
Hadoop makes it relatively easy to store petabytes of data. However, storing data is not enough; the storage format must also allow the data to be queried quickly and efficiently. For interoperability, row-based encodings (CSV, Thrift, Avro) combined with a general-purpose compression algorithm to reduce storage cost (GZip, LZO, Snappy) are very common but are not efficient to query.
As discussed extensively in the database literature, a columnar layout with statistics on optionally sorted data...
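To make the row-versus-column contrast concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Parquet's actual format or API; the sample records and every function name are assumptions made up for illustration. It shows a query that needs a single field reading one column instead of whole records, and min/max statistics letting a reader skip a chunk of values without scanning it.

```python
# Illustrative only: toy in-memory layouts, not Parquet's on-disk format.
# The sample records and all function names are hypothetical.
rows = [
    {"user": "a", "country": "US", "clicks": 3},
    {"user": "b", "country": "FR", "clicks": 7},
    {"user": "c", "country": "US", "clicks": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def column_stats(values):
    """Min/max statistics that could be stored next to a chunk of values."""
    return {"min": min(values), "max": max(values)}


def sum_clicks_row_layout(records):
    # Row layout: the query walks whole records to pull out one field.
    return sum(record["clicks"] for record in records)


def sum_clicks_column_layout(cols):
    # Columnar layout: only the column the query needs is read.
    return sum(cols["clicks"])


def chunk_may_contain_greater_than(stats, threshold):
    """False means the chunk can be skipped for a `value > threshold` filter."""
    return stats["max"] > threshold


columns = to_columns(rows)
stats = column_stats(columns["clicks"])
```

In this sketch, a filter such as `clicks > 10` can skip the chunk outright because its stored maximum is 7, which is the kind of saving the statistics are meant to enable.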
by Gian Merlino
Engineer at Metamarkets
Hybrid batch/real-time architectures (sometimes called “lambda architectures”) are a powerful pattern for building robust, production-quality, up-to-the-minute data analytics systems.
We’ll discuss why you may want to go hybrid, the sorts of challenges that can arise when building production data systems, and effective techniques for making them easier to deploy and manage. We’ll take the data system at Metamarkets as an example, which uses Hadoop, Storm, Kafka, and Druid to ingest...
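As a rough illustration of the hybrid pattern, the sketch below merges a precomputed batch view with a real-time view at query time. It is a generic toy in Python and does not represent the Metamarkets system or model Hadoop, Storm, Kafka, or Druid; the event data, the cutoff, and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical event log of (timestamp, key) pairs.
events = [(1, "page_a"), (2, "page_b"), (3, "page_a"), (4, "page_a")]

# Assumption: the last batch run covered events with timestamp <= cutoff.
cutoff = 2


def batch_view(log, up_to):
    """Counts per key for events the periodic batch job has already processed."""
    return Counter(key for ts, key in log if ts <= up_to)


def realtime_view(log, after):
    """Counts per key for recent events the batch job has not seen yet."""
    return Counter(key for ts, key in log if ts > after)


def query(key, batch, realtime):
    """Serve an up-to-date count by adding the batch and real-time views."""
    return batch[key] + realtime[key]


batch = batch_view(events, cutoff)
realtime = realtime_view(events, cutoff)
```

When the next batch run completes, the cutoff advances and the real-time view only has to cover events after it, so any approximation in the real-time path is eventually replaced by the batch result.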
Tracks
Covering innovative topics
Monday, 3 November
- Architectures You've Always Wondered about
  The newest and biggest Internet architectures
- Real World Functional
  Putting functional programming concepts to work in the real world.
- The Future of Mobile
  The future of mobile and performance improvements
- Continuous Delivery: From Heroics to Becoming Invisible
  Continuous Delivery philosophies, cultures, hiccups, and best practices.
- Unleashing the Power of Streaming Data
  This track explores a variety of use-cases, platforms, and techniques for processing and analyzing stream data from the companies deploying them at scale!
- Sponsored Solutions Track I
Tuesday, 4 November
- Engineering for Product Success
  Architectures that make products more successful
- Reactive Service Architecture
  Reactive, Responsive, Fault Tolerant and More.
- Modern CS In the Real World
  How modern CS tackles problems in the real world.
- Applied Machine Learning and Data Science
  Understand your big big data!
- Deploying at Scale
  Containerizing Applications, Discovering Services, and Deploying to the Grid.
- Sponsored Solutions Track II
Wednesday, 5 November
- Beyond Hadoop
  Emerging Big Data Frameworks and Technology
- Scalable Microservice Architectures
  This track addresses the ways companies with hundreds of fine-grained web-services (e.g. Netflix, LinkedIn) manage complexity!
- Java at the Cutting Edge
  The latest and greatest in the Java ecosystem
- Engineering culture
  Successes and failures in creating an engineering culture.
- Next gen HTML5 and JS
  How Web Components, the Future of CSS, and more are changing the web.
- Sponsored Solutions Track III