Traditionally, a Big Data system has been defined by the sheer volume of the datasets it handles and the processing power behind it. Today, it also implies ingesting and integrating large amounts of data at high velocity and with high quality. The first part of the big data problem has been the focus of recent innovation.

In practice, the latter part of the problem often becomes a major pain point before developers can move on to the problems that follow. Drawing on first-hand experience with big data ingestion and integration pain points, we built Gobblin, a unified data ingestion framework, to address the following challenges:

Source integration: the framework provides out-of-the-box adaptors for our commonly accessed data sources, such as Salesforce, MySQL, Google, Kafka, and Databus.
Processing paradigm: the framework supports both standalone and scalable platforms, including Hadoop and YARN. Integration with YARN provides the ability to run scheduled batch ingestion or continuous ingestion.
Data quality assurance: the framework exposes data metrics collectors and data quality checkers as first-class citizens, which can be used to power continuous data validation.
Extensibility: data pipeline developers can integrate their own adaptors with the framework and make them available to other developers in the community.
Self-service: data pipeline developers can compose a data ingestion and transformation flow in the form of a DAG, using a simple pipeline definition language or a UI.
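
To make the list above more concrete, here is a minimal sketch of how a pluggable source adaptor, a transformation step, and a data quality checker might fit together. It is not Gobblin's actual API: the names Source, InMemorySource, QualityReport, and run_pipeline are hypothetical and exist only for illustration, and the flow is a simple linear chain rather than a full DAG.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, Optional[object]]


class Source:
    """Hypothetical adaptor interface: yields records from one data source."""

    def extract(self) -> Iterator[Record]:
        raise NotImplementedError


class InMemorySource(Source):
    """Stand-in adaptor; a real one would read from Kafka, MySQL, and so on."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = list(rows)

    def extract(self) -> Iterator[Record]:
        return iter(self.rows)


@dataclass
class QualityReport:
    """Simple metrics collected while the pipeline runs."""

    read: int = 0
    written: int = 0
    rejected: int = 0


def run_pipeline(
    source: Source,
    converters: List[Callable[[Record], Record]],
    checkers: List[Callable[[Record], bool]],
    sink: List[Record],
) -> QualityReport:
    """Extract each record, apply converters in order, and write it to the
    sink only if every quality checker accepts it."""
    report = QualityReport()
    for record in source.extract():
        report.read += 1
        for convert in converters:
            record = convert(record)
        if all(check(record) for check in checkers):
            sink.append(record)
            report.written += 1
        else:
            report.rejected += 1
    return report


def lowercase_email(record: Record) -> Record:
    """Example converter: normalize the email field when present."""
    email = record.get("email")
    return {**record, "email": email.lower() if isinstance(email, str) else None}


def has_email(record: Record) -> bool:
    """Example quality checker: reject records without an email."""
    return record.get("email") is not None


rows = [{"id": 1, "email": "A@Example.com"}, {"id": 2, "email": None}]
sink: List[Record] = []
report = run_pipeline(InMemorySource(rows), [lowercase_email], [has_email], sink)
```

In this sketch, adding a new data source means writing another Source subclass, and the counts in QualityReport are the kind of metrics a continuous validation job could monitor.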

In this talk, we will cover Gobblin’s system architecture, key design decisions and tradeoffs, and lessons learned from operating disparate LinkedIn use cases in production.

Lin Qiao Elsewhere

Tracks

Covering innovative topics

Monday, 3 November

Architectures You've Always Wondered about

The newest and biggest Internet architectures

Real World Functional

Putting functional programming concepts to work in the real world.

The Future of Mobile

The future of mobile and performance improvements

Continuous Delivery: From Heroics to Becoming Invisible

Continuous Delivery philosophies, cultures, hiccups, and best practices.

Unleashing the Power of Streaming Data

This track explores a variety of use-cases, platforms, and techniques for processing and analyzing stream data from the companies deploying them at scale!

Sponsored Solutions Track I

Tuesday, 4 November

Engineering for Product Success

Architectures that make products more successful

Reactive Service Architecture

Reactive, Responsive, Fault Tolerant and More.

Modern CS In the Real World

How modern CS tackles problems in the real world.

Applied Machine Learning and Data Science

Understand your big big data!

Deploying at Scale

Containerizing Applications, Discovering Services, and Deploying to the Grid.

Sponsored Solutions Track II

Wednesday, 5 November

Beyond Hadoop

Emerging Big Data Frameworks and Technology

Scalable Microservice Architectures

This track addresses the ways companies with hundreds of fine-grained web-services (e.g. Netflix, LinkedIn) manage complexity!

Java at the Cutting Edge

The latest and greatest in the Java ecosystem

Engineering culture

Successes and failures in creating an engineering culture.

Next gen HTML5 and JS

How Web Components, the Future of CSS, and more are changing the web.

Sponsored Solutions Track III

Conference for Professional Software Developers

Presentation: Gobblin: A Framework for Solving Big Data Ingestion Problem
