The Whys and Hows of Database Streaming


Abstract

Batch-style ETL pipelines have long been the de facto method for moving data from OLTP to OLAP database systems. At WePay, when we first built our data pipeline from MySQL to BigQuery, we adopted this tried-and-true approach. However, as our company scaled and our business needs grew, we saw growing demand for making data available for analytics in real time. This led us to redesign our pipeline around a streaming-based approach built on open-source technologies such as Debezium and Kafka.
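For context, a batch-style ETL job often pulls the rows changed since its last run using a timestamp high-watermark. The sketch below is a hypothetical illustration, not WePay's pipeline: it uses Python's built-in `sqlite3` in place of MySQL, and the table `orders`, the column `updated_at`, and the function `extract_batch` are assumptions. Two limits of the approach are visible here: data is only as fresh as the batch interval, and a row that is hard-deleted between runs never appears in the query result.

```python
import sqlite3


def extract_batch(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark.

    Hypothetical schema: orders(id INTEGER PRIMARY KEY, amount INTEGER, updated_at INTEGER).
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest timestamp seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, updated_at INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 100, 10), (2, 250, 20)])

    rows, wm = extract_batch(conn, 0)  # first run picks up both rows
    print(rows, wm)

    conn.execute("UPDATE orders SET amount = 300, updated_at = 30 WHERE id = 2")
    conn.execute("DELETE FROM orders WHERE id = 1")  # a hard delete leaves no row to select
    rows2, wm2 = extract_batch(conn, wm)
    print(rows2, wm2)
```

The second run returns only the updated row for `id` 2; nothing in its result tells the loader that `id` 1 was deleted.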

This talk covers the central design pattern behind database streaming, change data capture (CDC), and its advantages over alternative approaches such as triggers or event sourcing. To solidify the concept, we will walk through our MySQL-to-BigQuery streaming pipeline in detail, explaining the core components involved and how we built the pipeline to be resilient to failure. Finally, we will discuss our ongoing work on the additional challenges we face when streaming peer-to-peer distributed databases (e.g., Cassandra), and some potential solutions to them.
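With change data capture, a connector such as Debezium reads the database's transaction log (the binlog, in MySQL's case) and emits one event per row change. Debezium change events carry `before` and `after` row images and an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). The sketch below is a hypothetical consumer that applies such events to an in-memory dict standing in for the analytics table. The flattened event shape, the `offset` field, and the function name `apply_events` are assumptions for illustration, not Debezium's exact wire format or WePay's code. Skipping events at or below the last applied offset is one simple way to make redelivery after a crash harmless.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def apply_events(
    table: Dict[int, Row],
    events: List[Dict[str, Any]],
    last_offset: int = -1,
) -> Tuple[Dict[int, Row], int]:
    """Apply simplified Debezium-style change events to `table` (mutated in place).

    Rows are keyed by an assumed primary key column `id`. Events whose
    (hypothetical) `offset` is <= last_offset were already applied and are
    skipped, so replaying a batch after a restart does not double-apply it.
    """
    for event in events:
        if event["offset"] <= last_offset:
            continue  # duplicate delivery: already applied
        op = event["op"]
        if op in ("c", "r", "u"):
            row = event["after"]
            table[row["id"]] = row  # upsert the latest row image
        elif op == "d":
            table.pop(event["before"]["id"], None)  # delete by primary key
        else:
            raise ValueError(f"unknown op: {op!r}")
        last_offset = event["offset"]
    return table, last_offset


if __name__ == "__main__":
    events = [
        {"offset": 0, "op": "c", "before": None, "after": {"id": 1, "amount": 100}},
        {"offset": 1, "op": "c", "before": None, "after": {"id": 2, "amount": 250}},
        {"offset": 2, "op": "u", "before": {"id": 2, "amount": 250}, "after": {"id": 2, "amount": 300}},
        {"offset": 3, "op": "d", "before": {"id": 1, "amount": 100}, "after": None},
    ]
    table, offset = apply_events({}, events)
    # Replaying the same events (e.g. after a restart) leaves the table unchanged.
    table, offset = apply_events(table, events, offset)
    print(table, offset)
```

Because deletes arrive as explicit events, the target can remove rows that a timestamp-based batch query would never observe.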

Speaker: Joy Gao

Sr. Software Engineer @WePay

Joy is a senior software engineer at WePay. She works on the data infrastructure team, building streaming and batch data pipelines with open-source software. She is a FOSS enthusiast and a committer for Apache Airflow.

Find Joy Gao at @joygao

Tracks

Monday, 5 November

Microservices / Serverless Patterns & Practices

Evolving, observing, persisting, and building modern microservices

Practices of DevOps & Lean Thinking

Practical approaches using DevOps & Lean Thinking

JavaScript & Web Tech

Beyond JavaScript in the browser: exploring WebAssembly, Electron, & modern frameworks

Modern CS in the Real World

Thoughts pushing software forward, including consensus, CRDTs, formal methods, & probabilistic programming

Modern Operating Systems

Applied, practical, & real-world deep dive into industry adoption of OSes, containers, and virtualization, including Linux on Windows, LinuxKit, and unikernels

Optimizing You: Human Skills for Individuals

Better teams start with a better self. Learn practical skills for individual contributors.

Tuesday, 6 November

Architectures You've Always Wondered About

Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more

21st Century Languages

Lessons learned from languages like Rust, Go, Swift, Kotlin, and more

Emerging Trends in Data Engineering

Showcasing data engineering tech and highlighting the strengths of each in real-world applications

Bare Knuckle Performance

Killing latency and getting the most out of your hardware

Socially Conscious Software

Building socially responsible software that protects users' privacy & safety

Delivering on the Promise of Containers

Runtime containers, libraries, and services that power microservices

Wednesday, 7 November

Applied AI & Machine Learning

Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more

Production Readiness: Building Resilient Systems

More than just building software: building deployable, production-ready software

Developer Experience: Level up your Engineering Effectiveness

Improving the end-to-end developer experience: design, dev, test, deploy, operate/understand

Security: Lessons Attacking & Defending

Security from the defender's AND the attacker's point of view

Future of Human Computer Interaction

IoT, voice, mobile: interfaces pushing the boundary of what we consider to be the interface

Enterprise Languages

Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track.

Track: Emerging Trends in Data Engineering

Location: Bayview AB

Time: 5:25pm - 6:15pm

Day of week: Tuesday

Level: Intermediate

Persona: Architect, Backend Developer, Data Engineering, Developer, General Software