Presentation: Practical Change Data Streaming Use Cases With Apache Kafka & Debezium

Track: Modern Data Architectures

Location: Ballroom BC

Duration: 5:25pm - 6:15pm

Day of week: Tuesday

This presentation is now available to view on InfoQ.com

What You’ll Learn

  1. Hear about change data capture (CDC) and the Debezium project.
  2. Find out about the use cases of CDC.
  3. Learn about the outbox pattern.

Abstract

Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.

In this session you’ll see how to leverage CDC for reliable microservices integration, e.g. using the outbox pattern, as well as many other CDC applications, such as maintaining audit logs, automatically keeping your full-text search index in sync, and driving streaming queries. We’ll also discuss practical matters, e.g. HA set-ups, best practices for running Debezium in production on and off Kubernetes, and the many use cases enabled by Kafka Connect's single message transformations.

Question: 

What is the work you're doing today?

Answer: 

I work as a software engineer at Red Hat and there I'm the lead of the Debezium project, which is a tool for change data capture.

Question: 

What are your goals for the talk?

Answer: 

I would like to first and foremost familiarize people with the concepts and ideas of change data capture. What is this about? But most importantly, what are the use cases for it? Why would you want to use change data capture? There are many use cases, such as data replication or data exchange between microservices, and you could also use it to enable streaming queries or auditing. I would like to familiarize people with those concepts. It's liberation for your data. You have data sitting there in a database, and CDC allows you to react to changes in that data. This liberation of data is what I would like to talk about.

Question: 

Could you briefly describe what the outbox pattern is?

Answer: 

The idea is that very often people need to update multiple things from within their application. Let's say they need to process a purchase order. At the same time, they would also like to update the search index, or send a message to Kafka to notify downstream consumers about the order. Typically those two things, the database and Kafka, cannot be updated atomically within one global transaction. If you try to do this, you're bound to fail and you will end up with inconsistencies. The outbox pattern is essentially a way to avoid this. How does it work? You don't only update the business tables in your database; within the same transaction you also insert an event record into an outbox table. You then capture the inserts into the outbox table and stream these change events to downstream consumers.
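
The steps described in this answer can be sketched in a few lines. This is a minimal illustration, not how Debezium itself is implemented: SQLite stands in for the service's database, the table and column names (`purchase_order`, `outbox`, `seq`, `aggregate_id`, `event_type`, `payload`) and the `OrderCreated` event type are made up for the example, and the hypothetical `relay_outbox` function polls the outbox table only as a stand-in for a CDC connector, which would read the inserts from the database's transaction log instead.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Business table plus the outbox table (both names are illustrative).
    conn.execute(
        "CREATE TABLE purchase_order ("
        " id TEXT PRIMARY KEY,"
        " item TEXT NOT NULL,"
        " quantity INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " aggregate_id TEXT NOT NULL,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )


def place_order(conn, item, quantity):
    """Write the order and its event record in ONE local transaction."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"id": order_id, "item": item, "quantity": quantity})
    with conn:  # commits if the block succeeds, rolls back on any exception
        conn.execute(
            "INSERT INTO purchase_order (id, item, quantity) VALUES (?, ?, ?)",
            (order_id, item, quantity),
        )
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload)"
            " VALUES (?, ?, ?)",
            (order_id, "OrderCreated", payload),
        )
    return order_id


def relay_outbox(conn, last_seq, publish):
    """Stand-in for the CDC step: hand new outbox rows to `publish` in order.

    Returns the highest sequence number that was relayed, so the caller
    can resume from there on the next call.
    """
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        last_seq = seq
    return last_seq


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    place_order(connection, "book", 2)
    relay_outbox(connection, 0, lambda kind, event: print(kind, event))
```

Because both inserts share one transaction, either the order and its event record are both persisted or neither is, which is the property that separate writes to the database and to Kafka cannot provide.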

Question: 

So it's a two-step transaction?

Answer: 

In the end, you could say that. It essentially gives you instant read-your-own-writes semantics for your own changes: you could go to the database and you would see the newly persisted purchase order. At the same time, it also gives you eventually consistent eventing to downstream consumers.

Question: 

What advantages does having an event bus like Kafka in the middle of that flow give you?

Answer: 

First of all, there is this notion of decoupling. This will all be asynchronous. Even if you cannot reach Kafka for some time, eventually you'll be able to send events to Kafka again, and the source application is not impacted by the downtime. Also, if any consumers of those events are not available, let's say we cannot access our search index for some reason, we are not bothered by that, because Kafka sits in between and decouples them. Then one of the things I really like about Kafka is that it's like a durable log. We can keep change events in Kafka topics for as long as we want, and we can reread topics from the beginning. This means we could add a new consumer down the road, long after those change events were produced. For instance, we could add a consumer which takes the data and writes it to a data warehouse. And maybe we didn't even think about this use case when we were producing those events originally.
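
The "durable log" idea can be made concrete with a toy model. The sketch below is not the Kafka client API; `EventLog` and `Consumer` are hypothetical classes that only mimic two properties mentioned in the answer: records stay in the log after they have been read, and every consumer tracks its own position, so a consumer added later can start from the beginning.

```python
class EventLog:
    """Toy append-only log, a stand-in for one Kafka topic partition."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are only ever appended; reading never removes them.
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset):
        # Everything from `offset` to the current end of the log.
        return self._records[offset:]


class Consumer:
    """Each consumer keeps its own offset, independent of all others."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


if __name__ == "__main__":
    log = EventLog()
    search_index = Consumer(log)
    for order_id in (1, 2, 3):
        log.append({"type": "OrderCreated", "id": order_id})
    print(search_index.poll())

    # A consumer added later still sees the full history.
    warehouse = Consumer(log)
    print(warehouse.poll())
```

In Kafka itself, how long records are kept is a matter of topic retention settings, and where a new consumer group starts reading is a consumer configuration choice; the model above simply assumes unlimited retention and a start at offset 0.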

Question: 

What do you want people to leave the talk with?

Answer: 

Three things, mostly. One touches a bit on the outbox pattern: friends don't let friends do dual writes. That's one of the points I would like to talk about. Next, people should get an understanding of what's in it for them if they use change data capture. What are the use cases? How could they benefit in their jobs by using this? And then finally, I would like to run them through some practical matters. How could you use this on things like Kubernetes, or what are typical topologies? Sometimes people would like to stream changes from a secondary database in the cluster. Those practical matters.
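
To make the dual-writes warning concrete, here is a small simulation of what goes wrong. Everything in it is hypothetical: `FlakyBroker` is an in-memory stand-in for a message broker that may be unreachable, SQLite stands in for the service's database, and the `purchase_order` table is assumed to exist with `id` and `item` columns.

```python
import sqlite3


class BrokerUnavailable(Exception):
    """Raised by the stand-in broker when it cannot be reached."""


class FlakyBroker:
    """In-memory stand-in for a message broker that may be down."""

    def __init__(self, available=True):
        self.available = available
        self.messages = []

    def send(self, message):
        if not self.available:
            raise BrokerUnavailable("broker not reachable")
        self.messages.append(message)


def place_order_dual_write(conn, broker, order_id, item):
    """Anti-pattern: two independent writes with no shared transaction."""
    with conn:  # first write: commits to the database on success
        conn.execute(
            "INSERT INTO purchase_order (id, item) VALUES (?, ?)",
            (order_id, item),
        )
    # Second write: the database commit above has already happened, so if
    # this call fails, the order exists but no event was ever published.
    broker.send({"type": "OrderCreated", "id": order_id, "item": item})


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE purchase_order (id TEXT PRIMARY KEY, item TEXT)")
    broker = FlakyBroker(available=False)
    try:
        place_order_dual_write(connection, broker, "order-1", "book")
    except BrokerUnavailable:
        stored = connection.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
        print("orders stored:", stored, "events published:", len(broker.messages))
```

Swapping the order of the two writes does not help: the event could then be published for an order whose database transaction is rolled back afterwards.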

Speaker: Gunnar Morling

Open Source Software Engineer @RedHat

Gunnar Morling is a software engineer and open-source enthusiast at heart. He leads the Debezium project, a tool for change data capture (CDC). He is a Java Champion, the spec lead for Bean Validation 2.0 (JSR 380), and has founded multiple open source projects such as Deptective and MapStruct. Prior to joining Red Hat, Gunnar worked on a wide range of Java EE projects in the logistics and retail industries. He's based in Hamburg, Germany.
