A streaming pipeline sends data from a source to a target as the data is created, in a continuous stream. Contrast this with a batch approach, where we wait for a period of time (perhaps hours or days), collect the data that has accumulated, and then send it to the target all at once.
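To make the contrast concrete, here is a minimal, self-contained sketch. It is not Kafka code. It uses a hypothetical in-memory source that emits event ids, and it measures "lag" as the number of newer events produced before a given event reaches the target, a rough stand-in for elapsed time. The names `source`, `run_batch`, and `run_stream` are illustrative assumptions.

```python
from typing import Iterator, List


def source(n: int, produced: List[int]) -> Iterator[int]:
    """Hypothetical source: emits event ids 0..n-1 and records each emission."""
    for event_id in range(n):
        produced.append(event_id)
        yield event_id


def lag(produced: List[int], event_id: int) -> int:
    """How many newer events exist at the moment this event reaches the target."""
    return len(produced) - 1 - event_id


def run_batch(n: int, batch_size: int) -> List[int]:
    """Batch style: buffer events and deliver them only when a batch fills up."""
    produced: List[int] = []
    lags: List[int] = []
    buffer: List[int] = []
    for event_id in source(n, produced):
        buffer.append(event_id)
        if len(buffer) == batch_size:
            lags.extend(lag(produced, e) for e in buffer)  # deliver the full batch
            buffer = []
    lags.extend(lag(produced, e) for e in buffer)  # flush whatever is left at the end
    return lags


def run_stream(n: int) -> List[int]:
    """Streaming style: deliver each event to the target as soon as it is produced."""
    produced: List[int] = []
    return [lag(produced, e) for e in source(n, produced)]


print(run_batch(10, 4))  # [3, 2, 1, 0, 3, 2, 1, 0, 1, 0]
print(run_stream(10))    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In the batch run, the first event in each batch waits until the batch fills before the target sees it. In the streaming run, every event arrives before any newer one exists.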
There are several good reasons for wanting to use a streaming pipeline, including:
- Ensuring more accurate data in the target system
- Reacting to data as it changes, while it is current and relevant
- Spreading the processing load and avoiding resource shortages from a huge influx of data
In the context of Apache Kafka, a streaming data pipeline means ingesting data from sources into Kafka as it's created and then streaming that data from Kafka to one or more targets. In this example, we'll cover everything from the basics through creating producers and consumers, working with connectors, and moving data downstream.
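The source-to-Kafka-to-targets shape can be modeled with a toy, in-memory sketch. This is not the Kafka client API. `ToyTopic` and `ToyConsumer` are hypothetical names for a single-partition, append-only log and for readers that each track their own offset. The sketch only shows why one topic can feed several targets independently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


@dataclass
class ToyTopic:
    """Hypothetical stand-in for a single-partition topic: an append-only log."""
    records: List[Record] = field(default_factory=list)

    def produce(self, key: str, value: str) -> int:
        """Append a record (the 'ingest from source' step) and return its offset."""
        self.records.append((key, value))
        return len(self.records) - 1


@dataclass
class ToyConsumer:
    """Hypothetical target reader: reads the shared log from its own offset."""
    topic: ToyTopic
    offset: int = 0

    def poll(self) -> List[Record]:
        """Return records not yet seen by this consumer and advance its offset."""
        new = self.topic.records[self.offset:]
        self.offset = len(self.topic.records)
        return new


orders = ToyTopic()
warehouse = ToyConsumer(orders)  # hypothetical target 1
analytics = ToyConsumer(orders)  # hypothetical target 2

orders.produce("order-1", "created")
orders.produce("order-2", "created")
print(warehouse.poll())  # [('order-1', 'created'), ('order-2', 'created')]

orders.produce("order-1", "shipped")
print(warehouse.poll())  # [('order-1', 'shipped')]
print(analytics.poll())  # all three records, since analytics has not read yet
```

Because each consumer keeps its own position, a slow or newly added target does not affect the others. Real Kafka adds partitions, consumer groups, and durable offset commits on top of this basic idea.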
Senior Developer Advocate @Confluent
A mathematician by training and a software engineer in practice, Danica is passionate about problem-solving. Her work has spanned multiple disciplines (biology, finance, and geology), giving her the chance to apply her existing skill set to new areas and to use those opportunities to expand her knowledge and grow her skills. She has proven herself a capable engineer who thrives in new environments and rises to meet any challenge.
As a certified Scrum Master, Danica enjoys working with people to manage projects and facilitate the organization of new, improved workflows.