Streaming Databases: Embracing the Convergence of Stream Processing and Databases

Streaming databases have gained significant attention in recent years. As the name suggests, a streaming database combines the power of stream processing and databases. The fundamental concept behind streaming databases is to provide efficient data storage capabilities to facilitate seamless stream processing.

In this presentation, I will delve into the evolution of streaming databases over the past two decades. Furthermore, I will highlight the distinctive features and design principles that set streaming databases apart from conventional database systems and stream processing engines. Through the use of real-world scenarios, I will demonstrate how streaming databases can simplify the data stack and enhance cost efficiency when building real-time applications, such as network monitoring, ad recommendation, and crypto trading.

What's the focus of your work these days?

I'm the founder of RisingWave Labs, the company that develops a distributed SQL streaming database called RisingWave. Our goal is to make stream processing simpler and more cost efficient.

What's the motivation for your talk at QCon San Francisco 2023?

Streaming databases are gaining popularity these days, but many people wonder how they differ from conventional stream processing systems and real-time OLAP databases. Starting with the basic concept, I will delve into technical details and use cases, explaining why people need streaming databases in their modern data stack.

How would you describe your main persona and target audience for this session?

My primary audience consists of data engineers or engineering leaders seeking modern technologies for real-time analytics and stream processing.

Is there anything specific that you'd like people to walk away with after watching your session?

I aim to provide my audience with a comprehensive understanding of the modern real-time data processing ecosystem, empowering them to make informed decisions when choosing among various systems.


Speaker

Yingjun Wu

Founder and CEO @RisingWave Labs, Previously Engineer @AWS Redshift & Researcher @IBM Research Almaden

Yingjun Wu is the founder of RisingWave Labs, a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.


Date

Monday Oct 2 / 01:35PM PDT (50 minutes)

Location

Ballroom A

Topics

Stream Processing, Database, Cloud, Real-Time Analytics

From the same track

Session Graph Databases

LIquid: A Large-Scale Relational Graph Database

Monday Oct 2 / 10:35AM PDT

We describe LIquid, the graph database built to host LinkedIn.

Scott Meyer

Distinguished Software Engineer @LinkedIn, Creator of the Graph Database, LIquid, Metaweb/freebase Alum

Session Distributed Systems

Redesigning OLTP for a New Order of Magnitude

Monday Oct 2 / 02:45PM PDT

The world is becoming more transactional. From colocation and server rental to serverless and usage-based billing. From coal to clean energy and smart meters that arbitrage solar prices 1440 times a month instead of monthly. Not to mention FedNow or the tsunami of instant payments.

Joran Greef

Founder and CEO @TigerBeetle

Session Data Lakes

Incremental Data Processing with Apache Hudi

Monday Oct 2 / 03:55PM PDT

Incremental Data Processing is an emerging style of data processing that has recently been gathering attention and has the potential to deliver orders-of-magnitude improvements in speed and efficiency over traditional batch processing on data lakes and data warehouses.

Saketh Chintapalli

Software Engineer @Uber, Bringing Incremental Data Processing to Data Warehouse Models

Bhavani Sudha Saktheeswaran

Distributed Systems Engineer @Onehouse, Apache Hudi PMC, Ex-Moveworks, Ex-Uber, Ex-LinkedIn

Session Architecture

Sleeping at Scale - Delivering 10k Timers per Second per Node with Rust, Tokio, Kafka, and Scylla

Monday Oct 2 / 05:05PM PDT

As a part of OneSignal’s no-code Journeys system, we knew that we would need a way to store billions of timers.

Lily Mara

Engineering Manager @OneSignal, Author of "Refactoring to Rust"

Hunter Laine

Software Engineer @OneSignal

Session Data

PRQL: A Simple, Powerful, Pipelined SQL Replacement

Monday Oct 2 / 11:45AM PDT

Most databases use SQL as the interface to access relational data. Because of that, we think of SQL as the language of relational algebra. But its affinity with the English language and its unclear, inconsistent semantics leave a lot of room for improvement.

Aljaž Mur Eržen

Compiler Developer @EdgeDB & PRQL Maintainer