Presentation: High throughput stream processing with ACID guarantees

Distributed realtime event processing, also called stream processing, gained a lot of traction in the Big Data community recently. Frameworks like Apache Storm and Apache Samza tackle various challenges in stream processing including performance, scalability, reliability and fault tolerance. However, very limited support on transactions are available in these frameworks. The lack of complete transaction support puts burden on developers to maintain data integrity by implementing rollback partial writes in case of failures and designing for read isolation of uncommitted writes.

We designed and implemented a new distributed stream processing framework that has built-in transaction support with full ACID properties while not compromising on scalability, reliability and fault tolerance of Storm and Samza. We use the Apache Twill library on Hadoop YARN as the execution environment to provide elastic scalability, fault tolerance, and logs & metrics collection. The transaction support guarantees stream events to have exactly-once processing semantics, together with any data operations performed with full ACID properties. This new framework allows developers to focus more on the application logic rather than dealing with complicated distributed systems and distributed transactions.

Tracks

Covering innovative topics

Monday, 3 November

Tuesday, 4 November

Wednesday, 5 November

Conference for Professional Software Developers