Samza: Real-time Stream Processing at LinkedIn
Track: Hadoop: Beyond MapReduce
Location: Grand Ballroom A
Abstract:
Apache Samza is a distributed stream processing framework. It provides a familiar, easy-to-use MapReduce-style API that lets developers process messages and events in real time. The framework integrates with Apache Kafka for its messaging layer and with Apache Hadoop YARN for fault tolerance, processor isolation, resource management, and security. Samza also manages processor state and recovers to a consistent snapshot when failures occur. This talk will cover Samza's feature set, how Samza integrates with YARN and Kafka, how it is used at LinkedIn, and what's next on the roadmap.