Top 10 Performance Gotchas in Scaling In-Memory Algorithms

Wednesday, 11:45am - 12:35pm

Math algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever-growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view of the making of the world's most scalable and fastest machine learning framework, H2O, and of the performance lessons learnt scaling it on EC2 for Netflix and on commodity hardware for other power users.


Top 10 Performance Gotchas is about the white-hot stories of I/O wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. It is about good data distribution and the fine-grain decomposition of algorithms into small blocks of parallel computation. It's a 10-point story of a network of machines raging against the tyranny of Amdahl's law while preserving the statistical properties of the data and the accuracy of the algorithm.
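Fine-grain decomposition over a primitive array with fork/join can be sketched in Java as follows. This is a minimal, hypothetical illustration, not H2O's code or API: the class `ChunkSum`, the method `parallelMean`, and the `GRAIN` threshold of 4,096 elements are assumptions made for the example. It uses only the JDK's `RecursiveTask` and `ForkJoinPool`, summing a `double[]` (a primitive array, so there is no per-element boxing) by splitting the range until each chunk is small enough to sum sequentially.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Hypothetical sketch: sums a primitive double[] by recursively splitting it
 * into fine-grained chunks that the fork/join work-stealing scheduler runs in
 * parallel. Not taken from H2O; GRAIN is an illustrative, untuned value.
 */
public class ChunkSum extends RecursiveTask<Double> {
    // Below this many elements, a task stops splitting and sums sequentially.
    static final int GRAIN = 4_096;

    private final double[] data; // primitive array: no boxing per element
    private final int lo, hi;    // half-open range [lo, hi)

    public ChunkSum(double[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= GRAIN) {
            // Leaf task: a tight sequential loop over a small block.
            double s = 0.0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        ChunkSum left = new ChunkSum(data, lo, mid);
        ChunkSum right = new ChunkSum(data, mid, hi);
        left.fork();                // queue the left half; an idle worker may steal it
        double r = right.compute(); // process the right half in this thread
        return left.join() + r;     // wait for the left half and combine
    }

    /** Mean of the array, computed with the common fork/join pool. */
    public static double parallelMean(double[] data) {
        if (data.length == 0) throw new IllegalArgumentException("empty array");
        double sum = ForkJoinPool.commonPool().invoke(new ChunkSum(data, 0, data.length));
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println(parallelMean(data)); // prints 4.5
    }
}
```

The grain size trades scheduling overhead (chunks too small) against load imbalance (chunks too large). Whatever work stays serial, such as loading the data or the final combine, bounds the achievable speedup per Amdahl's law: with a parallel fraction p on N cores the speedup is at most 1/((1 - p) + p/N), so if 95% of the work parallelizes, 16 cores give at most about 9.1x, and no number of cores gives more than 20x.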

Sri is co-founder and CEO of 0xdata (@hexadata), the makers of H2O. H2O brings better in-memory math algorithms to big data and democratizes data science through open source. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was Director of Engineering at DataStax, the NoSQL Cassandra company. Before that, Sri led performance and partner engineering at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps for performance and scale. Before that, Sri was on sabbatical pursuing theoretical neuroscience at Berkeley. Prior to that, Sri worked on a high-performance NoSQL trie-based index for semi-structured data at the in-memory index startup RightOrder. Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the Big Data, NoSQL, and Java circuit, Sri leaves a trail at @srisatish.