Presentation: Fix Spark Failures and Bottlenecks Faster & Easier

Track: Sponsored Solutions Track I

Location: Pacific BC

Duration: 4:10pm - 5:00pm

Day of week: Monday

Level: Intermediate

Persona: Architect, Data Engineering

Abstract

This talk presents the results of analyzing many Spark jobs on many multi-tenant production clusters. Kirk discusses common issues seen, the symptoms of those issues, and how developers can address them.

At Pepperdata, we have gathered trillions of performance data points on production clusters running Spark, covering a variety of industries, applications, and workload types. We will present key performance insights — best and worst practices, gotchas, and tuning recommendations — based on analyzing the behavior and performance of millions of Spark applications. In addition, we will describe how we are turning these learnings into heuristics used in the open source Dr. Elephant project.

Speaker: Kirk Lewis

Field Engineer @Pepperdata

Kirk Lewis joined Pepperdata in 2015. Previously, Kirk was a Solutions Engineer at StackVelocity. Before that, he was the lead technical architect for big data production platforms at American Express. Kirk has a strong background in big data.

Similar Talks

STSM, IBM Streams Programming Model Architect
Original Developer @ApacheCalcite, Co-Founder SQLstream, & Architect @Hortonworks
Engineer @Google & Founder/Committer on Apache Beam
Initial Author of Apache Spark SQL & Leads Streaming Team @Databricks
Committer @ApacheFlink, CTO @dataArtisans

Tracks

  • Architectures You've Always Wondered About

    Architectural practices from the world's most well-known properties, featuring startups, massive scale, evolving architectures, and software tools used by nearly all of us.

  • Going Serverless

    Learn about the state of Serverless & how to successfully leverage it! Lessons learned in the track hit on security, scalability, IoT, and offer warnings to watch out for.

  • Microservices: Patterns and Practices

    Stories of success and failure building modern Microservices, including event sourcing, reactive, decomposition, & more.

  • DevOps: You Build It, You Run It

    Pushing DevOps beyond adoption into cultural change. Hear about designing resilience, managing alerting, CI/CD lessons, & security. Features lessons from open source, LinkedIn, Netflix, Financial Times, & more.

  • The Art of Chaos Engineering

    Failure is going to happen - Are you ready? Chaos engineering is an emerging discipline - What is the state of the art?

  • The Whole Engineer

    Success as an engineer is more than writing code. Hear inward looking thoughts on inclusion, attitude, leadership, remote working, and not becoming the brilliant jerk.

  • Evolving Java

    Java continues to evolve & change. Track covers Spring 5, async, Kotlin, serverless, the 6-month cadence plans, & AI/ML use cases.

  • Security: Attacking and Defending

    Offensive and defensive security evolution that application developers should know about, including SGX enclaves, effects of AI, software exploitation techniques, & crowd defense.

  • The Practice & Frontiers of AI

    Learn about machine learning in practice and on the horizon. Learn about ML at Quora, Uber's Michelangelo, ML workflow with Netflix Meson and topics on Bots, Conversational interfaces, automation, and deployment practices in the space.

  • 21st Century Languages

    Compile to Native, Microservices, Machine learning... tailor-made languages solving modern challenges, featuring use cases around Go, Rust, C#, and Elm.

  • Modern CS in the Real World

    Applied trends in Computer Science that are likely to affect Software Engineers today. Topics include category theory, crypto, CRDTs, logic-based automated reasoning, and more.

  • Stream Processing In The Modern Age

    Compelling applications of stream processing using Flink, Beam, Spark, Strymon & recent advances in the field, including Custom Windowing, Stateful Streaming, SQL over Streams.

Conference for Professional Software Developers