Location:

Ballroom B/C

Day of week:

Tuesday

"Failure is the key to success; each mistake teaches us something" (Morihei Ueshiba). Failure is inevitable. Complex systems are fragile by nature. What approaches can you leverage to build fault-tolerant systems? Come learn from leaders in the field: those who build and maintain complex systems at web-scale companies.

Track Host:

Sudhir Tonse

Cloud Pioneer managing Realtime Data @Uber

Sudhir Tonse manages the Realtime Data Intelligence team at Uber. Previously, Sudhir managed the Cloud Platform Infrastructure team at Netflix and was responsible for many of the services and components that form the Netflix Cloud Platform as a Service. Many of these components have been open sourced under the NetflixOSS umbrella. His open source contributions include Archaius, a dynamic configuration/properties management library; Ribbon, an inter-process communication framework that includes cloud-friendly software load balancers; and Karyon, the nucleus of a PaaS service, among others. Prior to Netflix, Sudhir was an Architect at Netscape/AOL, delivering large-scale consumer and enterprise applications in the areas of Personalization, Infrastructure and Advertising Solutions. Sudhir is a weekend golfer and tries to make the most of the wonderful California weather and public courses.

10:35am - 11:25am

by Yongsheng Wu
Engineering Manager @Pinterest

Building Highly-resilient Systems at Pinterest

In this talk, Yongsheng from Pinterest discusses how to build highly resilient systems at scale. His talk will cover five highly fault-tolerant, battle-tested systems: dynamic service discovery, real-time configuration management, caching, persistent storage, and an event processing pipeline. He will also cover the failure cases that prompted engineers at Pinterest to build such systems, and how they actually test these systems to make sure that they can gracefully handle those failure...

11:50am - 12:40pm

Open Space

Architecting for Failure Open Space

1:40pm - 2:30pm

by Fangjin Yang
Co-Founder @Imply

Architecting Distributed Databases for Failure

Running distributed systems in production can be tremendously challenging. In this session, we will cover common problems and failures seen with distributed systems, and discuss design patterns that can be used to maintain data integrity and availability when everything goes wrong. We will use Druid as a real-world case study of how these patterns are implemented in an open source technology.

Attendees will learn first hand about the multitude of software, hardware, network, and data...

2:55pm - 3:45pm

by Amos Barreto
Director of Marketplace Engineering @Uber

Architecting for failure induced by human errors

Humans are most often the SPOF of a distributed system. According to Gartner, “Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues”.

This talk will walk attendees through how Uber's architecture continues to evolve by providing clear isolation, designing gradual release strategies, enabling fast detection and...

4:10pm - 5:00pm

by Nitesh Kant
Senior Software Engineer @Netflix

Crossroads of asynchrony and graceful degradation

Netflix, with more than 60 million subscribers worldwide and accounting for a third of the internet traffic in the United States, is a highly available internet service. In order to guarantee high availability for our service, we have architected our systems so that different failure modes in distributed systems cause graceful degradation rather than unavailability.

In our constant endeavor to improve availability of our services, we are on our path to embrace asynchrony in its...

5:25pm - 6:15pm

by Bhakti Mehta
Senior Software Engineer @BlueJeansNetwork

Resilience planning & how the empire strikes back

It is well said that "The more you sweat on the field, the less you bleed in war". Failures are an inevitable part of complex systems. Accepting that failures happen will help you design the system's reactions to specific failures.

This talk covers best practices for building resilient, stable and predictable services: preventing cascading failures, the timeout pattern, the retry pattern, circuit breakers and other techniques which have been pervasively used at Blue Jeans Network. Join me in...

Tracks

Covering innovative topics

Monday Nov 16

Architectures You've Always Wondered About

Silicon Valley to Beijing: Exploring some of the world's most intriguing architectures
Applied Machine Learning

How to start using machine learning and data science in your environment today. Latest and greatest best practices.
Browser as a platform (Realizing HTML5)

Exciting new standards like Service Workers, Push Notifications, and WebRTC are making the browser a formidable platform.
Modern Languages in Practice

The rise of 21st century languages: Go, Rust, Swift
Org Hacking

Our most innovative companies reimagining the org structure
Design Thinking

Level up your approach to problem solving and leave everything better than you found it.

Tuesday Nov 17

Containers in Practice

Build resilient, reactive systems one service at a time.
Architecting for Failure

Your system will fail. Take control before it takes you with it.
Modern CS in the Real World

Real-world Industry adoption of modern CS ideas
The Amazing Potential of .NET Open Source

From language design in the open to Rx.NET, there is amazing potential in an Open Source .NET
Optimizing You

Keeping life in balance is always a challenge. Learning lifehacks
Unlearning Performance Myths

Lessons on the reality of performance, scale, and security

Wednesday Nov 18

Streaming Data @ Scale

Real-time insights at Cloud Scale & the technologies that make them happen!
Taking Java to the Next Level

Modern, lean Java. Focuses on topics that push Java beyond how you currently think about it.
The Dark Side of Security

Lessons from your enemies
Taming Distributed Architecture

Reactive architectures, CAP, CRDTs, consensus systems in practice
JavaScript Everywhere!

JavaScript is everywhere. Learn why.
Culture Reimagined

Lessons on building highly effective organizations

Conference for Professional Software Developers

Track: Architecting for Failure
