Abstract

People used to ask me all the time how to figure out whether their chaos test had “passed,” and I’d always say, “Well, that’s a loaded question.” To confirm that a chaos test “passed,” we need to verify hypotheses: sometimes you’re trying to prove that some system behavior occurred in response to a stimulus, while other times you’re trying to prove the absence of a change in system behavior. Take this already nebulous concept, and now think about making it generic enough that the core validation logic can be reused by any engineer running any kind of experiment on any one of our products. Then, try to do all of this in a complex distributed technical environment where it’s hard enough just to determine whether an application was healthy in the first place! That’s exactly the problem that the chaos engineering team at Vanguard has been tackling with the recent addition of automated assertions to its internal chaos tooling. In this talk, you’ll learn when it’s appropriate to define “pass” and “fail” for a chaos experiment, and when it might not be, and you’ll get a peek under the hood at how Vanguard engineers automatically verify their hypotheses in the context of chaos experiments.
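To make the two kinds of hypothesis concrete, here is a minimal Python sketch. It is an illustration only and not Vanguard's tooling: the names `AssertionResult`, `assert_change_occurred`, and `assert_no_change`, and the choice to compare the means of two metric samples, are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class AssertionResult:
    """Outcome of checking one hypothesis (hypothetical type for this sketch)."""
    passed: bool
    detail: str


def assert_change_occurred(
    baseline: Sequence[float], during: Sequence[float], min_increase: float
) -> AssertionResult:
    """Hypothesis: the stimulus raises the metric by at least `min_increase`.

    Both samples must be non-empty; statistics.mean raises on empty input.
    """
    delta = mean(during) - mean(baseline)
    return AssertionResult(
        passed=delta >= min_increase,
        detail=f"mean changed by {delta:+.2f}, expected at least +{min_increase:.2f}",
    )


def assert_no_change(
    baseline: Sequence[float], during: Sequence[float], tolerance: float
) -> AssertionResult:
    """Hypothesis: the metric stays within `tolerance` of its baseline mean."""
    delta = abs(mean(during) - mean(baseline))
    return AssertionResult(
        passed=delta <= tolerance,
        detail=f"mean moved by {delta:.2f}, allowed up to {tolerance:.2f}",
    )


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    # Example A: we expect an alarm-style metric to rise when a fault is injected.
    print(assert_change_occurred([1, 1, 1], [5, 5, 5], min_increase=3))
    # Example B: we expect latency to stay flat while one instance is terminated.
    print(assert_no_change([100, 102, 98], [101, 99, 100], tolerance=5))
```

Both checks share the same shape (a baseline sample, a sample taken during the experiment, and a threshold), which is one way the validation logic could be kept generic across experiments. A real system would likely need a more robust comparison than a difference of means.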

Interview:

What's the focus of your work these days?

I actually just started a new role, operating in more of an architect capacity than before, when I was scoped very narrowly to Site Reliability Engineering. So I'm focusing on architecture that supports our developer experience platform and enables software engineering excellence across our entire IT organization.

Can you tell me what the motivation behind your talk is?

In this talk I hope to build on what I've shared about Vanguard's chaos engineering strategy in prior talks, and to talk a little about the level of maturity we've now reached, where we're not just experimenting in an exploratory way but using the results of our chaos experiments to make assertions about the reliability of our systems. Hopefully, others will come away understanding how to do the same with their own chaos experiments.

How would you describe the persona and level of the target audience for this session?

I think that any technician will take a lot away from this talk, especially anyone who works in a large enterprise, because that's the environment I'm working in at Vanguard. It's aimed at anyone who has run chaos experiments in the past or is interested in running them in their organization. So anyone with a site reliability engineering background, or just some experience with chaos experimentation, will really enjoy this talk.

Is there anything specific that you would like these folks to walk away with after watching your presentation?

When they walk away from the presentation, they'll certainly have a feel for the architecture and what we've built, in case they want to do something similar in their own organizations. But I don't expect that's what most will take away. Primarily, I hope people will walk away with the idea to put assertions around some of their chaos experiments, to keep doing some exploratory testing without assertions, and to determine when the right time is for each.

Speaker

Christina Yakomin

Senior Site Reliability Engineering Specialist @Vanguard_Group

Christina is a Senior Site Reliability Engineering Specialist in Vanguard's Chief Technology Office. She has worked at the company's Malvern, PA headquarters since graduating from Villanova University with an undergraduate degree in Computer Science. Throughout her career, she has developed an expansive skill set in front- and back-end web development, as well as cloud infrastructure and automation, with a specialization in Site Reliability Engineering. She has earned several Amazon Web Services certifications, including the Solutions Architect - Professional. Christina has also worked closely with the Women's Initiative for Leadership Success at Vanguard, both internally at the company and externally in the local community, to further the career advancement of women and girls - in particular within the tech industry. In her spare time (and when it is safe to do so!), Christina is passionate about traveling; she has visited over 20 different countries and 25 U.S. states so far!

Did the Chaos Test Pass?
