Zero-Downtime Deployments


Created by Alexander Klizhenas / @klizhentas

About me


Founding engineer @Mailgun by Rackspace

Voice and messaging systems since 2006

Focusing on infrastructure

About Mailgun: Overview


Mailgun — Email Service for Developers

Solving hard problems in the email space

No scheduled maintenance windows

About Mailgun: How it works


About Mailgun: Open Source


github.com/mailgun/flanker

github.com/mailgun/talon

github.com/mailgun/vulcand

This Talk: Overview


→ Evolution of the Mailgun cluster

New tools

Dynamic load balancing

Cluster: Start


Cluster: Redundancy


Cluster: Pods


Cluster: Message Queues


Cluster: Services


Cluster: Overview


Cluster: Summary


We have a lot of fast-moving parts

This Talk: Overview


Evolution of the Mailgun cluster

→ New tools

Dynamic load balancing

Discovery: HA


CoreOS: Etcd


Etcd: Consistent Feed


Discovery Challenges: Connections


Discovery Challenges: Multicast


Discovery Challenges: Replicated State


Cluster: Summary


Dynamic configuration introduces new challenges

This Talk: Overview


Evolution of the Mailgun cluster

New tools

→ Dynamic load balancing

Dynamic LB: Reliable socket


Dynamic LB: Reload


Dynamic LB: Vulcand


Vulcand: Reload everything


Middlewares

Connection limits

Frontends and backends

Rate limits

Vulcand: Routing


Vulcand: Trie
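
A trie lets a proxy match a request path against many registered routes in time proportional to the length of the path rather than the number of routes. A minimal sketch, assuming routes are plain path prefixes split on `/`; the `RouteTrie` class and the backend names are hypothetical and this is not Vulcand's implementation:

```python
class RouteTrie:
    """Toy path trie: maps URL path prefixes to backend names."""

    def __init__(self):
        self.children = {}   # path segment -> RouteTrie
        self.backend = None  # backend registered at this exact prefix

    def add(self, path, backend):
        node = self
        for segment in filter(None, path.split("/")):
            node = node.children.setdefault(segment, RouteTrie())
        node.backend = backend

    def match(self, path):
        """Return the backend of the longest registered prefix, or None."""
        node, best = self, self.backend
        for segment in filter(None, path.split("/")):
            node = node.children.get(segment)
            if node is None:
                break
            if node.backend is not None:
                best = node.backend
        return best


routes = RouteTrie()
routes.add("/v2/domains", "domains-backend")    # hypothetical backend names
routes.add("/v2/messages", "messages-backend")

print(routes.match("/v2/domains/example.com"))  # domains-backend
print(routes.match("/v3/unknown"))              # None
```

Longest-prefix matching is one possible design choice here; an exact-match or pattern-based router would walk the same structure with a different acceptance rule.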


Vulcand: Automatic registration

                      
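# `api` is assumed to be the service's own route decorator; its definition is not shown.
@api("/v2/domains", methods=['GET'])
def index():
    ...  # handler body elided

@api("/v2/domains", methods=['POST'])
def create():
    ...  # handler body elided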
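
One way such a decorator could work is to record each route as it is declared, so the service can announce the collected routes to the load balancer at startup. A minimal sketch; the `ROUTES` registry and this `api` implementation are assumptions for illustration, and the step that actually publishes the routes (for example, writing them to Etcd) is left out:

```python
ROUTES = []  # hypothetical in-process registry of (method, path, handler name)


def api(path, methods=("GET",)):
    """Record the route at import time and return the handler unchanged."""
    def decorator(handler):
        for method in methods:
            ROUTES.append((method, path, handler.__name__))
        return handler
    return decorator


@api("/v2/domains", methods=["GET"])
def index():
    return "list of domains"


@api("/v2/domains", methods=["POST"])
def create():
    return "domain created"


# At startup the service could walk ROUTES and announce each entry.
for method, path, name in ROUTES:
    print(method, path, "->", name)
```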
                      
                    

Vulcanctl


Vulcand: Cooperation

                      
GET http://vulcan1.host/v1/upstreams/u1/endpoints

{
   "Endpoints":[
      {
         "Id":"e1",
         "Url":"http://e1.host:5000",
         "UpstreamId":"u1",
         "Stats":{
            "Counters":{
               "Period":10000000000,
               "NetErrors":3,
               "Total":6,
               ...
            },
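
Counters like these are enough to derive a per-endpoint error ratio. A minimal sketch using only the fields shown above; it assumes `Period` is expressed in nanoseconds (which would make the window 10 seconds), and the helper name is hypothetical:

```python
def net_error_ratio(counters):
    """Fraction of requests in the window that failed with a network error."""
    total = counters["Total"]
    return counters["NetErrors"] / total if total else 0.0


counters = {"Period": 10_000_000_000, "NetErrors": 3, "Total": 6}

window_seconds = counters["Period"] / 1e9  # assumption: Period is in nanoseconds
print(window_seconds)                      # 10.0
print(net_error_ratio(counters))           # 0.5
```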
                      
                    

Dynamic LB: Retries
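
The idea of retrying at the load balancer can be sketched as follows. This is a minimal sketch under the assumption that only network errors (the request never reached a backend) are retried, and that the next endpoint in the list is tried each time; the function names and the second host are hypothetical:

```python
def send_with_retries(request, endpoints, send):
    """Try each endpoint in turn; retry only on network errors."""
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint, request)
        except ConnectionError as error:  # request never reached the backend
            last_error = error
    raise last_error or ConnectionError("no endpoints available")


def flaky_send(endpoint, request):
    # Hypothetical transport: e1 is down, e2 answers.
    if endpoint == "http://e1.host:5000":
        raise ConnectionError("connection refused")
    return 200, f"{request} served by {endpoint}"


print(send_with_retries("GET /v2/domains",
                        ["http://e1.host:5000", "http://e2.host:5000"],
                        flaky_send))
```

Restricting retries to network errors keeps the sketch safe for non-idempotent requests, since a request that failed to connect was never processed.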


Dynamic LB: Canary Deployments


Vulcand: Canary Deployments

                      
GET http://vulcan1.host/v1/upstreams/u1/endpoints

{
   "Verdict":{
      "IsBad":true,
      "Anomalies":[
         {
            "Code":1,
            "Message":"50.00 quantile latency stands out"
         }
      ]
   }
}
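
A verdict like this can be produced by comparing each endpoint's latency with that of its peers. A minimal sketch, assuming an endpoint is flagged when its median latency exceeds a multiple of the median across all endpoints; the threshold, the function name, and the sample numbers are hypothetical, and this is not necessarily the rule Vulcand applies:

```python
from statistics import median


def verdicts(latencies_ms, factor=2.0):
    """Flag endpoints whose median latency is > factor x the median of medians.

    latencies_ms maps endpoint id -> list of observed latencies in ms.
    """
    medians = {ep: median(samples) for ep, samples in latencies_ms.items()}
    baseline = median(medians.values())
    return {ep: {"IsBad": m > factor * baseline} for ep, m in medians.items()}


samples = {
    "e1": [10, 12, 11],   # existing endpoints
    "e2": [9, 11, 13],
    "e3": [40, 45, 50],   # canary running the new build
}
print(verdicts(samples))
```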
                      
                    

Dynamic LB: Circuit Breakers


Vulcand: Circuit Breakers

                      
PUT http://vulcan1.host/v1/middlewares/cbreaker/cb1

{
   "Middleware":{
      "Condition":"LatencyAtQuantileMS(50.0) > 50",
      "Fallback":{
         "Type":"response",
         "Action":{
            "StatusCode":400,
            "Body":"Come back later"
         }
      }
   }
}
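
The behaviour this configuration describes can be sketched as a small check in front of the backend: when the median latency over recent requests crosses the threshold, the breaker answers with the fallback instead of forwarding. A minimal sketch; the class, the window size, and the absence of a recovery phase are simplifications and assumptions, not Vulcand's implementation:

```python
from collections import deque
from statistics import median


class LatencyBreaker:
    """Trips when the median of the last `window` latencies exceeds a limit."""

    def __init__(self, limit_ms=50, window=5):
        self.limit_ms = limit_ms
        self.samples = deque(maxlen=window)

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def tripped(self):
        return bool(self.samples) and median(self.samples) > self.limit_ms

    def handle(self, forward):
        """Return the fallback response while tripped, else call the backend."""
        if self.tripped():
            return 400, "Come back later"
        return forward()


breaker = LatencyBreaker()
for latency in (20, 30, 80, 90, 120):
    breaker.record(latency)

print(breaker.handle(lambda: (200, "ok")))  # (400, 'Come back later')
```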
                      
                    

www.vulcanproxy.com


github.com/mailgun/vulcand

Questions?