Chaos Engineering – Reliability through failure
Sooner or later, all systems fail. Instead of trying to build a perfect system that never fails, we should embrace failure and make sure our services degrade gracefully, especially when it comes to distributed services. By deliberately causing outages and delays and attempting to tear down our own services, we can prepare for the impact before it happens in the real world.
We’ll look into what chaos engineering is and the philosophy behind it: which assumptions developers make, and how chaos engineering helps avoid the problems that stem from these fallacies.
Beyond the theory, we’ll look at some practical examples and tools for applying this testing principle. We’ll see how Netflix uses this paradigm to keep its services resilient and how it has proven to be better prepared for serious outages.
Chaos engineering is not for the faint of heart. It effectively means trying to make your live production environment unavailable. But what could give you more confidence in your system than watching it break on a daily basis and degrade gracefully every time?
SNB / all
24 November 2022
13:30 - 15:30