
The butterfly effect in SRE

The Program Committee has not yet taken a decision on this talk

Max Vanyushkin

Tinkoff

Abstracts

A strong SLA has become a standard expectation in the modern digital world, and for fintech companies it is a hard requirement. Yet things that seem unimportant at first glance can dramatically breach an SLA, especially in highly loaded services.
Our team works on Sage, an observability platform we built at Tinkoff. Sage is an internal product that covers the company's entire ecosystem. It is a heavily loaded system: it receives 4 GB/s of incoming traffic and stores 7.5 PB of user data.
In this talk, I will share our experience of overcoming several hardware and software failures we ran into while operating this system, all caused by small things we had not paid enough attention to.