Moving beyond prototypes: Building resilience at scale in your IoT application

Alina Dima

Amazon Web Services

15 December, 14:40, «02 Hall. Ararat»


If 1% of your 100-device fleet goes offline, that is 1 device. Maybe your use case can live with that. But can your use case stay bulletproof when 1% of 10 million devices (100,000) go offline? If the answer is no, then it is time to learn about resilience at scale.

At scale, the overall health of your IoT application (Edge and Cloud) can be affected by events outside of your control. Here are some examples: the network provider drops the connection, or a high percentage of your fleet comes online at the same time. In the IoT problem space, it is your responsibility as an engineer to handle not only the resilience of a single Edge application instance and its interaction with the Cloud, but also the collective resilience of tens of thousands of Edge application instances connecting to and communicating with your Cloud application, and the impact of them all performing or not performing an action simultaneously. Problems that seem minor at small scale, such as 1% of your fleet going offline and coming back online at the same time, might become major at large scale. Is your IoT application safe from a self-inflicted DDoS attack?

This session explains what resilience at scale is and how scale uncovers problems you don’t see otherwise. It also provides examples of mitigation strategies you can build with AWS IoT to ensure that your IoT application as a whole operates reliably at scale.
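
To make the "self-inflicted DDoS" scenario concrete, here is a minimal Python sketch of one commonly used mitigation: exponential backoff with full jitter on reconnect. It is an illustration under stated assumptions, not necessarily a strategy covered in the session, and it does not use any AWS IoT API. The function names (`reconnect_delay`, `peak_reconnects_per_second`) and the `base` and `cap` values are hypothetical.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds.

    `base` and `cap` are illustrative values, not recommendations.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def peak_reconnects_per_second(delays):
    """Group reconnect times into one-second buckets and return the busiest bucket."""
    buckets = {}
    for delay in delays:
        second = int(delay)
        buckets[second] = buckets.get(second, 0) + 1
    return max(buckets.values())


if __name__ == "__main__":
    rng = random.Random(42)
    fleet = 100_000  # 1% of a 10-million-device fleet

    # Without jitter, every device retries at the same instant.
    no_jitter = [0.0] * fleet

    # With jitter, each device picks its own delay (here: the 7th retry, up to 64 s).
    jittered = [reconnect_delay(attempt=6, rng=rng) for _ in range(fleet)]

    print("peak without jitter:", peak_reconnects_per_second(no_jitter))
    print("peak with jitter:   ", peak_reconnects_per_second(jittered))
```

Without jitter, all 100,000 reconnects land in the same second. With jitter, they are spread over roughly a minute, so the peak load the Cloud side has to absorb is far lower.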

Key takeaways:
- Understand what resilience at scale is, with concrete examples of what could go wrong
- Learn how to ensure your IoT application is resilient at scale using AWS IoT
- Take home a mental model for building resilience at scale

The talk was accepted to the conference program