People are more concerned with scalability than resilience, yet distributed denial-of-service (DDoS) attacks are more common than going viral.

Either way, if you're unprepared, going viral is almost equivalent to suffering a DDoS attack. It will totally paralyse your infrastructure, costing you a lot of money and missed opportunities.

You wanted scalability? RESILIENCE is one of the prerequisites.

DDoS attacks are like the weather. You can't prevent them, but you can prepare. Make sure the roof doesn't leak. Wear multiple layers. And especially don't leave the front door open! Or the windows.

I have seen a pattern of clients who are "sophisticated" enough to use Kubernetes, but who still haven't implemented these simple best practices:

1. Cache static content and serve it from a CDN.

It's a waste to serve images and scripts from your back-end servers. Serve these using your cloud provider's content delivery network (CDN), e.g. Amazon CloudFront, or an external one such as Cloudflare.

Additional benefit: much snappier load times for end users!

Additional details:

You'll need to carefully tweak your HTTP headers to fine-tune the caching behaviour. You can get additional benefits from controlling the client-side browser cache too.
If your main HTML pages are generated dynamically per request, it's usually possible to rework them into static pages, with the dynamic parts rendered on the client side.
Been there, done that: talk to me if you're having trouble with this.
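
To make the header tweaking a bit more concrete, here is a minimal sketch of a caching policy expressed as a function. The extension list, the lifetimes, and the name cache_control_for are my own illustrative assumptions, not a recommendation for your specific stack; only the Cache-Control directives themselves (max-age, s-maxage, immutable, no-cache) are standard HTTP.

```python
from pathlib import PurePosixPath

# Hypothetical policy: adjust the extensions and lifetimes to your own release process.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(path: str, fingerprinted: bool = False) -> str:
    """Return a Cache-Control header value for the given URL path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        if fingerprinted:
            # The file name contains a content hash, so a new release gets a new URL
            # and the old one can be cached for a year by browsers and the CDN alike.
            return "public, max-age=31536000, immutable"
        # Browsers keep it for an hour; shared caches such as a CDN (s-maxage) for a day.
        return "public, max-age=3600, s-maxage=86400"
    # HTML and everything else: caches may store it but must revalidate before reuse.
    return "no-cache"
```

With this policy, cache_control_for("/assets/app.3f9a1c.js", fingerprinted=True) yields the long-lived immutable header, while cache_control_for("/index.html") yields "no-cache", so a new release is picked up immediately.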

2. Use domain names, not fixed IP addresses.

If clients and integrations point at fixed IP addresses, you won't be able to move your systems while under fire.
Domain names and subdomains give you much more flexibility: you can repoint traffic by updating DNS records.
Assign different subdomains to customer-facing systems and back-office systems (see point 4).
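
One cheap way to enforce this is a check that flags hard-coded IP addresses in your endpoint configuration. The sketch below assumes a simple name-to-host mapping; the ENDPOINTS dictionary, its host names (example.com placeholders), and the function names are hypothetical.

```python
import ipaddress

# Hypothetical endpoint configuration; the example.com names are placeholders.
ENDPOINTS = {
    "storefront": "shop.example.com",
    "back_office": "admin.example.com",
    "legacy_api": "203.0.113.10",  # hard-coded IP: this is what we want to catch
}


def is_ip_literal(host: str) -> bool:
    """True if host is an IPv4 or IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def hard_coded_ips(endpoints: dict[str, str]) -> list[str]:
    """Return the names of all endpoints configured with a fixed IP address."""
    return sorted(name for name, host in endpoints.items() if is_ip_literal(host))
```

Running hard_coded_ips(ENDPOINTS) in a CI step would report "legacy_api" as the entry to migrate to a domain name.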

3. Don't route internal requests through your public interface!

When one back-end service calls another, use internal addresses rather than the public interface. Apart from being a necessary security practice, this keeps your internal requests from having to compete with incoming traffic.
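
Since Kubernetes came up earlier: inside a cluster, every Service is reachable at a DNS name of the form service.namespace.svc.cluster.local (assuming the default cluster domain). Here is a minimal sketch of a helper that builds such an internal URL, with an optional environment-variable override. The helper name, the override variable convention, and the "billing" service are my assumptions for illustration.

```python
import os


def internal_url(service: str, namespace: str = "default", port: int = 80) -> str:
    """Build the cluster-internal base URL for a service-to-service call."""
    # Hypothetical convention: BILLING_INTERNAL_URL overrides the address of "billing".
    override = os.environ.get(f"{service.upper().replace('-', '_')}_INTERNAL_URL")
    if override:
        return override.rstrip("/")
    # Kubernetes Service DNS name, assuming the default "cluster.local" domain.
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"
```

A call such as internal_url("billing", port=8080) then never leaves the cluster network, so it is unaffected by whatever is hammering your public endpoint.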

4. Use distinct infrastructure for handling customer-facing requests vs. back-office systems.

If all requests reach your system via a common route that isn't resilient, you won't have access to your back-office systems when your customer-facing systems are overloaded. Ensure that you don't have a single point of failure.

It's not just your access to your back office: what about things like payment confirmations that arrive as callbacks from third-party providers to your webhooks? You need to ensure that these will be totally unaffected by heavy loads on your customer-facing systems.

Also:

Maintain and enforce a whitelist that controls which parties can call your webhooks.
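
One common form of such a whitelist is a check on the caller's source IP address. The sketch below assumes your provider publishes the ranges its callbacks come from. The ranges shown are reserved documentation networks used as placeholders, and the names are hypothetical.

```python
import ipaddress

# Hypothetical ranges (reserved documentation networks).
# Replace them with the ranges your payment provider actually publishes.
ALLOWED_WEBHOOK_SOURCES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.16/28"),
]


def is_allowed_source(remote_addr: str) -> bool:
    """True if the caller's IP address falls inside one of the allowed ranges."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: reject rather than guess.
        return False
    return any(address in network for network in ALLOWED_WEBHOOK_SOURCES)
```

One caveat: if the webhook endpoint sits behind a proxy or load balancer, the address your application sees may be the proxy's, so the check is usually better enforced at the network edge (firewall or security group) than in application code.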

5. Use Cloudflare as your first line of defence.

Pay attention to these features:

I'm Under Attack Mode
Caching (see point 1 above)
Rate limiting (part of Web Application Firewall)
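
If rate limiting is new to you, the idea can be sketched as a token bucket: each client gets a budget of requests that refills at a steady rate. This is only an illustration of the concept under my own assumptions (class name, parameters, injectable clock). It is not how Cloudflare implements its rate limiting, which you configure as rules rather than write as code.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the client is over budget."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per client (for example per IP address) and answer with HTTP 429 when allow() returns False.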

Conclusion

Each of the above measures has a significant and independent impact, and they are almost free if you design them in from the beginning.

"Can you help me?"

If you're a software company growing past 10 to 20 engineers, you will hit significant engineering and management roadblocks. Let me help you remove them.

🤙 Contact me via my LinkedIn profile / book a free discovery call via my website / reply to this email.

Thank you for reading this far!

Comment, ask, suggest, clarify, and especially correct me via this LinkedIn post or by replying to this email.

5 best practices to survive going viral (and DDoS attacks)