14:00 - 18:00
Chaos to Calm: An Advanced Full-Stack Guide to Reliability
Every app is reliable… until it isn’t! If there is one truth about software development, it is that eventually, something will go wrong. When a critical security exploit or performance degradation hits production, do you know how to spot it? More importantly, what do you do next? Are you equipped to triage, coordinate, and resolve a high-stress incident before it breaches customer trust?
This interactive workshop bridges the gap between full-stack engineering and DevOps resilience. After exploring reliability fundamentals and analyzing real-world system failures, you will start by establishing individual application observability baselines. Then, when something threatens the system, you will learn how to transition seamlessly from a solo responder to a structured, coordinated incident response team.
Working alongside an AI agent, you will move through the full incident lifecycle and close the continuous learning loop by automating your post-incident reviews.
By the end of this workshop, you will be able to:
• Understand the core pillars of reliability.
• Differentiate between a noisy alert and a genuine, high-priority SLO breach.
• Standardize incident communication with the Incident Command System (ICS) to enable effective, calm collaboration across teams.
• Leverage AI agents within your existing tools to triage, diagnose, and remediate incidents without context-switching.
