Building an Observability Culture

September 10, 2022

I chaired an Observability Guild charged with standardising metrics, logs, and dashboards across 30+ teams. Early on, there was plenty of resistance. Most teams had their own way of doing things, and nobody wanted another top-down initiative.

The resistance melted once people saw alerts that pointed to root causes instead of generic noise. When engineers could go from alert to fix without a treasure hunt, they became evangelists themselves.

The guild's real win wasn't Grafana graphs — it was the shared language of SLOs and RED metrics. Visibility only works when everyone agrees on what "healthy" looks like. Getting to that shared understanding was harder than any technical setup, but it changed how we handled incidents as an organisation.
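
To make "agreeing on what healthy looks like" concrete, here is a minimal sketch of a shared health definition built from RED metrics (Rate, Errors, Duration) and checked against an SLO. Everything in it is an assumption for illustration: the names `Request`, `Slo`, `red_metrics`, and `is_healthy` are hypothetical, the thresholds are made up, and it is not the guild's actual tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request failed


@dataclass
class Slo:
    # Hypothetical targets; real values would be agreed per service.
    max_error_ratio: float
    max_p95_ms: float


def red_metrics(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Summarise one time window of requests as Rate, Errors, Duration."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Nearest-rank 95th percentile, computed with integer arithmetic
    # (ceil(0.95 * n)) to avoid floating-point rounding surprises.
    rank = (95 * n + 99) // 100
    return {
        "rate_per_s": n / window_s,
        "error_ratio": sum(1 for r in requests if not r.ok) / n,
        "p95_ms": durations[rank - 1],
    }


def is_healthy(metrics: Dict[str, float], slo: Slo) -> bool:
    """A service is 'healthy' only if it meets both SLO targets."""
    return (
        metrics["error_ratio"] <= slo.max_error_ratio
        and metrics["p95_ms"] <= slo.max_p95_ms
    )
```

The point of a sketch like this is less the arithmetic than the fact that the definition lives in one place: two teams calling `is_healthy` with the same `Slo` cannot disagree about whether a service is within its targets.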