Explainer · May 4, 2026 · 5 min read

Why Status Pages Always Say 'Investigating' for 45 Minutes Before Admitting There's an Outage

The real reason companies delay outage announcements: detecting distributed failures takes time, and admitting problems fast can trigger panic.

You've seen it a hundred times. A major service goes down. You check their status page. For 45 minutes, it says 'investigating.' Then suddenly: 'We've identified the issue.' What were they doing for those 45 minutes? Not stalling for PR reasons, mostly. The delay is technical, structural, and almost unavoidable. Understanding why reveals something important about how the internet actually works versus how we imagine it works.

The Detection Problem Nobody Talks About

A status page can't report what it doesn't know. This sounds obvious, but the implication is brutal: many companies lack reliable real-time visibility into their own failures. A database goes down in one region. The monitoring system that checks the database might also be in that region, so it goes down too. Or the monitoring alert fires, but the on-call engineer doesn't see it for three minutes because Slack is delayed. Meanwhile, users are already complaining on Twitter. The company knows something is wrong because social media is exploding, but its internal systems haven't confirmed it yet. That gap is the 'investigating' period.
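One way to picture the fix for the "monitor died with the thing it monitors" problem is to probe from several independent vantage points and treat a silent probe as unknown rather than healthy. The following is a minimal Python sketch of that idea, not any particular vendor's method. The `ProbeResult` type, the vantage names, and the 50% quorum are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    vantage: str     # where the probe ran (hypothetical names like "eu-west")
    reachable: bool  # did the probe itself report in?
    healthy: bool    # did the target pass the probe's check?


def assess(results: list[ProbeResult], quorum: float = 0.5) -> str:
    """Return 'ok', 'failing', or 'unknown' from probes at several vantage points."""
    reporting = [r for r in results if r.reachable]
    # A silent probe is not evidence of health: it may have gone down
    # together with the system it was watching.
    if len(reporting) < 2:
        return "unknown"
    failing = sum(1 for r in reporting if not r.healthy)
    # Assumed rule: call it a failure once at least `quorum` of the
    # reporting probes agree.
    return "failing" if failing / len(reporting) >= quorum else "ok"
```

With one probe silent and two others reporting failures, `assess` returns "failing". With only a single probe reporting, it returns "unknown", which is roughly what 'investigating' means.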

Why They Can't Just Say 'We're Down'

There are legal and operational reasons for caution, and the operational one is the easiest to see. If a company says 'outage confirmed' and then the service comes back in 30 seconds due to automatic failover, they've just told thousands of customers to stop retrying, give up, and file complaints. They've created a second wave of damage that was purely informational. The safer play: wait until you understand the scope. Is it affecting 1% of users or 100%? One region or global? If you announce before you know, you risk either under-communicating (users think you're lying) or over-communicating (users panic and overload support). Forty-five minutes is often the time needed to answer these questions across a distributed system.
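The scope questions above (1% or 100%, one region or global) come down to simple arithmetic once the request counts are trustworthy. Here is a rough Python sketch, assuming per-region counts of failed and total requests are available. The function name `blast_radius`, the region names, and the 5% per-region threshold are illustrative assumptions, not a standard.

```python
def blast_radius(requests_by_region: dict[str, tuple[int, int]],
                 region_threshold: float = 0.05) -> dict:
    """Summarize scope from a mapping of region -> (failed, total) requests."""
    failed = sum(f for f, _ in requests_by_region.values())
    total = sum(t for _, t in requests_by_region.values())
    # A region counts as affected once its failure rate crosses the
    # (assumed) threshold.
    affected = sorted(
        region for region, (f, t) in requests_by_region.items()
        if t > 0 and f / t >= region_threshold
    )
    if not affected:
        scope = "none"
    elif len(affected) == len(requests_by_region):
        scope = "global"
    else:
        scope = "regional"
    return {
        "overall_failure_rate": failed / total if total else 0.0,
        "affected_regions": affected,
        "scope": scope,
    }
```

For example, `blast_radius({"us-east": (900, 1000), "eu-west": (5, 1000), "ap-south": (2, 1000)})` reports a regional incident limited to "us-east", even though about 30% of all requests are failing. The hard part in practice is not this calculation but getting counts you can believe while the system is misbehaving.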

The Surprising Role of Redundancy

Here's the non-obvious part: companies with better redundancy often have longer 'investigating' periods. Why? Because when a system fails, automatic failover kicks in immediately. Users might barely notice. But the failover itself is a symptom of a failure. The on-call team now has to determine what failed, why the failover happened, and whether the failed component is actually down or just slow. A company running on a single database would know instantly: database down equals service down. A company with multi-region replication, load balancing, and circuit breakers might have a service that's technically 'up' but degraded in ways that are hard to measure. The more sophisticated your infrastructure, the harder it is to say with confidence 'yes, we're down.'
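To make "technically up but degraded" concrete, here is a minimal Python sketch that classifies a batch of request samples into three states instead of two. Every threshold in it (99% success, 800 ms at the 95th percentile, 50% as the "down" line) is an assumption chosen for illustration. Real teams set these per service.

```python
import math


def classify(samples: list[tuple[bool, float]],
             min_success: float = 0.99,
             max_p95_ms: float = 800.0,
             down_below: float = 0.5) -> str:
    """Classify (succeeded, latency_ms) samples as 'up', 'degraded', or 'down'."""
    if not samples:
        return "unknown"
    ok_latencies = sorted(ms for ok, ms in samples if ok)
    success_rate = len(ok_latencies) / len(samples)
    if success_rate < down_below:
        return "down"
    # Nearest-rank 95th percentile of the successful requests' latency.
    rank = math.ceil(0.95 * len(ok_latencies))
    p95 = ok_latencies[rank - 1]
    if success_rate < min_success or p95 > max_p95_ms:
        return "degraded"
    return "up"
```

A single-database setup mostly lands in "up" or "down". A redundant one spends much of an incident in "degraded", where every request succeeds but a slow tail pushes the p95 over the line, and that middle state is the one that is hard to announce.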

What 'Investigating' Actually Means in Real Time

During those 45 minutes, the team is doing specific things: pulling logs from multiple services, checking if the issue is cascading (one failure triggering others), verifying that a candidate fix won't make things worse, and calculating the blast radius. They're also waiting for their own monitoring dashboards to load, and those dashboards often run on the same infrastructure that's partially down. A senior engineer might be in a war room getting conflicting information: the metrics say everything is fine, but customers are reporting errors. This contradiction takes time to resolve. They need to find the ground truth. Is the metric broken or is the user experience broken? Often both are true simultaneously, which is what makes it genuinely hard to report.

How to Know the Real Status Right Now

Don't rely on status pages during an outage. Use independent monitoring like WebsiteDown.com, which checks services from external vantage points. If an external check shows the service failing while the company's status page still says 'investigating,' you have real data the company may not have yet, at least not in their dashboards. Check social media for reports from other users in your region. Look at DNS resolution and ping times. If you're an engineer, query the service's API directly and time the response. The company's 'investigating' period is real, but you don't have to wait for them to figure it out. Your own external testing is often faster and more honest than their internal visibility.
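For the engineers, querying a service directly and timing the response takes only a few lines. This is a minimal sketch using Python's standard library. The function name `timed_check` and the 5-second timeout are assumptions, and the URL you pass in should be a public endpoint you are allowed to hit.

```python
import time
import urllib.error
import urllib.request


def timed_check(url: str, timeout: float = 5.0) -> dict:
    """Fetch `url` once and report the HTTP status and elapsed seconds."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, error = resp.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status, error = exc.code, str(exc.reason)
    except OSError as exc:
        # DNS failure, refused connection, or timeout: no HTTP answer at all.
        status, error = None, str(exc)
    elapsed = time.monotonic() - start
    return {"url": url, "status": status,
            "elapsed_s": round(elapsed, 3), "error": error}
```

Calling `timed_check("https://example.com/")` a few times, a minute apart, gives you a small latency and error history of your own. A `status` of `None` means you never got an HTTP answer, which is a different failure from a fast 503, and the difference is worth noting when you compare against the status page.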
