How to Lead When Everything's Breaking

A practical crisis playbook for founders and engineering leaders: stabilize the room, narrow the facts, and guide the team back to execution.

A real crisis has a way of compressing everything.

Plans disappear. Meetings stop mattering. Normal process turns into friction. And the team looks to leadership for a cue: should they panic, or should they focus?

Answering that question is the first job in a breakdown. Not solving every technical detail yourself. Not narrating worst-case scenarios. Not preserving business-as-usual.

Your first job is to change the operating mode of the company.

When production is down, a major customer is blocked, or a security event is unfolding, the organization needs a temporary command center: one place, one shared picture of reality, and one leader responsible for keeping the work pointed at the real problem.

Start by changing the temperature

Most crises begin with a spike of emotion.

That part is normal. What matters is how long it lasts.

Teams borrow their emotional posture from leadership. If the founder or engineering leader is frantic, scattered, or reacting to every new detail, the rest of the room will do the same. If leadership is calm and decisive, people can think.

You do not need to pretend the situation is small. You do need to make it clear that the team can handle it.

In practice, that means slowing your own reaction down before you speed the organization up.

Declare that normal rules are paused

A crisis gets harder when half the team is still operating as if the usual constraints apply.

Be explicit: this is an incident, priorities have changed, and the next block of time will be run differently.

That usually means:

  • nonessential work stops
  • calendars are cleared
  • approvals are shortened
  • teams get permission to optimize for recovery first and cleanup later

This is not an excuse for recklessness. It is a recognition that process designed for stable conditions often becomes a liability during unstable ones.

The team should know what matters now: contain damage, restore service, protect customers, and communicate clearly.

Build a real war room

Every serious incident needs a single coordination point.

If you are in person, that may be a conference room with the core responders in it. If you are remote, it may be a dedicated Slack channel plus a live call that stays open until the incident is under control.

The format matters less than the discipline.

The room should include only the people needed to understand, decide, and act. Too few people create blind spots. Too many create noise.

A good war room does three things:

  1. concentrates attention
  2. reduces side conversations
  3. gives the team one source of direction

Without that center, incidents sprawl. Multiple theories compete. People duplicate work. Stakeholders hear conflicting updates. Valuable time disappears.

Separate communication from problem-solving

One of the simplest improvements you can make during an incident is assigning a dedicated communicator.

This person should not be buried in debugging. Their job is to watch the room, capture the facts, and translate them into updates for customers, executives, or other stakeholders.

That separation matters.

When the same people trying to diagnose the issue are also answering every incoming question, attention fragments fast. The technical team loses focus, and external updates become inconsistent.

A dedicated communicator keeps the problem-solving loop clean while ensuring stakeholders are never left guessing.

Useful updates are usually short:

  • what happened
  • what is currently known
  • what is being done
  • when the next update will come

Clarity beats completeness.
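
If your team wants a consistent shape for those updates, the four parts can be captured in a small template. The following is a minimal Python sketch, not a prescribed format: the `StatusUpdate` class, its field names, and the sample incident details are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StatusUpdate:
    """One stakeholder update. Class and field names are illustrative."""
    what_happened: str
    currently_known: str
    being_done: str
    next_update_at: datetime

    def render(self) -> str:
        # Four short lines, in the same order as the list above.
        return "\n".join([
            f"What happened: {self.what_happened}",
            f"What we know: {self.currently_known}",
            f"What we are doing: {self.being_done}",
            f"Next update: {self.next_update_at:%H:%M} UTC",
        ])


# Hypothetical incident details, invented for the example.
sent_at = datetime(2024, 1, 15, 14, 0)
update = StatusUpdate(
    what_happened="Checkout requests began failing.",
    currently_known="Errors are limited to card payments.",
    being_done="Rolling back the most recent payment deploy.",
    next_update_at=sent_at + timedelta(minutes=30),
)
print(update.render())
```

Making the next update time a required field is the point of the sketch: the communicator cannot send an update without committing to when the following one will arrive.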

Create one shared record of facts

In the first hour of an incident, speculation is expensive.

The best countermeasure is a living document that captures only what the team knows to be true.

That can be a doc, a note, a whiteboard, or an incident timeline. The tool is less important than the rule: facts first, theories second.

Your shared record should include:

  • the time the issue was detected
  • the systems or customers affected
  • actions already taken
  • observed changes in behavior
  • decisions made and by whom

This becomes the team’s anchor.

When discussion starts drifting into assumptions, you can pull everyone back to the documented reality. That keeps the room from chasing ten possible causes at once.
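
For teams that keep the record in a script or bot rather than a doc, the "facts first, theories second" rule can be enforced in the data itself. This is a rough Python sketch under that assumption. `IncidentRecord`, `Entry`, the entry kinds, and the sample entries are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Illustrative categories; "theory" is kept apart from everything confirmed.
KINDS = {"fact", "action", "decision", "theory"}


@dataclass
class Entry:
    at: datetime
    kind: str
    text: str
    owner: str = ""  # who took the action or made the decision


@dataclass
class IncidentRecord:
    detected_at: datetime
    affected: List[str]
    entries: List[Entry] = field(default_factory=list)

    def add(self, at: datetime, kind: str, text: str, owner: str = "") -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown entry kind: {kind}")
        self.entries.append(Entry(at, kind, text, owner))

    def confirmed(self) -> List[Entry]:
        """Everything except theories, oldest first."""
        known = [e for e in self.entries if e.kind != "theory"]
        return sorted(known, key=lambda e: e.at)


# Made-up timeline for the example.
record = IncidentRecord(
    detected_at=datetime(2024, 1, 15, 13, 42),
    affected=["checkout API"],
)
record.add(datetime(2024, 1, 15, 13, 50), "action",
           "Rolled back the payment deploy", owner="on-call engineer")
record.add(datetime(2024, 1, 15, 13, 45), "fact",
           "Error rate rose after the 13:40 deploy")
record.add(datetime(2024, 1, 15, 13, 47), "theory",
           "Possibly a database connection limit")

for entry in record.confirmed():
    print(f"{entry.at:%H:%M} [{entry.kind}] {entry.text}")
```

Theories are still recorded, so nothing is lost, but the view the room works from shows only what has been observed, done, or decided.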

Let the team break the right rules

In normal operation, teams are rewarded for caution, polish, and thoroughness.

In a crisis, those instincts can slow recovery.

This is where leadership matters most. The team needs clear permission to take temporary, imperfect actions that reduce harm now, even if they create cleanup work later.

That may mean:

  • shipping a narrow fix instead of an elegant one
  • disabling a feature to preserve the core product
  • adding manual safeguards until automation can be restored
  • bypassing normal sequencing to get the system stable

The test for any of these shortcuts is whether it helps recovery. A shortcut that does not is just recklessness.

Crisis leadership is not about abandoning judgment. It is about refusing to let lower-priority standards outrank immediate survival.

Keep the room moving

Leaders in incidents do not need to be the deepest technical expert in every thread.

They do need to keep momentum.

That means noticing when people are stuck, when an investigation has gone cold, or when the team is spending too long on something that does not change the outcome.

Ask questions that force clarity:

  • What do we know for sure?
  • What is the next decision?
  • What action reduces risk fastest?
  • Who owns that action?
  • When will we know if it worked?

Good incident leadership is often less about brilliance and more about tempo.

A room with clear priorities and fast feedback usually beats a room full of smart people arguing over the perfect path.

End the incident on purpose

Once the system is stable, say so.

Do not let emergency mode linger longer than necessary.

Teams coming off an intense incident often stay mentally locked in. Adrenaline keeps running, people keep refreshing dashboards, and nobody quite knows when to stop. If leadership does not mark the transition, the room never fully resets.

Close the loop explicitly:

  • confirm the incident is over
  • identify any remaining follow-up owners
  • stop the continuous coordination channel
  • tell people to rest, eat, and step away

Ending decisively is part of recovery.

Run the postmortem while the details are fresh

The work is not finished when the outage ends.

Within the next few days, gather the people involved and reconstruct what happened. The goal is not blame. The goal is understanding.

A useful postmortem answers four questions:

1. What triggered the incident?

Name the initiating event as specifically as possible.

2. What signals were missed?

Look for alerts, behaviors, or patterns that could have surfaced the issue earlier.

3. What helped during the response?

Document the decisions, tools, and habits that improved containment or recovery.

4. What made the response harder?

Find the coordination gaps, missing safeguards, and process weaknesses that amplified the damage.

The most productive postmortems treat incidents as system failures, not morality plays. Even when one action appears to be the immediate trigger, the real question is why the system allowed a single mistake to carry that much impact.
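
One lightweight way to keep a review honest is to check that all four questions have a written answer before the document is shared. Here is a small Python sketch of that check. The `unanswered` helper and the sample draft are hypothetical, and the questions are copied from the list above.

```python
from typing import Dict, List

POSTMORTEM_QUESTIONS = [
    "What triggered the incident?",
    "What signals were missed?",
    "What helped during the response?",
    "What made the response harder?",
]


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return the questions that still have no written answer."""
    return [q for q in POSTMORTEM_QUESTIONS if not answers.get(q, "").strip()]


# A made-up draft with only the first question filled in.
draft = {
    "What triggered the incident?": "A config change removed a required setting.",
    "What signals were missed?": "",
}

for question in unanswered(draft):
    print(f"Still open: {question}")
```

The questions are about the system and the response, not about individuals, which keeps the template aligned with a blameless review.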

What strong crisis leadership actually looks like

The best leaders in a breakdown are not theatrical. They do not try to be the hero in every thread.

They create focus.

They simplify priorities. They make sure facts are visible. They protect the people doing the work from distraction. They keep stakeholders informed without polluting the response loop. And when the incident ends, they help the team learn without turning reflection into blame.

Crises are unavoidable in growing companies. Confusion is not.

If you want your team to perform under pressure, build the habits before the next incident arrives: a clear command structure, a standard place to coordinate, a shared approach to documentation, and a blameless review process afterward.

When everything is breaking, leadership is not about having all the answers.

It is about giving the team a way to find them together.