Monthly Archives: June 2017

o no its ringing o no

How to do a crisis call in technology

This started as an answer on Reddit that someone thought was good enough to gild. So I thought I’d expand it here while Lil Z is sleeping in case it helps more folks. Some tech folks really wanted to know how to talk to business people on troubleshooting conference calls. I’ve seen this done well and done poorly and I have strong opinions on the subject.

The System is down.

It isn’t doing the Thing it does and many, many people are desperate to get the Thing for important money reasons.

You are supposed to bring the System up.

The suits, who are measured on money stuff they can’t do without the Thing, feel terrified. They feel helpless. They can’t bring the System up, they need you to do that. You geeks want to concentrate, think silently, occasionally type things and mutter to each other. That is useless on the big conference call and just inspires anxiety. The day job of a suit usually involves being informed and making decisions – and they can’t do either.

Here is how you do a crisis call right:

1. There is a suit-facing call and a geek-facing call. Geek talk isn’t the same as suit talk. Perfectly reasonable geek talk (“a reboot will cause us to lose unsaved data”) causes suits to overreact and causes more problems.
2. The talkiest nerd gets to be in charge of communication. That’s their job, not troubleshooting.
3. The talkiest nerd goes back and forth between the two calls, figures out status and direction and makes real estimates. They give regular updates to the suits and give direction to the geeks.

Two Calls

It’s important that you tell people status – if you don’t promise status at regular intervals, they will try their best to go find things out or try to help. They will interrupt the people doing the work so that they can get information they need to deal with their immediate problems.

Geeks get angry when questions interrupt troubleshooting because they think the problem is that the System is down. That isn’t really the problem – no one cares about the System but the geeks. The problem is that the Thing isn’t happening. If the suits could get the Thing without the System, they would – and that might be a solution you can offer. The suits may be very happy with you taking a long time to fix the System if you can give them the Thing at regular intervals or on demand until the System is back up.

Folks at Krispy Kreme don’t care about the beautiful donut glazing machine as much as they care about devouring delicious hot donuts.

The Talkiest Nerd does the talking

The talkiest nerd can fulfill a very important role by giving status, information and choices to the suits. Let the suits actually make choices based on good information! They should make those decisions so they can make the money stuff happen! The whole reason suits employ geeks is that they need the Thing to make the money stuff happen. They need to know whether the Thing is at least 4 hours away, because then they can tell customers to be calm, they are going to get compensated. Or, if it isn’t, they can tell customers not to worry, we’ll have the Thing within 4 hours.

Timely Updates

Another important point here is Timely updates. Set a schedule and then keep to it. If you say “I’ll give you updates on this conference call every 30 minutes” or “every 15 minutes”, then do it. The talkiest nerd can interrupt everyone on the geek call 5 minutes before each update and make sure they’ve got a good handle on things so they can give a real status.

The reason you make Timely updates is so that people can deal with silence. The big worry is that no one is working on things or that they are working on the wrong thing. Remember, the suits are feeling an unwelcome sense of helplessness. They can deal with silence if they know that they need to be back on the call at X time to get the next status.

It is important to give them some chill so that they can go and do the very important work of handling the downstream problems of the Thing not happening. They can go work on that knowing what’s happening in the next 30 minutes.
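If it helps to have the schedule written down before the call starts, here is a minimal sketch of the cadence described above: an update on the suit call at a fixed interval, with a check-in on the geek call 5 minutes before each one. The function name, the start time and the number of updates are all made up for illustration; use whatever interval you actually promised.

```python
from datetime import datetime, timedelta


def update_schedule(start, interval_minutes=30, prep_minutes=5, count=4):
    """Return a list of (geek_check_in, suit_update) time pairs.

    Hypothetical helper: the update on the suit call happens every
    `interval_minutes` after `start`, and the talkiest nerd checks in
    on the geek call `prep_minutes` before each update.
    """
    interval = timedelta(minutes=interval_minutes)
    prep = timedelta(minutes=prep_minutes)
    schedule = []
    for i in range(1, count + 1):
        update_at = start + i * interval
        schedule.append((update_at - prep, update_at))
    return schedule


# Example: the crisis call starts at 11:15 (an invented time).
start = datetime(2017, 6, 1, 11, 15)
for check_in, update in update_schedule(start):
    print(f"{check_in:%H:%M} check in on the geek call, "
          f"{update:%H:%M} update the suit call")
```

The point is only that both times are fixed up front: the suits know when to come back, and the geeks know when they will be interrupted.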

Things To Say

  • We have X people looking at the issue and this is our Y priority.
  • We have an idea what the issue is and we are testing it to make sure we are fixing the right thing.
  • It will take about X minutes to confirm and we’ll next update you with status at 11:45.
  • We were wrong and now we think the problem is X and we’re testing that idea.
  • We think we know what’s wrong. Here is an ELI5 short description and here are two ways we think we can solve the problem. Here are the high-level time and danger trade-offs in those solutions and here is the one the geeks recommend for the following reasons. (Let them actually make an informed choice here.)
  • We don’t have any updates right now. This is tricky and X people are discussing the best next steps. We don’t see how we can get this solved before our next status update at 12:30, but we’ll blast out a message if anything changes drastically.
  • We’re manually handling a certain type of problem – please put a list together of the clients you want ordered by priority and we’ll handle them in that order and let you know when each is done.
  • The problem should be wrapped up in X minutes and we will handle any resulting issues afterwards.
  • Tomorrow we will be rested and will take a thorough look at what went wrong, how it wasn’t caught earlier, what warnings we can put in and how we can handle it better next time.
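
If the talkiest nerd wants a fill-in-the-blanks version of the first few lines above, something like the following could work. This is only a sketch: the class name and its fields are invented here, and the wording is lifted from the list above rather than from any real tool.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """Hypothetical template for one update on the suit call."""
    people: int        # how many people are looking at the issue
    priority: str      # e.g. "top"
    summary: str       # one or two plain-language sentences on where things stand
    next_update: str   # the time you promise the next status, e.g. "11:45"

    def render(self) -> str:
        # Always end with the next update time so people can deal with silence.
        return (
            f"We have {self.people} people looking at the issue "
            f"and this is our {self.priority} priority. "
            f"{self.summary} "
            f"We'll next update you with status at {self.next_update}."
        )


update = StatusUpdate(
    people=4,
    priority="top",
    summary="We have an idea what the issue is and we are testing it.",
    next_update="11:45",
)
print(update.render())
```

Making the next update time a required field is the design choice that matters: you can't send a status without promising the next one.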

Why should you believe me?

Since 2003 I’ve worked in finance and technology, often as the face of technology to the business. Things have gone wrong and I’ve seen good communication and bad communication. In places where the business trusts tech to handle a crisis, these kinds of patterns have worked. I’ve also worked in places where there was a terrible relationship between technology and the business: patterns like these helped improve things.