Error Correction

A general technique in teaching an “artificial intelligence” is to feed it ground truth. This may come in many different forms, but the essential idea is this: you help the machine learn from its mistakes by giving it the “correct” answer, and some time to reflect on it.
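To make the idea concrete, here is a minimal sketch of error-driven learning in Python. It is only an illustration, not how any particular system is built: the function name `train`, the learning rate, and the toy data (points on the line y = 2x + 1) are all assumptions made up for this example. The machine makes a guess, is shown the "correct" answer, and nudges itself in proportion to its mistake.

```python
def train(examples, learning_rate=0.05, epochs=500):
    """Fit y ~ w * x + b by repeatedly correcting the prediction error.

    `examples` is a list of (x, ground_truth) pairs. All names and default
    values here are illustrative assumptions.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, truth in examples:
            guess = w * x + b               # the machine's current answer
            error = truth - guess           # how far off it was
            w += learning_rate * error * x  # adjust in proportion to the mistake
            b += learning_rate * error
    return w, b


# Hypothetical ground truth: points on the line y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)
print(round(w, 3), round(b, 3))
```

Nothing is learned on a pass where the error is zero; every correction comes from a mistake that was looked at and acted upon.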

“I want you to think about what you’ve done.”

This technique works equally well for us humans, as pithily explained in Daniel Coyle’s The Little Book of Talent:

Tip #22: Pay attention immediately after you make a mistake.

Coyle elaborates: “People who pay deeper attention to an error learn significantly more than those who ignore it. […] Develop the habit of attending to your errors right away. Don’t wince, don’t close your eyes; look straight at them and see what really happened, and ask yourself what you can do next to improve. Take mistakes seriously, but never personally.”

In the workplace, most technology companies codify some form of retrospective into a process. At Amazon, this is called a ‘Correction Of Errors’, or ‘COE’ for short. A COE records the nature of the error and its impact on the business, the lessons learned during the incident and recovery, and actions to prevent the same class of problems from recurring. COEs use the 5 Whys process popularized by Toyota to dig deeper into the problem and establish root causes.

I’ve always been a proponent of writing COEs and learning from mistakes. It’s important to see the COE as an “engineering chisel” rather than a “managerial hammer”. As an engineer, writing COEs is a discipline you impose upon yourself to hone your craft.

“Now Look What You’ve Done!”

Some years ago, I proposed the idea of a ‘2 Hour COE’, which was well received by engineers. It went like this: don’t hesitate to write COEs or think of them as ‘work’; dive right in and do it; don’t spend more than 2 hours writing up a single COE (keep it ugly); focus on root causes and lessons learned more than anything else; and create no more than 3 action items (avoid creating new work for the team).
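For the programmatically inclined, the rules above can be sketched as a small checklist in Python. This is a hypothetical illustration, not an actual Amazon template or tool: the class name `TwoHourCOE`, its fields, and the `problems` method are all invented for this example, and I am reading "1-3 action items" as an upper limit of three.

```python
from dataclasses import dataclass, field
from typing import List

MAX_MINUTES = 120      # "don't spend more than 2 hours"
MAX_ACTION_ITEMS = 3   # assumed reading of "1-3 action items": at most three


@dataclass
class TwoHourCOE:
    """Hypothetical container for a quick, ugly COE write-up."""
    summary: str
    root_causes: List[str] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    minutes_spent: int = 0

    def problems(self) -> List[str]:
        """Return the ways this write-up breaks the 2 Hour COE rules."""
        issues = []
        if self.minutes_spent > MAX_MINUTES:
            issues.append("took longer than 2 hours")
        if not self.root_causes:
            issues.append("no root causes identified")
        if not self.lessons:
            issues.append("no lessons recorded")
        if len(self.action_items) > MAX_ACTION_ITEMS:
            issues.append("more than 3 action items")
        return issues


# Example values are illustrative only.
coe = TwoHourCOE(
    summary="Missed a meeting with colleagues",
    root_causes=["Calendar app was not synced with the mail server"],
    lessons=["Complaining about a recurring problem does not fix it"],
    action_items=["Find a calendar app that syncs reliably"],
    minutes_spent=60,
)
print(coe.problems())
```

An empty list from `problems()` means the write-up stayed within the rules; anything else is a nudge to trim it back rather than a reason to polish it further.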

Last month, I accidentally missed a meeting with colleagues because my iOS Calendar app wasn’t accurately synced with the Microsoft Exchange server. Someone mentioned “5 Whys” jokingly, and I thought to myself: this isn’t the first time something like this has happened, and I sure keep complaining about it — what can I do to resolve root causes myself? Perhaps I should write up a COE!

Now, I love writing, so I spent an hour in the morning, good coffee in hand, injecting some subtle humor into the write-up. It was primarily a joke aimed at the guy who mentioned “5 Whys” (and it gloriously accomplished the job, if I may say so myself). But last week it became popular on Blind, and more than one person asked me: was I joking or serious? Can’t it be both? It turns out I did get something really useful out of it after all: I discovered an app called VMware Boxer that is fully supported for corporate email and works flawlessly on iOS.

If you’re an Amazonian and have questions, join #ama-riyer on the corporate Slack workspace.