Mistakes were made

A verifiable error in our signup flow


NOTE: This post is part of our "Mistakes were made" series, where we share stories of things that could have gone better (or ideally not at all). There's no overarching theme or grand lesson. We're simply sharing experiences. If you're looking for a structured approach to learning from failure, consider resources like the Incident Handbook. I haven't read it but Codestral 22b seems to like it.

I introduced a bug a little over a week ago that prevented new users from verifying their accounts. Verification emails have been a bit of an issue over the years because there's no easy way to re-send them (I just haven't ever gotten around to it). In this case, the verification emails went out fine, but when a user clicked the link it looked like it worked while, underneath, it wasn't actually setting the verified flag on the account record.

Should there be tests for critical user flows like onboarding? Yes. Did I have them? No. Do I feel bad about it? A little. Am I going to write tests for it? Eventually. Have I started writing them already? No. Will I write them soon? Possibly. Am I willing to accept the consequences in the meantime? Of course!
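For what it's worth, here is a rough sketch of the kind of test that would have caught this. It is not the actual Onetime Secret code: `Account`, `InMemoryStore`, and `verify_account` are hypothetical stand-ins invented for illustration. The point is only that the test reloads the record and checks the flag instead of trusting the success message.

```python
from dataclasses import dataclass


@dataclass
class Account:
    email: str
    verified: bool = False


class InMemoryStore:
    """Hypothetical stand-in for the real storage layer."""

    def __init__(self):
        self._rows = {}

    def save(self, account):
        # Store a copy so that changes only persist when save() is called.
        self._rows[account.email] = Account(account.email, account.verified)

    def load(self, email):
        row = self._rows[email]
        return Account(row.email, row.verified)


def verify_account(store, email):
    """Handle a click on the verification link (hypothetical handler)."""
    account = store.load(email)
    account.verified = True
    # Dropping this call is one way to get a success message
    # without a persisted flag.
    store.save(account)
    return "Your account is verified"


def test_verification_persists():
    store = InMemoryStore()
    store.save(Account("new-user@example.com"))
    message = verify_account(store, "new-user@example.com")
    # The success message alone proves nothing; reload and check the flag.
    assert message == "Your account is verified"
    assert store.load("new-user@example.com").verified is True
```

Asserting against a freshly loaded record is the design choice that matters here. A test that only inspected the response would have passed with the bug in place.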

As an aside about test coverage

There's a lot of code out there with test coverage so potent it can peel the paint off of a Navy SEAL. But there's more to software quality than just the number and kinds of tests. Tests can have bugs too, and test coverage, although the name has a cozy ring to it, is not a panacea.

The road to software hell is paved with good intentions. And that's not the focus of the post anyway. I'm getting distracted.

There was a bug afoot

I didn't catch the issue until Thursday evening, and only then thanks to a helpful report from a user. By that point a few hundred accounts had been created. I deployed a fix right away and got to work documenting what happened and compiling the list of affected accounts. Old compliance habits die hard. And also come in handy when you need to do a post-mortem.

I waited until Friday morning to follow up on manually verifying the accounts and sending out notifications. There was no rush at that point. I sent out an email to the affected users to apologize and let them know that their accounts were now verified. I also included a fortune cookie message at the end of the email to lighten the mood and add some Lite Value℗.

Hands-on support

I wrote some code to get a rough idea of which accounts were affected based on date range, then split that into two Redis sets: verified and not verified. The not verified group was ~650 accounts. From there I codified the process of manually verifying the accounts and sending out an email to each one. I added a sleep of 1 second between each email to play nice with the email provider. Some choices you make simply to avoid obvious headaches; there's no point spending time optimizing for the sake of it.
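The process above can be sketched roughly like this. It is a minimal illustration, not the script that was actually run: plain Python sets stand in for the two Redis sets, and `Account`, `split_by_verified`, `remediate`, and the `send_email` callback are all hypothetical names.

```python
import time
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Account:
    email: str
    created: datetime
    verified: bool = False


def split_by_verified(accounts, start, end):
    """Partition accounts created in [start, end] into two sets of emails."""
    verified, not_verified = set(), set()
    for acct in accounts:
        if start <= acct.created <= end:
            (verified if acct.verified else not_verified).add(acct.email)
    return verified, not_verified


def remediate(accounts, not_verified, send_email, pause=1.0, sleep=time.sleep):
    """Verify each affected account, notify its owner, then pause."""
    by_email = {acct.email: acct for acct in accounts}
    fixed = []
    for email in sorted(not_verified):  # sorted for a repeatable order
        by_email[email].verified = True
        send_email(email)
        fixed.append(email)
        sleep(pause)  # throttle to play nice with the email provider
    return fixed
```

Passing `send_email` and `sleep` in as arguments is just a convenience for the sketch. It allows a dry run with `print` and a zero pause before any real email goes out. At one second per message, ~650 accounts take about 11 minutes, which is easy to live with for a one-off job.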

Email message

Hello,

Delano here, founder of Onetime Secret. We had a small snafu with some new accounts over the past week. If you tried to log in and kept seeing a message about a verification link, that was us messing up, not you.

We went ahead and manually verified your account so you should be able to sign-in now without any trouble.

Dealing with account issues is annoying, so thank you for your patience. If you have any questions or run into any other issues, just hit reply to this email. I'm here to help.

Delano
https://onetimesecret.com/

P.S. Here's a special fortune for you: Distant connections bring valuable insights.

Putting it into context

The bug was introduced following a full refactor of the underlying storage code (specifically release v0.17.0). It was the largest single update to the codebase in probably 10 years. The refactor was necessary to support new features and improve the overall capability of the platform. The issue with verification was minor in that context. Not ideal, but also not heat-death-of-the-universe bad. And it turned out to be an opportunity to engage with users directly and provide a personal touch to the support experience.

Meta content: About the title illustration

As is tradition, the main illustration for this post was created by Anthropic Claude 3.5. It did the heavy lifting and I filled the role of hype-man.

Older layout from June