I can be wrong 🤷

A few years ago, I attended a talk by Swedish Buddhist monk Björn Natthiko Lindeblad, and he told a story which stuck with me.

A magic formula #

Björn had a Master’s Degree in Economics and a traditional financial job. However, he did not like it and eventually ended up as a Buddhist monk in Thailand.

He described how he and his fellow Westerners were perceived by the local Thai villagers. Just as many Westerners feel that the East possesses some ancient and mysterious knowledge lost to the West, the Thai villagers believed that the Western monks had special knowledge that the regular Thai monks did not.

When a Western monk led the worship, attendance was always high. In the story, a British monk was talking to the followers. The monk said that he would tell them a magic formula which had the power to change their lives. A whisper went through the audience, followed by complete silence. Everybody was waiting for the magic formula. The monk then loudly said, “I can be wrong. I can be wrong. I can be wrong.”

Troubleshooting a database bug #

A few years back, I was troubleshooting a weird database problem. Understanding the problem requires some technical detail, but I’ll try to keep it simple. (There will be a point in the end, I promise.)

Every now and then, a call to our database failed with an error message: “value too large for column”. This is supposed to happen if you try to write, say, 17 characters into a column which can only hold 16 characters.

Apart from the obvious cause, where you simply try to insert a string that is too large, there are a few other reasons why it could happen. I looked into them all. I checked, double-checked, and triple-checked the code and database configuration.

I looked closely at the code which generated the value to be written, just to be sure. Could it generate strings that were more than 16 characters long? I could clearly see that it could not. It was actually quite obvious, because the code in question looked something like this:

String output = input.substring(0, 16).toUpperCase();

This code is not complicated. It starts with the input variable (whatever string it may hold), then substring extracts the first 16 characters, and toUpperCase converts them to the corresponding characters in UPPERCASE. So if input holds "My name is Henrik Jernevad", output would be "MY NAME IS HENRI". Simple!
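For readers who want to run the example, here is a minimal sketch of that one-liner wrapped in a class. The class and method names are made up for illustration, and `Locale.ROOT` is an addition so that the result does not depend on the machine's default locale; the original line called `toUpperCase()` without an argument.

```java
import java.util.Locale;

public class SubstringThenUpperCase {

    // Hypothetical helper mirroring the one-liner: keep the first 16
    // characters, then convert them to uppercase.
    // Note: substring(0, 16) throws StringIndexOutOfBoundsException
    // if the input is shorter than 16 characters.
    static String firstSixteenUpper(String input) {
        return input.substring(0, 16).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String output = firstSixteenUpper("My name is Henrik Jernevad");
        System.out.println(output);          // MY NAME IS HENRI
        System.out.println(output.length()); // 16
    }
}
```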

“I know this stuff” #

At the time, I had been programming for about 25 years. I was confident that I knew what substring does and I knew what toUpperCase does. For a programmer, these are not complicated functions. I had no doubt whatsoever that the string in the output variable would always be at most 16 characters.

At some point, someone mentioned that the data which caused the problem could be connected to a customer in Germany. So we started digging up examples of data in German to see if we could come up with anything. We did. And when the truth hit me, I experienced total cognitive dissonance. It felt like a freight train had run into me at full speed (not that I really know how that feels, but you get the idea).

The shocking truth #

You see, the German language has a character “ß” (eszett). It is related to the letter “s” and can be thought of as (and is sometimes replaced by) “ss”. Adding to the mystery is that “ß” is a lowercase character, like “a” or “s”. At the time of this incident, there was no official uppercase version.¹ Instead, it was written as “SS” in uppercase form.

So what happened? Every now and then we encountered an “ß” in the input string. When we did, substring limited the string to 16 characters, but then toUpperCase replaced “ß” with “SS”, making the output 17 characters long! For example, "Ich heiße Henrik" became "ICH HEISSE HENRIK", which just so happens to be one character longer.²
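The effect can be reproduced with a short sketch. It compares the order from the story (truncate, then uppercase) with the swapped order mentioned in the footnote. The class and method names, the `MAX_LENGTH` constant, and the `Math.min` guard are assumptions added for illustration, as is `Locale.ROOT`, which keeps the result independent of the machine's default locale.

```java
import java.util.Locale;

public class EszettLengthDemo {

    // Column width from the story.
    static final int MAX_LENGTH = 16;

    // Order used in the story: truncate first, then uppercase.
    // The uppercase step can still lengthen the string afterwards.
    static String truncateThenUpper(String input) {
        return input.substring(0, MAX_LENGTH).toUpperCase(Locale.ROOT);
    }

    // Swapped order: uppercase first, then truncate.
    // Math.min is an added guard for results shorter than MAX_LENGTH.
    static String upperThenTruncate(String input) {
        String upper = input.toUpperCase(Locale.ROOT);
        return upper.substring(0, Math.min(MAX_LENGTH, upper.length()));
    }

    public static void main(String[] args) {
        // "Ich heiße Henrik" is 16 characters; the eszett is written as
        // \u00DF so the example does not depend on source file encoding.
        String input = "Ich hei\u00DFe Henrik";

        String a = truncateThenUpper(input);
        System.out.println(a + " (" + a.length() + ")"); // ICH HEISSE HENRIK (17)

        String b = upperThenTruncate(input);
        System.out.println(b + " (" + b.length() + ")"); // ICH HEISSE HENRI (16)
    }
}
```

Only the second variant guarantees a result that fits the 16-character column, because the length limit is applied after the step that can change the length.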

This unceremoniously broke my I’ve-been-doing-this-for-twenty-five-years-and-know-how-it-works-thank-you naiveté. In all those years, toUpperCase had always returned the same number of characters as it was given. Calling the toUpperCase function with henrik as input produces HENRIK. In my mind, it was basically a law of nature – you send 16 characters in, you get 16 characters back. Simple!³

But not in Germany. 😉

Conclusion #

Now, what is the point of all this? The lesson I learned that day is not to be too sure. Even though you feel completely confident that something will work, there is always a slight chance that you might have missed something.

“I can be wrong. I can be wrong. I can be wrong.”

¹ Germany has ended a century-long debate over a missing letter in its alphabet
² A “funny” fact is that had the order of the substring and toUpperCase calls been swapped, the code would have worked as intended.
³ To make things worse for me, the toUpperCase documentation clearly states that the result is locale-specific.