It is the general consensus that IT Problems are pack animals. This evening once again proves this.
It all began in a completely harmless fashion: For a week or so, I had been having problems with my work eMail Account. The Organization I work for hosts their eMail Infrastructure at Microsoft, Office365 to be precise. This enables us to use the full Exchange functionality at no cost (Since it is an educational institution, Office365 is free for us). Personally, I could not care less about that, since I am using Linux exclusively, but hey, as long as I can get my mails, whatever.
So, about a week ago, Thunderbird started reporting errors when connecting to my eMail Account. It babbled something about the login to the server failing, offering to retry, change my password or cancel. I hit retry a few times, and eventually the message stopped. I assumed Thunderbird had finally managed to get a connection with the Server and happily kept doing whatever I was doing. I never wondered why I wasn’t getting any mail, since it was very rare for me to receive any mail on that account anyway. This was mistake number one: Assuming that Thunderbird would not stop trying to connect without explicitly telling me about it.
This impression was reinforced by the fact that my daily backups of my mail accounts were running just fine. My Cronjob reported no errors. And, at the beginning, my Android Phone running K9 Mail kept getting the messages I got sent. After that, the phone reported neither new mails nor errors, so I assumed that everything was fine: even if Thunderbird was silently stopping the connection attempts, and even if my daily backup was having unreported errors, at least the phone would surely complain if it was unable to get a connection with the servers. Triple redundancy in error reporting, what could possibly go wrong?
Famous last words.
So, today, the error messages of Thunderbird finally pissed me off enough for me to investigate. Our eMail Service had recently been migrated (by Microsoft) from Live@Edu to Office365. The Documentation for the upgrade claimed that no changes would have to be made, so I left my Settings the way they were, and they kept working happily through the upgrade, even permitting me to send a notification of the completed migration to the organization and to fix a few Problems that occurred afterwards. That was all before my problems set in.
So, as I said, I started to investigate. I tried to find out which servers I was supposed to use, and updated my Thunderbird config. The problems were still there. Curious, I logged into webmail to check if my account was still active and my password still worked. It was and did. After the login, I was greeted by “9 new messages”, the oldest going back to last Monday.
I will not bore you with my struggles to get Thunderbird working. I triple checked password and server settings, changed my password, waited 30 minutes, nothing would work.
Curious how my Android had kept working through all of this (or had it? I had never seen those messages after all), I started up K9 Mail and tried to refresh my account. It went through without an error message, but also without downloading the new messages. I updated the server information and suddenly, I got an error message, claiming a wrong password. Great. After deleting and re-creating my K9 Mail Config for the account, I still could not get it to work. K9 Mail had not been able to connect to the server for a whole week, but had not seen fit to inform me of that. Awesome.
Now, I was really interested in how my backups had kept working through all of this. I manually ran my backup software (I was using OfflineIMAP), only to see that the Program was throwing an exception when trying to connect to my account. The exit status (“echo $?”) was still zero though, indicating success. Frustrated, I hit up their GitHub-Page, intending to write a bug report, when I realized that I was running a horribly outdated version that I had installed from the Raspbian-Repositories (Debian for Raspberry Pi). I removed the old version, installed the current one, and retried the run, being met with an Error about cert_fingerprints not being set. The Program still exited with 0, by the way. Someone running automatic updates of the program through apt-get, for example, would never have seen this change and, thanks to the exit status indicating success, would never have been notified that their backups were failing. I wrote up a bug report, fixed the config file, and tried again. Now, I was getting the “LOGIN FAILED” I expected, but the Program STILL exited with a Status of zero. I sighed heavily (actually, I cursed loudly), updated my bug report and mailed Microsoft Support about this problem.
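For what it is worth, here is a minimal sketch of how a Cronjob could stop trusting the exit status alone. It is not what I was running, just one possible approach: a small Python wrapper that runs the backup command and treats the run as failed if the exit status is non-zero or if the output contains something that looks like an error. The marker strings and the names run_checked and main are my own assumptions for illustration, so adjust them to whatever your tool actually prints.

```python
import subprocess
import sys

# Assumed failure markers. These are guesses for illustration, not an
# official list of what any particular backup tool prints.
ERROR_MARKERS = ("Traceback", "ERROR", "LOGIN FAILED")


def run_checked(cmd, markers=ERROR_MARKERS):
    """Run cmd and return (ok, output).

    ok is False if the exit status is non-zero OR if the combined
    stdout/stderr output contains any of the marker strings.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    output = proc.stdout
    looks_broken = any(marker in output for marker in markers)
    ok = proc.returncode == 0 and not looks_broken
    return ok, output


def main(argv):
    """Run the command given in argv; return 0 on success, 1 on failure."""
    if not argv:
        print("usage: pass the command to run as arguments")
        return 2
    ok, output = run_checked(argv)
    sys.stdout.write(output)
    return 0 if ok else 1
```

To use it from cron, you would save this under some name of your choosing, add `sys.exit(main(sys.argv[1:]))` at the bottom, and put the real backup command after the script name. String matching on output is crude and can produce false alarms, but a false alarm is a lot better than a week of silently missing backups.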
It has been two hours now, and I have found:
- One case of bad coding in Thunderbird (not reporting when stopping the connection attempts)
- One case of a missing error message in K9 Mail
- Two cases of a potentially fatal wrong exit status on OfflineIMAP
- One case of WTF about Microsoft (Seriously, why doesn’t this crap work?)
- One case of foul mood and desire to punch cute kittens
And here is what I have learned from it:
- Don’t rely on error messages being there if you have never seen them
- Don’t rely on the exit status of software you have not written yourself and / or tested.
- Don’t assume that having three different ways of being notified when something goes wrong means you actually will be, unless you have tested at least one of them (Basically 1 and 2 combined)
- Even (or: especially) a billion dollar company like Microsoft can and will screw up, and they will probably not fix it if you do not complain.
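One way to act on the first three points is to have a check of your own that you have actually seen fail. The following is only a sketch under assumptions: it uses Python's standard imaplib to attempt a login and reports failure explicitly. The function name imap_login_ok and the connect parameter are made up for this example, and the host, user and password are whatever applies to your account.

```python
import imaplib


def imap_login_ok(host, user, password, connect=imaplib.IMAP4_SSL):
    """Try an IMAP login and return (ok, reason).

    connect is the connection factory. It defaults to imaplib.IMAP4_SSL
    and can be swapped for a fake to test the failure path offline.
    """
    try:
        conn = connect(host)
    except OSError as exc:
        # DNS, socket and TLS errors are all subclasses of OSError.
        return False, f"connection failed: {exc}"
    try:
        conn.login(user, password)
    except imaplib.IMAP4.error as exc:
        # Raised by imaplib when the server rejects the login.
        return False, f"login failed: {exc}"
    else:
        return True, ""
    finally:
        try:
            conn.logout()
        except Exception:
            pass  # nothing useful left to do if logout fails too
```

The important part is not the code but the test: run it once with a deliberately wrong password and make sure the failure actually reaches you (mail from cron, non-zero exit status, whatever you rely on). Only after you have seen the alarm go off once can you trust its silence.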