Jorge Manrubia

February 2, 2022

Of bugs and giants

In October of last year, we started to receive reports about chats that stopped working in Basecamp. They were all iOS/Safari users. Campfires wouldn’t work, as if there was no internet connection, and the only solution was to restart Safari completely.

The bug was elusive and happened very intermittently. We tried several things without understanding what was going on. Sometimes they seemed to help, but we eventually hit the problem again. We decided to prioritize it and put several people to work on it.

We discovered that the new WebSockets implementation in Safari 15.1+ had some serious bugs. We found a way to consistently reproduce a crash when using compression (although this wasn’t the problem affecting us) and, finally, a way to reproduce the bug in question. With that in place, we could prepare a patch. After shipping the fix, we saw a steep decrease in these errors and, more importantly, we stopped receiving customer reports about them.

We also upstreamed this fix to Rails, to hopefully prevent other Rails users from going through the same painful experience.

While investigating this, I worried that this was a bug we couldn’t fix. Something was getting borked in Safari’s internals, completely breaking WebSockets until you restarted the browser. We found a way to circumvent the problem with JavaScript, but it might well have been impossible.

This also made me think about Apple and its responsiveness to Safari bugs. They released a severely flawed new implementation of a core web technology. As a result, many applications and WebSocket libraries stopped working on Safari. I’m not naive, and I don’t expect an overnight fix, but months later, we don’t even know whether a fix is coming. I am genuinely curious about what kind of organizational issues make a situation like this possible (I dare to rule out a lack of resources 😬).


---