Around 1968, just about everything in computer science was invented. Kevlin Henney gave a great overview of this some years ago:
https://www.youtube.com/watch?v=KjgvffBlWAg
One of the things discussed around then was database replication. If a server becomes unavailable for some reason, it is good to have its data stored somewhere else as well. However, it is quite important that the data is actually the same everywhere. Let's say I deposit some money in my bank account. If I then want to withdraw it somewhere else, it had better be available. So, since the early 1970s, there has been an insane amount of research on how to keep distributed data consistent using a minimum of network communication (the PODC conference is a good starting point). For example, if I deposit money somewhere, *your* balance does not have to be replicated.
Then we can take this one step further, as the order of operations is not always that important. If I deposit some money just a few seconds before you do, it is actually OK if your deposit reaches the other bank offices a few seconds before mine. If we generalize this a little bit, we get a CRDT (Conflict-free Replicated Data Type). These are becoming increasingly popular in the software industry. One of the papers I got published last year was on such a thing:
http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-60046
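To make the idea concrete, here is a minimal sketch of one of the simplest textbook CRDTs, a state-based PN-Counter (not the construction from the paper). Each replica tracks deposits and withdrawals per replica ID, and merging takes element-wise maximums, so all replicas converge to the same balance no matter the order in which they exchange state:

    class PNCounter:
        """State-based PN-Counter: per-replica deposit and withdrawal
        totals. Merge takes element-wise maximums, so replicas converge
        regardless of the order in which they exchange state."""

        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.incs = {}  # replica id -> total deposited there
            self.decs = {}  # replica id -> total withdrawn there

        def deposit(self, amount):
            self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + amount

        def withdraw(self, amount):
            self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + amount

        def value(self):
            return sum(self.incs.values()) - sum(self.decs.values())

        def merge(self, other):
            for rid, v in other.incs.items():
                self.incs[rid] = max(self.incs.get(rid, 0), v)
            for rid, v in other.decs.items():
                self.decs[rid] = max(self.decs.get(rid, 0), v)

    # Two offices see concurrent deposits in different orders, yet agree
    # after merging:
    stockholm, sao_paulo = PNCounter("se"), PNCounter("br")
    stockholm.deposit(100)  # my deposit
    sao_paulo.deposit(50)   # your deposit, concurrent with mine
    stockholm.merge(sao_paulo)
    sao_paulo.merge(stockholm)
    assert stockholm.value() == sao_paulo.value() == 150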
However, what I realized just before writing that paper was that, from a business perspective, availability is much more important than consistency. Even if an ATM is offline, allowing small withdrawals can be OK, so that customers can buy food and travel home. Overdrafts can be resolved later.
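As a hypothetical policy sketch (the cap and the function name are made up for illustration), that trade-off could look like:

    OFFLINE_CAP = 50  # hypothetical limit for withdrawals while disconnected

    def authorize_withdrawal(amount, online, balance=None):
        """Favor availability: when the ATM cannot reach the bank's
        servers, allow small withdrawals anyway and reconcile any
        overdraft later."""
        if online:
            return amount <= balance
        return amount <= OFFLINE_CAP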
In the SMS domain, where I work, consistency among servers is not particularly important either. Let's say we manage two SMS gateways, one forwarding SMS in Europe and one in South America. If an SMS arrives at the server in Europe and is forwarded less than a second later, the other one never even has to know about it. If the one in Europe dies before the SMS can be forwarded, that is another matter. However, if we have ten SMS gateways in various countries, not all of them need to know about every single SMS. This was the topic of the paper that took me the longest to get published:
http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-59629
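To illustrate the idea (a toy scheme of my own for this post, not the protocol from the paper), an incoming SMS could be replicated to just k of the n gateways before being acknowledged, so most gateways never hear about most messages:

    import random

    GATEWAYS = ["eu-1", "eu-2", "sa-1", "sa-2", "as-1"]  # hypothetical fleet
    K = 2  # copies required before acknowledging an incoming SMS

    def replicas_for(sms_id):
        """Pick K gateways deterministically per message, so peers can
        locate the copies without any global coordination."""
        return random.Random(sms_id).sample(GATEWAYS, K)

    def handle_incoming(sms_id):
        targets = replicas_for(sms_id)
        # Wait for K confirmations instead of all n. The message survives
        # the loss of the receiving gateway, while the other n - K
        # gateways never need to know it existed. Once forwarded, the
        # copies can be deleted.
        print(f"SMS {sms_id}: replicate to {targets}, forward, then delete")

    handle_incoming(42)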
The core issue here is that the more servers we need confirmations from over the network, the more sensitive each server will be to power and network outages. So, there is a balance between consistency and system availability. Apart from the various shortcomings in the previous versions of this latter paper, focusing on availability when more or less the entire research community was discussing consistency really did not help in getting it accepted.
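A back-of-the-envelope way to see the sensitivity (the numbers are illustrative): if each server is independently reachable with probability p, an operation that must be confirmed by k servers succeeds with probability roughly p^k, which drops quickly as k grows:

    # If each server is reachable with probability p, a write that needs
    # confirmations from k servers succeeds with probability p**k.
    p = 0.99
    for k in (1, 3, 5, 10):
        print(f"k={k:2d}: success probability = {p**k:.3f}")
    # k= 1: 0.990,  k= 3: 0.970,  k= 5: 0.951,  k=10: 0.904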
Then, yesterday, my PhD advisor sent me a link to a new summer school for PhD students on Distributed and Replicated Environments:
https://www.vub.be/dare-2023
The interesting thing here is that their focus is actually on availability. At least half of the speakers have appeared in my reference sections during the past few years, so it would have been a perfect fit, had it been held a couple of years ago. But I've already gotten my PhD, so I'm not eligible to attend. However, it gives me hope that availability and other quality requirements will be given more focus now that consistency is basically "solved". My role here is of course epsilon at best, but maybe there will eventually be a new workshop or even a conference discussing this. Maybe somebody will even cite me?
/Daniel