Otar Chekurishvili

May 10, 2023

Challenges of Building a GDPR-Compliant Event-Sourced System

Recently, I've been working on a FinTech project where Event Sourcing was a perfect fit.

The main idea behind event sourcing is that instead of manually changing the application's state, you rely on an immutable, append-only log of events (event store) to build it up. It requires a very different mindset than the traditional OOP and CRUD applications.

Event sourcing shines in FinTech systems: when you model financial transactions with it, you automatically keep a log of every event. This immutable log records the "root cause" of every money movement in the system. Eventually, the event store becomes the single source of truth for the financial ledger. You could potentially remove all the tables (or structures) from your database (except the one(s) storing the events) and build the same state again by replaying the events.
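To make the replay idea concrete, here is a minimal sketch in Python. The event names (MoneyWasDeposited, MoneyWasWithdrawn), the payload fields, and the helper functions are hypothetical and only illustrate the principle; amounts are assumed to be integers in minor units (cents).

```python
def apply_event(balances, event):
    """Return a new balances dict with a single event applied."""
    updated = dict(balances)
    payload = event["payload"]
    account = payload["account"]
    if event["event"] == "MoneyWasDeposited":
        updated[account] = updated.get(account, 0) + payload["amount"]
    elif event["event"] == "MoneyWasWithdrawn":
        updated[account] = updated.get(account, 0) - payload["amount"]
    # Unknown event types are ignored in this sketch.
    return updated


def replay(events):
    """Rebuild the current state from scratch by folding over the event log."""
    balances = {}
    for event in events:
        balances = apply_event(balances, event)
    return balances
```

Calling replay() on the same log always yields the same balances, which is why derived tables can be dropped and rebuilt at any time.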

# An example of a normal event payload.
{
    "uuid": "f26164d0-2654-4434-9d88-56a9002e2d83",
    "event": "CustomerWasRegistered",
    "payload": {
        "first_name": "John",
        "last_name": "Doe",
        "email": "john.doe@example.com"
    }
}

Here comes the challenge: if the company is based in the EU (or processes the personal data of people in the EU), it must comply with the GDPR. Put simply, your customers can ask you to remove their personal information from your systems, and you must do so (otherwise, the company might face huge fines). This is a problem for event sourcing: personal data is saved in the immutable event store, and you cannot simply remove it.

Two possible approaches to GDPR compliance in event-sourced systems are Crypto Shredding and Forgettable Payloads. And even with a solution in place, what happens to the events that have already been emitted and stored in the event store? How do we handle those?

Let's dive in...


Crypto Shredding

Crypto shredding means that instead of saving raw customer data in the event store, you encrypt personal details with a per-customer key and keep that key somewhere else (file storage, another database, etc.), away from the event store. When asked to remove a customer's data, you delete the encryption key, which makes decrypting the personal details impossible.

# An example of an AES-encrypted payload where we also store the identifier of the key in the external storage (from where we retrieve the key necessary for decryption).
{
    "uuid": "f26164d0-2654-4434-9d88-56a9002e2d83",
    "event": "CustomerWasRegistered",
    "payload": {
        "first_name": "+u89xDc88myBiLqBeTjUkg==",
        "last_name": "9686w83PScQ34tYmkvtb3w==",
        "email": "IcuoeMoZyt/RY16iyqHB4ltWP2kdyuf92kuNFx6ZJGk=",
        "encryption_key_reference_uuid": "c90d1682-be34-4800-a2ba-08899f3f4180"
    }
}
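The key handling could be sketched roughly like this in Python. Everything here is an assumption for illustration: the KeyStore class and the helper functions are hypothetical, and the cipher is a toy stand-in (an HMAC-SHA256 keystream) chosen only so the sketch runs on the standard library. A real system would use an authenticated AES mode from a vetted cryptography library instead.

```python
import base64
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical per-customer key storage, kept apart from the event store."""

    def __init__(self):
        self._keys = {}

    def create_key(self, key_ref):
        self._keys[key_ref] = secrets.token_bytes(32)

    def get_key(self, key_ref):
        return self._keys.get(key_ref)  # None once the key has been shredded

    def shred(self, key_ref):
        self._keys.pop(key_ref, None)


def _keystream(key, nonce, length):
    # Toy stand-in for AES: HMAC-SHA256 in counter mode. Not for production use.
    stream = b""
    counter = 0
    while len(stream) < length:
        block = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return stream[:length]


def encrypt_field(key, plaintext):
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return base64.b64encode(nonce + cipher).decode("ascii")


def decrypt_field(key, token):
    raw = base64.b64decode(token)
    nonce, cipher = raw[:16], raw[16:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return data.decode("utf-8")


def read_personal_field(key_store, payload, field):
    """Return the decrypted field, or None if the customer's key was shredded."""
    key = key_store.get_key(payload["encryption_key_reference_uuid"])
    if key is None:
        return None
    return decrypt_field(key, payload[field])
```

After key_store.shred(...) is called for a customer, read_personal_field() returns None for every event of that customer, while the event itself stays in the log untouched.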

As much as I like the idea of encrypting personal information, the massive downside of this approach is that the encrypted data stays in the event store. It might be fine in most cases; however, from a legal point of view, encrypted personal information is generally still treated as personal information under the GDPR, whether or not someone has the key to decrypt it. If the event store is breached, the encrypted personal information leaks with it. Because of this, I do not consider the Crypto Shredding technique compliant with the GDPR, and we must seek another solution.

By the way, GDPR is problematic for blockchain technology as well. One may argue against this, but blockchain combines several techniques at its core: P2P networking, immutable database, and cryptographic signing and verification of the ledger entries. If you are to store personal information in an immutable database, you are facing the same problem as in the case of event sourcing.


Forgettable Payloads

Like Crypto Shredding, the Forgettable Payloads technique relies on a separate storage, away from the event store. The difference is that the personal information itself lives there, and the event store only keeps reference IDs to that external storage so the personal information can be retrieved when needed. When you receive a request to remove a customer's personal information, you delete the personal data from the external storage, leaving the event store untouched.

# An example of an event payload where we only store a reference ID to the customer's personal details.
{
    "uuid": "f26164d0-2654-4434-9d88-56a9002e2d83",
    "event": "CustomerWasRegistered",
    "payload": {
        "customer_details_reference_uuid": "db4f7c6a-703c-4f94-8118-4610c8368e7b"
    }
}
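A rough sketch of this flow in Python follows. The PersonalDataStore class, the register_customer() and customer_details() helpers, and the use of a plain list as the event store are all hypothetical; they only show where the personal data lives and what happens when it is forgotten.

```python
import uuid


class PersonalDataStore:
    """Hypothetical mutable storage for personal details, separate from the event store."""

    def __init__(self):
        self._records = {}

    def put(self, details):
        reference = str(uuid.uuid4())
        self._records[reference] = dict(details)
        return reference

    def get(self, reference):
        return self._records.get(reference)  # None once the data is forgotten

    def forget(self, reference):
        self._records.pop(reference, None)


def register_customer(event_store, personal_data, details):
    """Store the details externally and append an event holding only the reference."""
    reference = personal_data.put(details)
    event = {
        "uuid": str(uuid.uuid4()),
        "event": "CustomerWasRegistered",
        "payload": {"customer_details_reference_uuid": reference},
    }
    event_store.append(event)
    return event


def customer_details(personal_data, event):
    """Resolve the reference from the event payload; returns None if forgotten."""
    reference = event["payload"]["customer_details_reference_uuid"]
    return personal_data.get(reference)
```

Code that reads events has to cope with customer_details() returning None, because a forgotten customer still has events in the log.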

Since we no longer store raw or encrypted personal information in the event store, this solution complies with the GDPR rules. The tradeoff is that our system now relies on an external database, which violates one of the core principles of event sourcing: the event store should be the single source of truth. You cannot achieve GDPR compliance with event sourcing alone; another data store is needed alongside the event-sourced system, which increases the complexity of your system.


Making Existing Event-Sourced System GDPR Compliant

Forgettable Payloads are straightforward to apply only to new systems that are built with event sourcing from the start. But what can we do with existing event-sourced systems that already contain personal information in their event payloads?

The solution in this scenario is to migrate events from the existing event store to a new one and modify the payloads during the migration. The typical process looks like this: connect to the event store → deserialize the event → modify the payload → serialize the event → save it into the new event store.
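The migration steps above could be sketched in Python as follows. This assumes events are serialized as JSON strings shaped like the examples in this article, that both event stores behave like plain lists, and that the external personal data storage behaves like a dict; the PERSONAL_FIELDS tuple and both function names are hypothetical.

```python
import json
import uuid

# Assumption: the payload fields that count as personal data in the old events.
PERSONAL_FIELDS = ("first_name", "last_name", "email")


def migrate_event(serialized_event, personal_data):
    """Deserialize one event, move its personal fields out, and serialize it again."""
    event = json.loads(serialized_event)
    payload = event["payload"]
    details = {name: payload.pop(name) for name in PERSONAL_FIELDS if name in payload}
    if details:
        reference = str(uuid.uuid4())
        personal_data[reference] = details
        payload["customer_details_reference_uuid"] = reference
    return json.dumps(event)


def migrate(old_store, new_store, personal_data):
    """Copy every event, in order, from the old store into the new one."""
    for serialized_event in old_store:
        new_store.append(migrate_event(serialized_event, personal_data))
```

Events without personal fields pass through unchanged, so the order and identity of events are preserved. Note that the old event store still contains the personal data after such a run, so it presumably has to be retired once the new store is verified.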


Conclusion

If you come from a traditional software development mindset, event sourcing is a significant paradigm shift. It is not a hassle-free solution and has its tradeoffs, but the payoff could dramatically improve developer productivity in the long run.

GDPR, on the other hand, is something we must live with, and despite the complexity it adds to an event-sourced system, it can still be managed.


About Otar Chekurishvili

Internet Citizen. Software & Wine Craftsman. Digital Entrepreneur. https://otar.me