How to Write Atomic Repositories in Go | by Angus Morrison | May, 2022

Perform atomic operations without leaking transactions into your business logic


Writing a repository that performs database operations inside a transaction without leaking its implementation details into my business logic has been a recent obsession of mine.

In Go projects that follow Hexagonal or Clean Architecture, you’ll see the separation of concerns along package boundaries. Code that powers the business logic is decoupled from code responsible for data storage through the use of interfaces.

Imagine an application that enrolls students in courses. The business logic might represent the association of courses and students as classes, and validate certain constraints on these classes, for example by ensuring they’re not oversubscribed.

To do this, it needs access to the course and student data, but it shouldn’t care whether that data source is an SQL database, in-memory cache, or an intern typing in data at a terminal. The business logic cares only that any data source it’s given responds to the methods it needs to enroll students.

Usually, this takes the form of a Repository interface. I’ve omitted the definitions of struct types such as Class, Course and Student for brevity.

Which is satisfied by concrete types in a repository package:

The ClassRepository concrete type knows how to interact with an SQL database. It implements the class.Repository interface so that the business logic can use it without knowing that there’s a relational database under the hood.
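A sketch of what an SQL-backed ClassRepository might look like, using the same assumed types and methods. The courses, students and enrollments tables, their columns, and the Postgres-style `$1` placeholders are hypothetical. main only checks that the type satisfies Repository, because opening a real `*sql.DB` needs a registered driver.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

type Student struct {
	ID   int
	Name string
}

type Course struct{ ID, Capacity int }

type Class struct {
	Course   Course
	Students []Student
}

type Repository interface {
	GetClass(ctx context.Context, courseID int) (Class, error)
	Enroll(ctx context.Context, courseID int, students []Student) error
}

// ClassRepository talks to SQL. Table and column names are hypothetical.
type ClassRepository struct {
	db *sql.DB
}

func NewClassRepository(db *sql.DB) *ClassRepository {
	return &ClassRepository{db: db}
}

func (r *ClassRepository) GetClass(ctx context.Context, courseID int) (Class, error) {
	var class Class
	err := r.db.QueryRowContext(ctx,
		`SELECT id, capacity FROM courses WHERE id = $1`, courseID,
	).Scan(&class.Course.ID, &class.Course.Capacity)
	if err != nil {
		return Class{}, fmt.Errorf("get course %d: %w", courseID, err)
	}
	rows, err := r.db.QueryContext(ctx,
		`SELECT s.id, s.name FROM students s
		 JOIN enrollments e ON e.student_id = s.id
		 WHERE e.course_id = $1`, courseID)
	if err != nil {
		return Class{}, fmt.Errorf("get students for course %d: %w", courseID, err)
	}
	defer rows.Close()
	for rows.Next() {
		var s Student
		if err := rows.Scan(&s.ID, &s.Name); err != nil {
			return Class{}, err
		}
		class.Students = append(class.Students, s)
	}
	return class, rows.Err()
}

func (r *ClassRepository) Enroll(ctx context.Context, courseID int, students []Student) error {
	for _, s := range students {
		if _, err := r.db.ExecContext(ctx,
			`INSERT INTO enrollments (course_id, student_id) VALUES ($1, $2)`,
			courseID, s.ID); err != nil {
			return fmt.Errorf("enroll student %d: %w", s.ID, err)
		}
	}
	return nil
}

// Compile-time check that ClassRepository satisfies Repository.
var _ Repository = (*ClassRepository)(nil)

func main() {
	var repo Repository = NewClassRepository(nil)
	fmt.Printf("%T\n", repo) // *main.ClassRepository
}
```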

At application start-up, we can instantiate a new class Service to handle our business needs, passing it any concrete type that satisfies the Repository interface.

This is a clean way to handle simple interactions with a data store.

This pattern breaks down when you need to perform a sequence of actions atomically. That is, when you need multiple repository operations to happen inside a single transaction.

Let’s say you want to validate that a course has capacity for incoming students before enrolling them. Using this pattern, you would retrieve the course and student data from the repository in the form of a Class object, check its capacity against the number of students in the enrollment request plus the number of students already enrolled, and return an error if there aren’t enough places in the class.

But what if a concurrent enrollment request completes after you’ve retrieved the class data from the store, but before you enroll the new students? If this second request takes the class to capacity, and our first goroutine enrolls more students, our course ends up silently oversubscribed. Our initial enrollment request has no way of knowing that the number of students enrolled in the course has changed, and no way of preventing this from happening while it performs its own enrollment.

We need to perform these steps atomically. Getting the class data, validating that it has capacity, and persisting the enrollments must happen inside a single transaction.

That’s a problem. Only the repository layer knows how to manage transactions, and we don’t want to move our business logic into the repository. We could define the repository interface so that its methods return and accept transactions, but this would leak the implementation details of the data store into the business logic.

On the other hand, the requirement that these operations happen atomically is a business requirement. The database doesn’t care if a course is oversubscribed. It’s a dumb store of bytes, and even if it weren’t, we don’t want to end up in a situation where we have to write new enrollment validation for every repository we might implement in the future. That defeats the point of our Repository interface.

How do we keep our validations in the business logic while ensuring that all enrollments happen atomically?

How about defining validations in the business layer, and injecting them into the repository with a single method call, which then runs all the enrollment operations atomically?

This looks neat and tidy, but we’re leaking the business domain into the repository layer by expecting the repository to know when to call the validations.

As a result, there are some confusing ambiguities in the validation code. For example, how do we know whether len(class.Students) refers to the number of students already in the class or to the number of students after the enrolling students have been added?

Onwards.

We’ve established that transactionality is a business concern, so let’s bring transactions into our business layer in a way that doesn’t expose the implementation details of the transactions. The way to do this is with an interface.

Now we’re getting somewhere. It still feels odd that our business logic has to manipulate a Tx object, but we’ve decoupled the use of the transaction from its implementation, so we no longer care if it’s a SQL database transaction or even something that’s just pretending to be a transaction. We can now create mock implementations of Tx for unit testing. This is a great improvement.

The story on the repository side isn’t quite so happy:

As a result of requiring our repository methods to accept the interface type class.Tx, we must perform a type assertion on the interface to get the concrete transaction type back again before doing any work on the database tables.

This isn’t unreasonable. The type assertion will only fail if we do something monumentally stupid, like mixing up the transactions of two separate repositories with different underlying data stores. It is ugly, though. By making our business logic handle transaction objects, we create a lot of boilerplate. We can do better.

Telling a repository that it should run its operations atomically is a business concern, but we don’t want the awkward type assertions that come with managing a transaction from the business layer.

Let’s define a repository that can manage its own transactions, starting with its interface in the business layer:

AtomicRepository embeds a Transactor, which is some object that can begin, commit and roll back transactions. Notice that no transaction is returned or accepted by these methods. The management of the transaction itself is an implementation detail. This allows the business logic to call AtomicRepository.Begin followed by any other repo methods it likes, safe in the knowledge that the repository’s operations are now atomic. When done, the business logic calls Commit.

The business logic no longer has to wrangle a transaction object, and the repository no longer has to perform type assertions because it’s not accepting transaction interfaces at call sites. Instead, it retains the transaction internally. Let’s look at how that’s implemented:

We define the unexported type transactor, which has both database and transaction fields and implements all the methods required by class.Transactor. When we call (*transactor).Begin, it creates a new transaction from its database and stores it on the struct.

AtomicClassRepository embeds a *transactor, so when repository methods are called, it first checks the embedded transaction field for an active transaction and, if one exists, performs the desired database operations using that transaction.

Thus, our business logic has control over when to start and end a transaction, but doesn’t have to manage either concrete or interface transaction types, which is the responsibility of the repository layer.

Looks good? Well, no. There’s a catastrophic issue.

As you know, our class.Service struct, which defines our business methods like Enroll, has a repo field of type AtomicRepository that provides access to all the methods our Service needs to interact with a data store.

class.Service will be passed around our program as a pointer, *class.Service. A single service is shared by all goroutines handling enrollment requests.

Which means that more than one simultaneous enrollment request will result in concurrent calls to the same repository.

Which contains a single, unprotected transaction.

When this happens — and it will happen — all bets are off. You might get an amalgamation of enrollment requests written to the database in a single transaction. You might get TransactionInProgress errors in goroutines that haven’t called Begin yet. You will certainly trash your database.

Let’s take stock. We don’t want our business logic to leak into the repository. We don’t want the repository’s implementation details leaking into the business logic. We must run business operations atomically, but we don’t want to handle a transaction object at the business layer, and we don’t want to perform type assertions on transaction interfaces at the repository layer. More than any of this, we want our reads and writes to the repository to be thread-safe.

We’re 90% of the way there. There’s one more trick to reveal.

Traditionally, repositories are long-lived objects that get created at program start and cleaned up when the application exits.

In this paradigm, a repository can’t hold an unprotected reference to a transaction, because concurrent access from the program’s many request handlers will clobber it. Locking the transaction reference with a mutex is futile, because it would effectively reduce our pool of database connections to 1, and our program would chug as requests queue up waiting to acquire the lock.

But if we start thinking about a repository as a mayfly that briefly lives, fulfils its purpose, then dies, these problems go away.

Instead of one long-lived repository, we can instantiate a new repository for every enrollment request. This repository manages a single transaction in a single goroutine. When we’re done with the transaction, we throw away the whole repository.

From the business perspective:

Instead of holding an AtomicRepository on the Service, we now store an AtomicRepositoryFactory. When called, AtomicRepositoryFactory returns a new AtomicRepository that is scoped to the current function. The transaction that we begin on this repository instance is protected from all outside influence, and other goroutines are free to instantiate their own repositories, making use of the full pool of database connections.

Let’s implement AtomicRepository and its factory func in the repository layer:

NewAtomicClassRepositoryFactory returns the AtomicRepositoryFactory that our class service will use to instantiate a new repository for each request. The repository itself is featherweight. Creating a repo requires the creation of two new pointers, so unless you’re working with a highly performance-sensitive application, instantiating short-lived repositories won’t be a bottleneck.

Notice that the AtomicRepositoryFactory we return from NewAtomicClassRepositoryFactory is a closure around a pointer to the underlying database, which is a long-lived, thread-safe object. The closure allows us to pass a single database reference at application start and then forget about the database for the rest of the program’s runtime. There’s no need to pass a new reference to the database every time we create a new repository using the factory.

Hooking this all up in main.go is simple:

Using a factory function to create single-use repositories gives us even more power to abstract transactions away from the business layer.

For example, you might decide that every new repository should start life with an active transaction, eliminating the need for business code to call Begin on the repo.

I opted against this approach, because it’s easy to imagine this behavior surprising other developers. Starting a transaction also takes a database connection from the pool. An unsuspecting colleague might instantiate a repo and save it for later, unaware that they’re leaking a connection.

Beware of making transaction management too implicit. You shouldn’t need to understand a repository’s internals to use it safely.

There you have it. A clean, thread-safe way to handle atomic database operations while respecting the separation of concerns between business logic and the persistence layer.

The takeaways are:

  • Move away from long-lived repositories to single-use, “mayfly” repositories that are scoped to one function running in a single goroutine.
  • Have each repository instance manage its own transaction, but allow the business layer to control when these transactions begin and end.
  • Instantiate repositories per request using factory functions that close around a shared database.
