Locking in Databases and Isolation Mechanisms

A peek into the world of database locks, their types, and the issues associated with them


Concurrency control is the term you should be aware of before getting into locks.

In a large application accessed by thousands of users, concurrency is inevitable. Your application needs to be able to handle multiple requests simultaneously.

When you execute operations concurrently, the results can conflict. For example, if you are reading a row while someone else is writing to it simultaneously, you are bound to get inconsistent data.

If we executed these transactions sequentially, we wouldn't need concurrency control. But sequential execution limits how far a system can scale, so we cannot avoid concurrent transactions.

When the operations are executed concurrently, how do we make sure the results are consistent?

There are various techniques for concurrency control. And one of them is using locks 🔒

So what are locks exactly? All you have to know is that locks are one mechanism to ensure data integrity. How these locks are implemented is beyond the scope of this issue.

So how do they work? Locks work by letting only one transaction modify or access a certain row or table. This makes it look like the transactions are executed one after the other, and hence maintains data integrity.
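As a minimal sketch of that behaviour, assuming PostgreSQL and a hypothetical accounts table, here is how a row lock serializes two concurrent writes:

    -- Session A
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- takes a row lock on id = 1

    -- Session B, concurrently
    BEGIN;
    UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- blocks, waiting on session A's lock

    -- Session A
    COMMIT;  -- lock released; session B's UPDATE now proceeds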

Different types of locks

There are two major types of locks: exclusive locks and shared locks.

Shared locks let you read the row or the table that is being locked. Hence it’s also called a read lock. Multiple transactions (you can think of each transaction as a separate process) can acquire a shared lock on the same resource and read from it. No transaction is allowed to update the resource while it has a shared lock.

Exclusive locks lock the row or table entirely and let the transaction update it in isolation. Unlike shared locks, only one transaction can acquire an exclusive lock on a certain resource at any point in time. While it holds the lock on that resource, other processes that want to acquire a lock on the same resource have to wait. Once the lock is released, the remaining processes can acquire it and make modifications.

You should note that multiple shared locks can be acquired on a resource at one time. But if the resource already has a shared lock then another process cannot acquire an exclusive lock on it.

Similarly, a process cannot acquire a shared lock on a resource that is locked by an exclusive lock.
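In PostgreSQL, for example, you can take these locks explicitly with FOR SHARE and FOR UPDATE; this sketch (reusing the hypothetical accounts table) shows the compatibility rules just described:

    -- Session A: shared (read) lock on the row
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;

    -- Session B: a second shared lock on the same row succeeds immediately
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;

    -- Session C: an exclusive lock has to wait until both shared locks are released
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- blocks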

There are other types of locks that are modifications of the above, for example write locks that still let you read a resource while it's being updated.

Optimistic and pessimistic locking

This is an interesting concept. Remember how I mentioned above that parallel processes doing database transactions have to execute them sequentially in order to maintain data integrity?

But that can slow things down quite a lot, especially when the transactions involve multiple steps or updates.

So there are two ways of going ahead with it.

One approach is to let the processes update the same records in parallel. When one process successfully commits its changes, the other process is told that there is a conflict, and it has to attempt its transaction again.

Another approach is to lock the row as soon as one process attempts to modify it (or delete it) and ask the other processes to wait before doing anything.

The first approach is lovingly named optimistic locking. It's letting everyone have a chance at updating the records.
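A common way to implement optimistic locking is with a version column; here is a sketch, assuming a hypothetical tasks table that carries a version counter:

    -- Read the row, remembering its current version (say this returns version = 7)
    SELECT id, state, version FROM tasks WHERE id = 42;

    -- Attempt the update only if nobody has bumped the version in the meantime
    UPDATE tasks
    SET state = 'done', version = version + 1
    WHERE id = 42 AND version = 7;
    -- If this updates 0 rows, another process won the race:
    -- re-read the row and retry the transaction.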

The second approach is called pessimistic locking wherein the first one to attempt gets the chance to update the record.
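Pessimistic locking, by contrast, takes the lock up front. A sketch in PostgreSQL, using the same hypothetical tasks table:

    BEGIN;
    -- Lock the row before touching it; other writers now wait at this line
    SELECT id, state FROM tasks WHERE id = 42 FOR UPDATE;

    -- Do the multi-step work knowing nobody else can modify the row
    UPDATE tasks SET state = 'done' WHERE id = 42;
    COMMIT;  -- the lock is released here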

The difference these two methods induce can be felt in large transactions. Remember that a transaction can have a single step where it updates a row, or it can have multiple steps where it fetches a column from a row, queries another table, deletes a couple of rows, etc. In a transaction with multiple steps, holding a pessimistic lock may not be a good idea, since if the transaction locking the rows fails midway, the time it spent has gone to waste. In such scenarios, with optimistic locking, other processes might still be able to complete their transactions.

Issues with locks

Locks can cause tricky issues at times. What if the process that acquired the lock fails to release it, blocking other processes from accessing the entity? It often happens that transactions go idle. If you haven't set a timeout for transactions, they may acquire a lock and never release it.
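PostgreSQL, for instance, has settings to bound both problems; the values here are arbitrary examples:

    -- Give up if a lock cannot be acquired within 2 seconds
    SET lock_timeout = '2s';

    -- Terminate sessions that sit idle in the middle of an open transaction,
    -- so an abandoned transaction cannot hold its locks forever
    SET idle_in_transaction_session_timeout = '60s';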

Another issue is lock contention. There might be resources in your database that are accessed a lot, and a number of transactions may try to acquire locks on them. As subsequent transactions have to wait for the others to finish, executing these transactions gets delayed, and the hot resources and their locks become contended. It's like one of those single checkout counters in stores where everyone has to wait for their turn.
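In PostgreSQL you can spot this by looking for lock requests that haven't been granted yet; a minimal sketch:

    -- List sessions currently stuck waiting on a lock, and what they are waiting for
    SELECT pid, locktype, relation::regclass AS relation, mode
    FROM pg_locks
    WHERE NOT granted;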

Another issue that usually occurs with locks is a deadlock. In this scenario, one transaction acquires a lock on a resource (resource A) and tries to acquire the lock on another resource (resource B) before releasing the lock on the first one. At the same time, another transaction acquires the lock on resource B and is aiming to acquire the lock on resource A. Now neither process can complete without acquiring these locks.

But since resource B is locked by the second process, the first process cannot acquire a lock on it. Similarly, the first process has acquired a lock on resource A, so the second process cannot acquire the lock on it. The two processes keep waiting for each other's locks, leading to a deadlock.
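Here is that scenario as a sketch, again assuming PostgreSQL and the hypothetical accounts table, where the two rows play the roles of resources A and B:

    -- Session 1
    BEGIN;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks row 1 (resource A)

    -- Session 2
    BEGIN;
    UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- locks row 2 (resource B)

    -- Session 1
    UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- waits for session 2

    -- Session 2
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- waits for session 1: deadlock
    -- PostgreSQL detects the cycle and aborts one of the transactions with a deadlock error.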

Let’s look at another issue associated with locks.

Row-level locks are the most common and the safest form of locking. But since locks are maintained in the memory of the database, storing them carries a cost. If there are too many row-level locks and the database is running low on memory, some databases (SQL Server, for example) escalate them to a single table-level lock to free up memory. This can have implications you may not have anticipated!

Isolation levels

ACID-compliant databases need to make sure that each transaction is carried out in isolation. That means the results of a transaction are only visible after it commits. Other processes should not be aware of what's going on with the records while the transaction is in progress.

What happens when a transaction tries to read a row updated by another transaction?

It depends on what isolation level the database is operating at with respect to that particular transaction. Let's explore the problems that can occur.

  1. Dirty read — Let's take a situation where one transaction updates a row or a table but doesn't commit the changes. If the database lets another transaction read those changes before they are committed, it's called a dirty read. Why is that a problem? Say the first transaction rolls back its changes: the other transaction is now holding data that was never actually committed. This happens in concurrent systems where multiple transactions run in parallel, but it can be prevented by the database, and we will explore how later.
  2. Non-repeatable read — Another side effect of concurrent execution of transactions is that consecutive reads can retrieve different results if you allow another transaction to do updates in between. So if a transaction queries a row twice, but between the reads another transaction updates the same row, the two reads will give different results (see the sketch after this list).
  3. Phantom read — In a similar situation, if one transaction does two reads of the same query, but another transaction inserts or deletes rows so that the number of rows retrieved by the first transaction changes in its second read, it's called a phantom read. This is similar to a non-repeatable read. The only difference is that in a non-repeatable read the inconsistency is in the values of a row, while in a phantom read the number of rows retrieved by the queries differs.
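As a concrete timeline, here is a non-repeatable read sketched in SQL at the read committed level, with the hypothetical accounts table:

    -- Transaction 1
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1;  -- returns 100

    -- Transaction 2, concurrently
    BEGIN;
    UPDATE accounts SET balance = 200 WHERE id = 1;
    COMMIT;

    -- Transaction 1, same query again
    SELECT balance FROM accounts WHERE id = 1;  -- now returns 200: a non-repeatable read
    COMMIT;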

How do databases deal with this?

They implement levels of isolation to avoid such problems.

We discussed locks above. While locks are meant to isolate transactions completely, if they block other transactions from doing anything while a resource is locked, lock contention can become a problem.

The solution to this is having different levels of isolation. Let's discuss the most common ones, listed in increasing order of isolation.

  1. Read uncommitted — This level of isolation lets a transaction read data that has not been committed to the database by other transactions. There is no isolation happening here: if transaction 1 performs an update and, before it commits, transaction 2 tries to access the updated data, transaction 2 will see the new data. This doesn't solve any of the issues mentioned above.
  2. Read committed — This, as the name suggests, lets other transactions read only data that has been committed to the database. While this looks like an ideal level of isolation, it only solves the dirty read problem mentioned above. If a transaction is updating a row, another transaction reading it will not see the uncommitted change. But this can still cause non-repeatable and phantom reads, because another transaction can commit changes between two reads within the same transaction.
  3. Repeatable read — To prevent transactions from getting inconsistent data, we need a higher level of isolation, and that is offered by repeatable read. Here the resource is locked throughout the transaction, so if the transaction contains two select queries and another transaction tries to update the same rows in between, it will be blocked from doing so. This level helps against non-repeatable reads, though it is not immune to phantom reads. It is the default level of isolation in some databases, such as MySQL's InnoDB.
  4. Serializable — This is the highest level of isolation. In it, all concurrent transactions “appear” to be executed serially. Pay attention to how I said it appears to be: that's because the transactions are not truly executed serially or sequentially. This level protects against phantom reads as well. A sketch of how you pick a level follows this list.
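In most SQL databases you choose the level per transaction; a sketch for PostgreSQL:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    -- Both reads now see the same snapshot, even if another transaction
    -- commits an update to this row in between
    SELECT balance FROM accounts WHERE id = 1;
    SELECT balance FROM accounts WHERE id = 1;
    COMMIT;

    -- Or set the level at BEGIN time:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- ...
    COMMIT;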

Advisory locks in Postgres

This is an interesting feature of Postgres that I came across recently. One of the applications I was working on required a distributed locking strategy to control concurrency issues.

The problem was to have a common datastore containing tasks that multiple distributed workers need to execute. If one worker picks up a task, the others shouldn't be able to pick it up.

For application-level control over the locks, where a shared resource is accessed by multiple processes or distributed workers and you need to make sure that only one worker can access it at any point in time, the best option was advisory locks.

Advisory locks help an application attach its own meaning to a lock, i.e., if you want to acquire a lock while you perform certain tasks in your application, you can do that using advisory locks. With row- or table-level locks, the code can get ugly, with SQL statements mingled with application logic.

The idea is to acquire a lock for a custom ID you generate. For example, for every task, an ID is generated. In a distributed system, if one worker process has already acquired a lock on a task using its ID, the others won't be able to. Once the business logic is executed, the lock is released. Since each task has a different lock ID, processes working on different tasks can hold locks simultaneously.
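A sketch of that pattern with PostgreSQL's built-in advisory lock functions, using a hypothetical task ID of 42:

    -- A worker tries to claim task 42; returns true if the lock was acquired,
    -- false immediately (no blocking) if another worker already holds it
    SELECT pg_try_advisory_lock(42);

    -- ... run the task's business logic in the application ...

    -- Release the lock once the task is done
    SELECT pg_advisory_unlock(42);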
