Distributed locks using Redis

How to create reliable locks for a distributed architecture

Distributed locking with Redis at GoSquared

Locking is a very common concept in programming. Wikipedia’s definition describes it pretty accurately:

In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.

Simply put, a lock is a single point of reference for multiple threads to check whether or not they are allowed to access a resource. So if, for example, a thread wants to write data somewhere, it must first check if a write lock already exists. If the write lock exists it must wait until the lock is released before it can obtain its own lock and perform its write. This way, the lock prevents multiple threads from writing simultaneously which might otherwise lead to adverse effects like corruption.
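Within a single process this is exactly what a mutex gives you. A minimal sketch in Python (the names here are illustrative):

```python
import threading

lock = threading.Lock()   # the single point of reference for all threads
shared = []               # the resource we want to protect

def writer(value):
    # Each thread must acquire the lock before touching the resource;
    # "with" releases it automatically when the block exits.
    with lock:
        shared.append(value)

threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # → [0, 1, 2, 3]: all four writes landed intact
```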

Modern operating systems have built-in functionality to help you with concurrency control, such as flock. But what if you are running multiple threads of your program on more than one machine? What if you need to control access to a resource across a distributed system?

Use a central lock server

First, we need something accessible from all our threads to store our lock. The lock can only exist in one place, to ensure there is only one authoritative place that defines whether or not the lock is set.

Redis is an ideal candidate for this. As a lightweight in-memory database, it is fast, transactional and consistent, which are the key qualities we require for a locking system.

Designing the lock

The lock itself is easy. It’s simply a key in a Redis database. Setting and unsetting the lock and making it crash safe is where it gets trickier. Let’s list some of the potential pitfalls:

  1. Your program interacts with Redis over a network, which means there is latency between a command being issued by your program and it being run by the Redis database. During this time, Redis will be running other commands, and the state of the data in Redis could diverge from what your program expects. How does one thread of your program establish a lock only once without clashing with other threads?
  2. What if your program crashes immediately after it sets the lock, and never unsets it? The lock might stay in place indefinitely and you end up with a deadlock.

Setting the lock

You might be tempted to say, “ah, this is all very straightforward. We can check for the lock using GET, and if it doesn’t exist, set it using SET!”.

That would be a simple method, but it does not guarantee an exclusive lock. Recall pitfall #1. Because there is latency between our GET and SET commands, we have no way of knowing if another thread managed to set the lock during the time it took our commands to reach and return from the Redis server. Sure, this is down to a matter of milliseconds and may have a fairly low chance of happening, but in a busy environment running lots of concurrent threads and commands, the likelihood of overlapping is not negligible.

To counter this, we should use SETNX (SET if Not eXists) instead. SETNX performs the existence check and the set as a single atomic command inside the Redis server, eliminating the need for a preceding GET round-trip. That means only one thread will be able to run a successful SETNX, while the others will fail and will need to retry until they establish the lock.
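To make the atomicity concrete, here is a sketch of SETNX’s semantics using an in-memory dict as a stand-in for Redis (a real system would issue SETNX over the network through a Redis client):

```python
class FakeRedis:
    """Stand-in implementing just enough of SETNX for illustration."""

    def __init__(self):
        self.store = {}

    def setnx(self, key, value):
        # SETNX sets the key only if it does not already exist.
        # Returns 1 if the key was set, 0 if it already existed.
        if key in self.store:
            return 0
        self.store[key] = value
        return 1

r = FakeRedis()
print(r.setnx("lock-key", "thread-a"))  # → 1: thread A establishes the lock
print(r.setnx("lock-key", "thread-b"))  # → 0: thread B must retry
```

Because the check and the set happen as one operation, there is no window in which a second thread can slip in between them.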

Unsetting the lock

Once your thread has run a successful SETNX command, it has established the lock and can do its work with the resource. After this work is completed, it should release the lock by deleting the Redis key, allowing other threads to establish the lock as soon as possible.

However, beware! Here lies pitfall #2. If there’s a crash in the thread, it never deletes the Redis key so the lock remains in place indefinitely and you get a deadlock. How do we prevent this?

Lock TTL

We can impose a Time To Live (TTL) on the lock key so that it is automatically deleted by Redis if the TTL expires. Any locks that are left established by faulty threads will then release after a suitable timeout, protecting against deadlocks. This is purely a safety feature, however, and it is still far more efficient to ensure your threads release the lock as they should.

Setting a TTL on a key in Redis can be done using the PEXPIRE command, and can be issued by your thread immediately after the SETNX in a MULTI/EXEC transaction, like this:


MULTI
SETNX lock-key lock-content
PEXPIRE lock-key 10000
EXEC

However, this introduces another problem. The PEXPIRE command is unaware of the result of the SETNX command, and sets the TTL of the key regardless. If we’ve got a deadlock in place, and multiple threads keep updating the TTL of the key at high frequency each time they want to establish a lock, then they will be perpetually extending the TTL of the key and it will never actually expire. To resolve this issue, we need Redis to handle this logic in a single Redis command. We can achieve this with Redis scripting.

Note – this is also possible without a script by using the additional PX and NX arguments for SET in Redis versions >= 2.6.12, but we’re using a script instead because it’s compatible back to 2.6.0.
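For completeness, the single-command form on Redis >= 2.6.12 looks like this (key name, content and TTL are illustrative):

```
SET lock-key lock-content PX 10000 NX
```

The NX option makes the set conditional on the key not existing, and PX sets the TTL in milliseconds, so the command replies OK if the lock was established and a null reply if another thread already holds it.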

Redis script

With Redis scripting, you can write Lua scripts that run multiple Redis commands inside the Redis server itself. Your script is cached in the Redis server and run by your program with a single EVALSHA command. The power here is that your program only has to run a single command (the script) to run multiple Redis commands in a transactional way that is immune to concurrency clashes, since only one script can run at a time.

Here’s a Lua script to set the lock with a TTL in Redis:


--
-- Set a lock
--
-- KEYS[1]   - key
-- ARGV[1]   - ttl in ms
-- ARGV[2]   - lock content
local key     = KEYS[1]
local ttl     = ARGV[1]
local content = ARGV[2]

local lockSet = redis.call('setnx', key, content)

if lockSet == 1 then
  redis.call('pexpire', key, ttl)
end

return lockSet

It’s pretty clear to see from this script that we solve the unending TTL issue by only running PEXPIRE on a lock that didn’t previously exist.
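Once the script has been cached in the server, running it is a single round-trip. The general shape of the flow, with placeholders rather than real values:

```
SCRIPT LOAD "...script source..."
EVALSHA <sha1> <numkeys> <key> [key ...] [arg ...]
```

SCRIPT LOAD returns the SHA1 digest of the script, which EVALSHA then uses to run the cached copy; numkeys tells Redis how many of the following arguments are key names, with the remainder passed to the script as additional arguments.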

Warlock: Battle-hardened distributed locking using Redis

Now that we’ve covered the theory of Redis-backed locking, here’s your reward for following along: an open source module! It’s called Warlock, it’s written in Node.js and it’s available on npm. It takes care of all of the plumbing involved in using Redis scripts as well as setting / unsetting the locks. We use it in our own projects at GoSquared to help us with distributed locking around caching, databases, job queues and other aspects sensitive to concurrency. Check out the README for usage instructions.


  • VaidikKapoor

    Nice post. I have done something similar in Python (https://github.com/vaidik/sherlock/) but you can configure it to use either Redis, Etcd, Memcached. Although it supports multiple backends so that you can tune it to your needs, Redis so far fits the best because of the guarantees you get using Lua scripts.

    Etcd is something that I am sure can offer more guarantees and better performance for acquiring locks. But my library has not been able to use it that way. That’s on the road map.

    BTW, have you read this post by Antirez on implementing better distributed locks: http://antirez.com/news/77

    • Thanks!

      I have not looked into Etcd much, looks interesting though.

      Salvatore’s post on locks is a must read, I highly recommend it to anyone interested in locks with Redis.

      • VaidikKapoor

        I wanted to ask if you guys have experienced any issues with the approach you have shared in your post. Are there any gotchas or races you run into from time to time?

        • It’s working very reliably. But there are naturally some limitations of using a single redis server for centralised locks. For one, it’s a SPOF so if that goes down your locks go down. Secondly, you might worry that at some point you’ll exceed the capacity of a single server. However, since Redis is capable of handling in the region of 30,000+ ops/second this ceiling is high enough for the vast majority of use cases.

          • VaidikKapoor

            True. And 30K ops/sec are good enough. For SPOF, you can use Redis in Master-Slave replication and have automatic switch over so that there is no SPOF.

          • EJ

            During failover, you will have a split brain situation where two nodes can own the same key, so it is not as simple as “master-slave” if robustness is needed.

          • VaidikKapoor

            Yes. You are right. In that case, actually Redis does not seem to make a very good candidate for distributed locks. It actually depends on your use-case. But since we are talking about distributed mutexes, consistency becomes essential.

  • Ted Bigham

    FWIW, you don’t need a LUA script or MULTI to set the lock. Just check the return code from SETNX. If it’s 1 call PEXPIRE. If it’s 0, you didn’t get the lock and need to fail/retry.

    Also, releasing the lock is not as simple as deleting the key. You need to check that you actually own it first, since it could have expired and gotten re-locked by someone else. This check+delete step may require a LUA script.

    • You’re right there, Ted!

      Our locking module (Warlock) now uses SET with the extra options (NX and EX) to make it atomic without needing a multi or a LUA script.

      Releasing the lock uses a LUA script to check that this process still owns the lock.
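      A sketch of that check-and-release pattern as a short Lua script (the lock content set at acquisition acts as the ownership token):

      ```
      -- Release a lock only if we still own it
      -- KEYS[1] - key
      -- ARGV[1] - the lock content we set when acquiring
      if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
      end
      return 0
      ```

      If the lock has expired and been re-acquired by another thread, the GET returns that thread’s content, the comparison fails, and the key is left alone.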

      • Flemming Jans

        What if the client crashes between SET and PEXPIRE? You will have the key locked forever I guess.
        I would go for the LUA script which is also a single round-trip instead of two.

        • SET is called with the PX option which also sets the TTL of the key in the same command.

          • Flemming Jans

Of course, thanks!
