Clojure Concurrency Basics

This is a basic introduction to the features that Clojure offers for concurrent programming. It is not meant to be an extensive reference, but aims to give you an idea of what options you have when you need to coordinate multiple threads and mutable state while programming in Clojure. I hope it offers some basic insight to those who are seeking it 🙂.

Concurrency Features of Clojure

Clojure comes with an advanced toolset for managing concurrency in your applications. At the same time, it tries to save you from those dreadful multithreading-related bugs and race conditions that are often awfully difficult to reproduce and debug.

Well, let’s have a look at what tools Clojure provides us with:

  • Atoms
  • Refs
  • Agents

…each of which has its own use case. But before going into details, I’ll try to create some awareness of the issues and dimensions involved when talking about concurrency.

Dissecting Mutable State

Mutable state may seem unremarkable, as almost every developer today starts learning programming by making use of it. However, if you want to understand the nature of concurrency and why handling it properly is so complex, it is helpful to take one step back and consider immutability the default instead of the other way round. So let’s start off this way.

First of all, concurrency turns out to be no problem when you only have immutable data. It’s safe to reference it from multiple threads, as immutable means the data will never change. So we don’t need any kind of mechanism to coordinate access to that data, since it does not matter in which order threads read it. And there is no write access to coordinate, as none exists.
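
To make this a bit more concrete, here is a minimal sketch in which several threads read the same immutable vector without any coordination. The names shared-data and sum-on-thread are made up for this illustration, and the use of plain Java threads plus a promise is just one of several ways to do it:

```clojure
;; An immutable vector that several threads will read at the same time.
(def shared-data [1 2 3 4 5])

(defn sum-on-thread
  "Hypothetical helper: sums coll on a new thread and returns a promise
  that will eventually hold the result."
  [coll]
  (let [result (promise)]
    (.start (Thread. (fn [] (deliver result (reduce + coll)))))
    result))

;; Start four readers, then wait for their results.
;; No locks are needed because nothing can change underneath them.
(def sums (mapv deref (mapv sum-on-thread (repeat 4 shared-data))))

;; "Changing" the vector yields a new value and leaves the original alone.
(def extended (conj shared-data 6))
```

Every reader arrives at the same sum, and shared-data still holds its original five elements after the conj.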

Now what happens when we throw mutability into the mix? That may sound like a small change, but in fact we are adding a whole new dimension of complexity, as suddenly time becomes a relevant factor. It is now important in which order threads access some kind of data structure as it may have a different value a moment from now than it had a moment ago.

But that’s not all of it. We also need to coordinate write access to data structures, meaning that when a thread is changing the state of a data structure, no other thread may access it (neither read nor write) for as long as the first thread is writing to it.

Traditionally, in “older” languages like C, C++, Java etc., you had to take care of concurrency yourself. Usually, you would use constructs like locks, semaphores, mutexes etc. to coordinate access to shared state. This not only puts a lot of overhead on developers’ shoulders, it is also very error-prone. Worse than that, the resulting bugs are often so subtle that they only occur once in a while and are thus really hard to track down and maybe even harder to debug, as debugging changes the timing of an application and may make the bug disappear (a so-called Heisenbug).

Clojure to the Rescue

Luckily, Clojure provides some handy tools to avoid getting lost in “concurrency hell”.

The first thing to mention is that in Clojure, data is immutable by default. That prevents you from accidentally running into the “concurrency trap”. Immutability is a really good thing as it makes the programmer (you!) aware of the situations that require mutability.

When such a situation crosses your path, the first reaction should be to think about whether you really need mutability there. Maybe you can solve the problem by converting it to a more functional approach that completely avoids mutable state? Clojure offers lots of functions that let you solve most issues without having to go the “mutable state way”.
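
As a small illustration of what such a conversion can look like, the following sketch counts words without any mutable counter. The function name count-words and the sample input are invented for this example; in real code you could probably just use the built-in frequencies function:

```clojure
(defn count-words
  "Counts how often each word occurs without mutating anything:
  every step of reduce returns a new map instead of updating one in place."
  [words]
  (reduce (fn [counts word]
            ;; fnil makes inc treat a missing entry as 0
            (update counts word (fnil inc 0)))
          {}
          words))

(def word-counts (count-words ["a" "b" "a" "c" "a"]))
```

Since count-words only depends on its argument, it can be called from any number of threads without further thought.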

However, there are situations that really require mutability. If you encounter such a situation, you may choose one of the options that I have mentioned in the beginning.

Option 1: Atoms

Atoms are the suitable choice when your data is referenced by multiple threads and needs to be mutable. When altering the value of an Atom, other threads will see this value change atomically, hence the name. That means no thread will ever see a value that has only been partially updated. It either sees the old or the new state, nothing in between.

Atoms are created via the atom function. To access the value that the Atom represents, you use either deref or the syntactic sugar @ :

user> (def x (atom 123))
#'user/x

user> x
#object[clojure.lang.Atom 0x806ce7 {:status :ready, :val 123}]

user> (deref x)
123

user> @x
123

There are two approaches for changing the value of an Atom. reset! will simply overwrite the current value with the given one:

user> (reset! x 315)
315

user> @x
315

swap! is the more functional approach and will call a given function with the current value as first argument. Whatever this function evaluates to will be stored as the new value of the Atom:

user> (swap! x #(inc %))
316

user> @x
316

Optionally, the given function can be called with several more arguments:

user> (swap! x #(+ %1 %2 %3) 10 20)
346

user> @x
346

Be aware that the function passed to swap! might be called more than once. swap! reads the current value, applies the function, and only stores the result if the Atom still holds the value it read. If another thread has changed the Atom in the meantime, the function is called again with the new value. This in turn means that the function passed to swap! should meet the following requirements:

  • it should not have any side effects – this means it should not do any I/O or alter any other state in your program.
  • it should not take forever to compute – the faster it is the better, as a slow function not only makes your current thread wait, it also makes it more likely that another thread changes the Atom in the meantime. And as we’ve learned, it might then be run multiple times for just one update.
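
If you want to see these retries for yourself, the following sketch may help. It deliberately breaks the first rule by counting invocations in a second Atom, purely for demonstration. All names (counter, calls, counting-inc, run-on-threads) are made up, and how many retries you observe depends on your machine and on timing:

```clojure
(def counter (atom 0))
(def calls (atom 0)) ; demo only: counts how often the update function runs

(defn counting-inc
  "Increments v and, as a deliberate side effect for this demo,
  records that it was invoked."
  [v]
  (swap! calls inc)
  (inc v))

(defn run-on-threads
  "Hypothetical helper: runs f on n fresh threads and waits for all of them."
  [n ^Runnable f]
  (let [threads (mapv (fn [_] (doto (Thread. f) (.start))) (range n))]
    (run! #(.join ^Thread %) threads)))

;; 8 threads, 1000 increments each.
(run-on-threads 8 #(dotimes [_ 1000] (swap! counter counting-inc)))

@counter ; always 8000, no update gets lost
@calls   ; at least 8000, anything above that was a retry
```

The interesting part is the gap between the two numbers: the value of counter is always exact, while calls shows how often the function had to be re-run to get there.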

Option 2: Refs

Refs tackle the problem of updating more than one value atomically. Think of transactions, just like in the database world: you create Refs, open a transaction, update your Refs and commit the transaction. Other threads will see either none of the Refs changed or all of them.

You create Refs with – well what a surprise – the ref function:

user> (def r1 (ref 1))
#'user/r1

user> (def r2 (ref 100))
#'user/r2

user> @r1
1

user> @r2
100

So far, so good. This doesn’t look any different from Atoms yet. However, updating the values is different. For that, you have to explicitly begin a transaction by calling dosync. Within the dosync block you can use alter to adjust the values represented by the Refs. The transaction gets committed implicitly when the dosync block is left:

user> (dosync
 (alter r1 #(inc %))
 (alter r2 #(inc %)))
101

user> @r1
2

user> @r2
101

As you can see, alter is similar to swap!, as it takes a function which is called with the current value of the specific Ref.

Let’s check whether a different thread really only sees changes atomically:

user> (defn watch
 []
 (doall
   (map #(%) (repeat 10 (fn []
     (Thread/sleep 500)
     (println (str "r1: " @r1 ", r2: " @r2)))))))
#'user/watch

user> (defn change
 []
 (dosync
   (Thread/sleep 1250)
   (alter r1 #(inc %))
   (Thread/sleep 2000)
   (alter r2 #(inc %))))
#'user/change

user> (pcalls watch change)
r1: 2, r2: 101
r1: 2, r2: 101
r1: 2, r2: 101
r1: 2, r2: 101
r1: 2, r2: 101
r1: 2, r2: 101
r1: 3, r2: 102
r1: 3, r2: 102
r1: 3, r2: 102
r1: 3, r2: 102
((nil nil nil nil nil nil nil nil nil nil) 102)

What are we doing here?

First, we define a function named watch, which ten times in a row sleeps for 500 milliseconds and then prints out the current values of Refs r1 and r2.

Second, we define a function named change, which opens a transaction via dosync, sleeps 1.25 seconds, then alters the value of r1, sleeps another 2 seconds and alters the value of r2.

Finally, we let these functions run in parallel, i.e. on different threads via pcalls.

As you can see in the output that follows our pcalls invocation, the “watcher”-thread never sees only r1 changed, although the “change”-thread alters that specific value 2 seconds before touching the value of r2. For the watcher-thread to see the changed values, it needs the dosync-transaction of the changer-thread to finish completely.
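
The classic use case for this is moving something from one place to another, where the sum must stay intact. Here is a minimal sketch under that assumption. The accounts, the transfer! function and the total function are hypothetical names for this example only:

```clojure
(def account-a (ref 100))
(def account-b (ref 0))

(defn transfer!
  "Moves amount from one Ref to another inside a single transaction."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(defn total
  "Reads both Refs inside a transaction to get a consistent snapshot."
  []
  (dosync (+ @account-a @account-b)))

;; Four threads, each moving 1 unit ten times from a to b.
(def workers
  (mapv (fn [_]
          (doto (Thread. #(dotimes [_ 10] (transfer! account-a account-b 1)))
            (.start)))
        (range 4)))

;; Meanwhile, keep looking at the total from the current thread.
(def observed-totals (vec (repeatedly 100 total)))

;; Wait for all transfers to finish.
(run! #(.join ^Thread %) workers)
```

No matter how the threads interleave, every observed total is 100, and in the end account-a holds 60 and account-b holds 40.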

Commute

As nice as this kind of transactional behaviour is, it may also lead to performance issues in situations that require a lot of transactional write accesses by many different threads. The problem is that when many threads try writing to a Ref concurrently, only the first wins and can actually make its changes. The other threads will have their transactions retried over and over again, until they happen to “be lucky” and get their changes through. As you can imagine, this can escalate quickly, resulting in many threads that spend most of their time retrying instead of making progress.

There is an alternative to alter called commute, which tells Clojure that it doesn’t matter in which order threads update a specific Ref (i.e. the update is commutative). This allows for more parallelism. With alter, a collision on a Ref causes the entire transaction to be retried, including all updates to all of its Refs. With commute, a collision on that Ref does not retry the transaction. Instead, the function is simply applied again to the Ref’s latest value when the transaction commits, which is fine because the end result does not depend on the order of the updates.

However, you have to decide for yourself whether it is okay to use commute or if you have to rely on alter. A common case where commute is fine is a counter, as it does not matter whether Thread A increments the counter before Thread B or the other way round. The end result will always be that the counter has been incremented by 2.

So, think carefully about whether you can use commute and use it whenever possible, but make sure you don’t break things.

Apart from that, the usage of commute is identical to that of alter:

user> (dosync
 (commute r1 #(inc %))
 (commute r2 #(inc %)))
103

user> @r1
4

user> @r2
103
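
To tie this back to the counter use case, here is a minimal sketch of a hit counter that several threads bump via commute. The names hits and record-hit! are invented for this example:

```clojure
(def hits (ref 0))

(defn record-hit!
  "Increments the counter. The order of increments is irrelevant,
  so commute is safe here."
  []
  (dosync (commute hits inc)))

;; Four threads, 500 hits each.
(let [threads (mapv (fn [_]
                      (doto (Thread. #(dotimes [_ 500] (record-hit!)))
                        (.start)))
                    (range 4))]
  (run! #(.join ^Thread %) threads))

@hits ; 2000
```

This works because inc gives the same end result in any order. For an update like “set the value to 5 if it is currently below 5”, I would stay with alter.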

Option 3: Agents

Finally, there are Agents. Agents are appropriate when you need to calculate a new value for a variable but don’t want to wait for the calculation to finish on your current thread. That means with Agents you can have this calculation done on a separate thread “in the background”, if you want to call it that way. However, if you deref the Agent before that calculation is done, you will still see the old value.

Let’s have an example:

user> (def ag (agent 100))
#'user/ag

user> @ag
100

user> (send-off ag (fn [v] (Thread/sleep 3000) (inc v)))
#object[clojure.lang.Agent 0x16fe925 {:status :ready, :val 100}]

user> @ag
100

user> @ag
101

As you can see, Agents are created with the agent function and updated with send-off. There is also send; the two differ in the thread pool they use: send runs actions on a fixed-size pool and is meant for CPU-bound work, while send-off uses a pool that grows as needed and is meant for actions that may block, like the sleep above. As you can also see, the first deref of ag still returned the old value, while the second one returned the new value because the action had finished its “calculation” by then.

It should also be mentioned that, unlike Atoms and Refs, Agents have no operator that sets a new value directly (such as reset! for Atoms). Instead, the new value is always whatever the function passed to send-off (the so-called action) evaluates to.

What does this have to do with concurrency? Well, each Agent has its own action queue. When several threads send an action to it, these actions will be queued and processed one after another. Very handy for serializing access to a resource, for example. Do I have to say more? I’m sure you can think of many ways how you can benefit from Agents.
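
Here is a minimal sketch of that queueing behaviour. To keep the result predictable, all actions are sent from a single thread, in which case they run in the order they were sent. The name event-log is made up for this example:

```clojure
(def event-log (agent []))

;; Queue five actions. Each one appends its number to the vector.
(doseq [n (range 5)]
  (send-off event-log conj n))

;; Block until all actions sent so far from this thread have run.
(await event-log)

@event-log ; [0 1 2 3 4]

;; In a standalone script, finish with (shutdown-agents)
;; so that the JVM can exit promptly.
```

await is what makes the final deref reliable here. Without it, you might catch the Agent somewhere in the middle of its queue, just like the first deref in the example further up.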

More details

There are still a lot of things to discover regarding Atoms, Refs and Agents. If you want to go into more detail, you might want to refer to the following sources:

 
