Basic Event Sourcing in Clojure
2017-07-17
Today I am going to talk about basic event sourcing without all the buzz and fuzz. You don’t need Kafka, opaque containers, external providers, and/or a fancy distributed, fault-tolerant cluster setup to do it.
In Clojure and ClojureScript, it’s very common to store app state in a single atom. What do you do when you want persistence? The default is to start integrating with some SQL database. This requires additional software, you have to set up schemas, and the data model is different from the native Clojure one.
What else can you do? For things that run on a single server, we can use files. Files are great and are generally underestimated. Hacker News runs on files, for example. One approach is to persist the entire atom to file whenever we change it. It might look something like this:
Basic atom db persistence
;; initial db
(def db (atom {"foo@bar.com" {:cookies 1}}))
;; update db
(swap! db #(assoc-in % ["foo@bar.com" :cookies] 5))
;; persist db to file
(spit "app.db" (prn-str @db))
We would probably want to do the swap! and spit in some form of transaction function.
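A minimal sketch of such a helper (transact! is just an illustrative name, and this ignores concurrent writers):
;; swap! the atom and persist the new value in one go
(defn transact! [db f & args]
  (let [new-state (apply swap! db f args)]
    (spit "app.db" (prn-str new-state))
    new-state))
;; usage, equivalent to the swap! above
(transact! db assoc-in ["foo@bar.com" :cookies] 5)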
We can verify that the db has been persisted correctly:
$ cat app.db
{"foo@bar.com" {:cookies 5}}
On startup we simply run something like:
;; load db into memory
(reset! db (read-string (slurp "app.db")))
One thing that is missing so far is atomicity, ensuring that the file doesn’t get corrupted. We can solve this by writing to a temporary file and then renaming it once we know the data is on disk. See Brandon Bloom’s post Slurp and Spit.
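A rough sketch of that idea (durable-spit is just an illustrative name; a production version would also fsync before renaming, as that post describes):
;; write to a temporary file, then rename it over the real one
(require '[clojure.java.io :as io])
(defn durable-spit [path data]
  (let [tmp (io/file (str path ".tmp"))]
    (spit tmp data)
    (.renameTo tmp (io/file path))))
;; usage
(durable-spit "app.db" (prn-str @db))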
Event sourcing
Event sourcing simply means that the source of truth consists of a series of events. We apply these events using some function that creates an aggregate view of what we are interested in. In Clojure code:
;; two events
(def e0 [:add-cookie {:email "foo@bar.com" :cookies 1}])
(def e1 [:add-cookie {:email "foo@bar.com" :cookies 5}])
;; state transition for adding cookies
(defn add-cookie [state {:keys [email cookies]}]
  (assoc-in state [email :cookies] cookies))
;; how we do state transitions, f: s0 => s1
(defn next-state [state [type data]]
  (cond
    (= type :add-cookie) (add-cookie state data)
    :else state))
;; update in-memory db as events happen
(swap! db #(next-state % e0))
;; also persist events to disk
(spit "events.db" (prn-str e0) :append true)
(spit "events.db" (prn-str e1) :append true)
;; on startup, load and aggregate events into in-memory db
(reset! db (reduce next-state {} [e0 e1]))
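In practice the startup reduce would read the persisted events back from events.db instead of using the in-memory vector; a small sketch (load-events is just an illustrative name):
;; read one prn-str'd event per line from the event log
(require '[clojure.java.io :as io])
(defn load-events [path]
  (with-open [rdr (io/reader path)]
    (mapv read-string (line-seq rdr))))
(reset! db (reduce next-state {} (load-events "events.db")))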
We can now easily see the history of a user’s cookie count using grep:
$ cat events.db | grep foo@bar.com
[:add-cookie {:email "foo@bar.com", :cookies 1}]
[:add-cookie {:email "foo@bar.com", :cookies 5}]
This information would’ve been lost if we just kept mutating the database. Another feature this enables is that we can change our schema easily. Let’s say we want to put all the user data under a :users key. All we have to do is change add-cookie:
(defn add-cookie [state {:keys [email cookies]}]
  (assoc-in state [:users email :cookies] cookies))
and re-build our aggregate state. We don’t risk losing any data because we never touch our real source of truth: the events. In fact, you can have multiple aggregate states for multiple purposes, if you so wish.
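For example, here is a sketch of a second aggregate built from the same two events, counting how many :add-cookie events each user has produced (count-events is just an illustrative name):
;; a second aggregate over the same event log
(defn count-events [counts [type {:keys [email]}]]
  (if (= type :add-cookie)
    (update counts email (fnil inc 0))
    counts))
(reduce count-events {} [e0 e1])
;;=> {"foo@bar.com" 2}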
Trade-offs
With this approach you have to be careful with how you define your events so you can always read them. This is easier if you define schemas and make sure your new state transition functions can still read old events, for example by falling back to default values; a sketch follows after these trade-offs.
Writing the state transition functions can get hairy and requires some discipline. Threading state through functions is your friend.
There are some edge cases when it comes to persisting events and updating the in-memory database at the same time; think them through so you don’t end up with an inconsistent state. Validate your data (a clojure.spec sketch follows after these trade-offs).
Not being in standard SQL makes it harder to use external tools.
Configure your own backups. Unlike with, say, RDS, nobody keeps them for you.
Doesn’t work for multiple servers. You can write your own event server and do it that way, but it gets hairy quickly. This is when you might want to look into something more elaborate.
Not great for a lot of data. If you can’t keep your data in memory you are going to run into issues. Hint: if you haven’t tried, your data most likely fits in RAM.
Slow startup time. If you have a lot of events, loading the data might be slow. This can be solved by snapshotting state, sketched below.
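As mentioned above, new transition functions can keep reading old events by falling back to default values. For instance, destructuring with :or lets a newer add-cookie handle old events that never had some field (the :flavor field here is purely hypothetical):
;; old events without :flavor still apply cleanly
(defn add-cookie [state {:keys [email cookies flavor] :or {flavor :plain}}]
  (-> state
      (assoc-in [:users email :cookies] cookies)
      (assoc-in [:users email :flavor] flavor)))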
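For validating event data, one option is clojure.spec (clojure.spec.alpha, available since Clojure 1.9); a sketch with illustrative spec names:
;; validate event data before applying and persisting it
(require '[clojure.spec.alpha :as s])
(s/def ::email string?)
(s/def ::cookies int?)
(s/def ::add-cookie-data (s/keys :req-un [::email ::cookies]))
(s/valid? ::add-cookie-data {:email "foo@bar.com" :cookies 5})
;;=> true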
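And a sketch of snapshotting (the names and snapshot format are just illustrative): persist the aggregate together with how many events it covers, then on startup replay only the events that came after.
;; save the aggregate plus the number of events it covers
(defn save-snapshot! [state n-events]
  (spit "snapshot.db" (prn-str {:state state :n-events n-events})))
;; restore from the snapshot and replay only newer events
(defn load-with-snapshot [events]
  (let [{:keys [state n-events]} (read-string (slurp "snapshot.db"))]
    (reduce next-state state (drop n-events events))))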
On the other hand, you get something quick to work with, you keep the entire history, you don’t need additional integrations, you can easily grep your data, and it’s very flexible.