Robert has posted 1 posts at DZone. View Full User Profile

Getting to Know Immutable Data Structures - Immutability and Concurrency – Part I

05.20.2008
| 4131 views |
  • submit to reddit

When asking the question how does functional programming help me with concurrent programming? The standard response tends to be functional programming use immutable data structures, read-only data structures can be shared between threads without issues, end of problem. Except it isn’t. Immutable data structures have a different set of problems associated with them when working on concurrent problems. This post will examine what these problems are, and then show that this is just a special case of a more general set of problems when working with immutable data structures. Finally will start taking a look at how we solve some of these problems, but in a single thread environment first of all.

First let’s frame the problem by looking at how imperative programs work with threads. In a classic imperative/OO languages programmers tend to use either instance or static member variables to send messages between threads, let’s look a fragment of C# that does something classically multi-threaded:

object workQueueLock = new object();

Queue<WorkItem> workQueue = new Queue<WorkItem>();



// this method runs on its own thread

private void Worker() {

while (true) {

// define a work item attempt to retrive work from the queue

WorkItem item = null;

lock (workQueueLock) {

if (workQueue.Count > 0) {

item = workQueue.Dequeue();

}

}



// check if we have work to do, otherwise sleep

if (item != null) {

// do some work

}

else {

Thread.Sleep(QueuePollInterval);

}

}

}



// an even that resets our flag

private void Some_Event(object sender, WorkEventArgs ea) {

lock (workQueueLock) {

workQueue.Enqueue(ea.WorkItem);

}

}

I wouldn’t recommend you use this naive version of a work queue, but the above code is straight forward enough to understand easily and illustrate the typical way imperative programs communicate between threads. We have a member variable “workQueue” that controls stores the work to be done, the method “Worker” is designed to read from this queue and if there’s some work to do, do the work, otherwise sleep till it’s time to poll the queue again. We use “workQueue” again in “Some_event” to send a message to “Worker”, to enqueue some work for it to do. It’s easy to see that mutation of the variable “workQueue” is essential to get this to work; if we couldn’t change the content of “workQueue” then we couldn’t send the message. It’s also easy to see that we now have a huge number of implementation choices: Do we lock on the queue, or have separate lock? What’s the shortest possible time we can hold the lock for (to avoid other threads be blocked when they want to write to the queue)? How long do we sleep for before polling the queue? Too shorter time and we risk wasting too much processor time polling the queue, too longer time and risk that the queue because unreactive because the worker wastes too much time sleeping when there’s work to be done.

In pure functional programming there are no variables or mutation, so the above scenario simply isn’t possible. Sure, F# isn’t a pure function language, actually most functional languages aren’t, so you can indeed use mutable data structures to implement something similar to the C# fragment we showed earlier, but that’s not the point we want to learn how to use immutable data structures. To fully understand the limitations of immutable data structures, let’s look at another C# example do something simpler. Imagine that we want compute a key of a value then store it in a member variable, a dictionary in this case, for later use:

Dictionary<string, string> myDict = new Dictionary<string, string>();



public void ReceiveValue(string val) {

myDict.Add(ComputerKey(val), val);

}

Now let’s think about how we can translate this into F#. Firstly, if don’t mind being dirty and mutable we can translate this fragment verbatim:

type Store() =

let myDict = new Dictionary<string, string>()

member x.ReceiveValue (value:string) =

myDict.Add(x.ComputeKey value, value)

However, if we don’t want to be mutable it’s not quite so straight forward. F# contains a type called “Map”, which is very similar to a Dictionary except that it is immutable. When you add a new item to a map you don’t change the map you create a new version of the map with the new key added. So here is how our store class would look to if we used an immutable “Map” data structure:

type ComputeKeys(myDict:Map<string, string>) =

member x.ReceiveValue (value:string) =

new ComputeKeys(myDict.Add(x.ComputeKey value, value))

The important thing to notice is that we now have no “let” definition where we store our dictionary; instead the dictionary is passed to the class constructor. So our constructor receives a “Map”, and when we use our “ReceiveValue” method we create a new instance of the “ComputerKeys” which contains the newly created value. I think the type signature really helps us understand what’s going on:

type ComputeKeys = class

end

with

member ReceiveValue : value:string -> ComputeKeys

new : myDict:Map<string,string> -> ComputeKeys

end

This is pretty much the revelation of immutable data structures, “let” definitions become merely short conveniences for values, not memory location that can be updated at a later data if we want to. These new values are all held on the threads stack, if were being pure and fully immutable that we have no memory locations that we can write them to. Okay let’s have a look at how we might use these two classes:

/// wraps a Dictionary<string, string> to provide

/// some hashing and printing functions

type Store() =

// the dictionary that stores the values

let myDict = new Dictionary<string, string>()

/// receive a value, hash it store it

member x.ReceiveValue (value:string) =

myDict.Add(x.ComputeKey value, value)

/// computers the hash (a bit naff for now)

member x.ComputeKey (value:string) =

value.GetHashCode().ToString()

/// prints the stored values

override x.ToString() =

let stringWriter = new StringWriter()

for key in myDict.Keys do

stringWriter.WriteLine("{0}: {1}", key, myDict.[key])

stringWriter.ToString()



let useStore() =

let store = new Store()

store.ReceiveValue("One")

store.ReceiveValue("Two")

store.ReceiveValue("Three")

printfn "%s" (store.ToString())

The mutable version needs little explanation, it is classical imperative programming, we create an instance of store then add values to our store, and finally we print them out. Now compare this with the immutable version:

/// wraps a Map<string, string> to provide

/// some hashing and printing functions

type ComputeKeys(myDict:Map<string, string>) =

/// receive a value, hash it, return the new value

member x.ReceiveValue (value:string) =

new ComputeKeys(myDict.Add(x.ComputeKey value, value))

/// computers the hash (a bit naff for now)

member x.ComputeKey (value:string) =

value.GetHashCode().ToString()

/// prints values in the map

override x.ToString() =

myDict.Fold

(fun key value acc ->

Printf.sprintf "%s \r\n%s: %s" acc key value)

""



let useComputeKeys() =

let keysEmpty = new ComputeKeys(Map.empty)

let keysOne = keysEmpty.ReceiveValue("One")

let keysTwo = keysOne.ReceiveValue("Two")

let keysThree = keysTwo.ReceiveValue("Three")

printfn "%s" (keysThree.ToString())

The thing to notice here is how similar using the immutable ComputeKeys class is to using the Store class. We create an instance of the class, we add values to it, and then finally we print it. The only difference being that we need to catch the value returned from RecieveValue and use this value in the next step. Here we’ve used different names for each instance – to illustrate that each let binding is to a different instances, but we don’t need to do that we can reuse the same name to save inventing new names:

let useComputeKeysAlt() =

let keys = new ComputeKeys(Map.empty)

let keys = keys.ReceiveValue("One")

let keys = keys.ReceiveValue("Two")

let keys = keys.ReceiveValue("Three")

printfn "%s" (keys.ToString())

The take away from this is that programming with immutable data structures when we have one thread of execution is not that different to programming with imperative mutable structures, we just have to remember that every time we want to make a change we copy and add rather than update.

Wrapping It Up

In this inductor post we’ve looked at why mutation is important to classical concurrent programming, and indeed classical imperative programming. Then we looked at immutable data structures and compared they way that they work to mutable data structures. In the next post we’ll dig deeper into immutable data structures, to really get a feel for the programming possibilities they offer. Then in the post after that we’ll look at concurrent programming with immutable data structures and finally get to grips the problem we posed ourselves in the first couple of paragraphs of this post. Patience is a virtue and good things come to those who wait J.
Published at DZone with permission of its author, Robert Pickering.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)