Wednesday 2 October 2013

Java ConcurrentHashMap demystified

This post is a continuation of my previous post on Java HashMap. You may want to go through that post if you are not sure of Java HashMap internals.

As mentioned in the previous post, Java HashMap is not thread safe. If you want a hash map in a multi-threaded environment, you might look to Hashtable or ConcurrentHashMap. Hashtable has a bottleneck: it uses a single map-wide lock, which greatly slows down the application. ConcurrentHashMap uses several tricks to achieve a high level of concurrency and avoid locking, including the use of multiple write locks, each guarding a subset of the hash buckets. It also exploits the flexibility of the Java Memory Model (JMM) to minimize the time a lock is held, or to avoid locking altogether.

Please note that the major barrier in Hashtable is its single map-wide lock, which is held for the entirety of every insertion, retrieval, removal or traversal. ConcurrentHashMap instead uses a collection of 32 locks, each of which guards a subset of the hash buckets. This means that at most 32 threads can modify the map at one time. However, 32 is only the theoretical concurrency limit and may not be achievable in practice: even with fewer than 32 writers, two threads whose keys hash to buckets guarded by the same lock will still contend, so a thread trying to modify the map can block while others are writing.
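
To make the idea of lock striping concrete, here is a minimal, self-contained sketch of a map that guards its buckets with a fixed pool of locks. The class and its names (StripedMap, N_LOCKS) are purely illustrative and are not the real ConcurrentHashMap code, which additionally handles resizing, lock-free reads and much more.

// Minimal lock-striping sketch (illustrative only, not the real ConcurrentHashMap code).
public class StripedMap<K, V> {

    private static final int N_LOCKS = 32;             // one lock per group of buckets
    private final Node<K, V>[] buckets;
    private final Object[] locks = new Object[N_LOCKS];

    private static class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    public StripedMap(int numBuckets) {
        buckets = (Node<K, V>[]) new Node[numBuckets];
        for (int i = 0; i < N_LOCKS; i++)
            locks[i] = new Object();
    }

    private int bucketFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public V get(K key) {
        int b = bucketFor(key);
        synchronized (locks[b % N_LOCKS]) {             // only 1/32 of the map is locked
            for (Node<K, V> n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key))
                    return n.value;
        }
        return null;
    }

    public void put(K key, V value) {
        int b = bucketFor(key);
        synchronized (locks[b % N_LOCKS]) {
            for (Node<K, V> n = buckets[b]; n != null; n = n.next) {
                if (n.key.equals(key)) {
                    n.value = value;                    // update existing entry
                    return;
                }
            }
            buckets[b] = new Node<K, V>(key, value, buckets[b]);   // insert at head
        }
    }
}

With this scheme, two threads writing to buckets that map to different locks never block each other; contention only happens when their buckets share a lock.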

Retrieval :

Unlike Hashtable, ConcurrentHashMap may not acquire a lock while retrieving a value from the map. A very important point to note here is that synchronization offers much more than atomicity: it also guarantees visibility and ordering. Let's see how. The Java Language Specification permits some memory operations not to be made visible to other threads immediately, because of the performance benefits of per-processor caches. Since ConcurrentHashMap may not acquire a lock during retrieval, it must be prepared to handle stale values. Instead of locking, the linked list used at each bucket location in ConcurrentHashMap is designed so that the implementation can detect when its view of the list has become inconsistent or stale. If it detects this, it acquires the appropriate lock on the bucket and searches again. This is one of the reasons why ConcurrentHashMap performs worse than a plain HashMap in a single-threaded environment. The approach optimizes the lookup in the common case, where most retrievals are successful and retrievals outnumber insertions and removals.
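
Roughly, the read path looks like the sketch below. This is a simplification modeled on the pre-Java 8 implementation: getFirst(hash) and readValueUnderLock(e) are assumed helper methods on the surrounding segment (the first reads the head of the bucket's chain, the second re-reads the value with the bucket lock held), and HashEntry is the structure sketched in the Removal section below.

V get(Object key, int hash) {
    for (HashEntry<K, V> e = getFirst(hash); e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            V v = e.value;                    // volatile read, no lock taken
            if (v != null)
                return v;                     // common case: a consistent value was seen
            return readValueUnderLock(e);     // stale or in-flight entry: retry under the bucket lock
        }
    }
    return null;                              // not found in the chain that was traversed
}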

Removal :

The removal operation on ConcurrentHashMap holds the bucket lock for the duration of its execution. To understand removal, we need to know the data structure of an Entry object in ConcurrentHashMap. So, here it is:
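
(This is a simplified sketch, modeled on the pre-Java 8 HashEntry class; field order and constructor details may differ slightly between JDK versions.)

static final class HashEntry<K, V> {
    final K key;
    final int hash;
    volatile V value;                  // the only mutable field
    final HashEntry<K, V> next;        // final: the chain cannot be spliced in place

    HashEntry(K key, int hash, HashEntry<K, V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}
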
Please note that all fields except value are final, whereas value is volatile. I assume that the reader is aware of what a volatile variable means in Java. Since the next field of an Entry object is final, we can't splice an element into or out of the middle of the linked list at a bucket location; we can only add at the head. So, how does removal work? When an Entry object is removed from the linked list, its value field is assigned null. Since the value field is volatile, the changed value is immediately visible to all threads. Once the value field has been set to null, the portion of the linked list from the head up to the removed element is cloned and joined to the remainder of the linked list following the removed element. So, if another thread is looking for the removed element in the stale chain, it will see the null value and retry with the lock held, which ensures that it sees an up-to-date chain. Eventually the discarded portion of the stale chain is garbage collected.
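
To make that concrete, here is a sketch of the removal logic built on the HashEntry sketch above. It assumes a surrounding segment that owns a table array of bucket heads and lock()/unlock() methods (as the JDK's Segment class does); the real code also maintains counts, supports conditional removal, and so on.

V remove(Object key, int hash) {
    lock();                                    // bucket lock held for the whole removal
    try {
        HashEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K, V> first = tab[index];

        HashEntry<K, V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            oldValue = e.value;
            e.value = null;                    // volatile write: readers on a stale chain see null
            // 'next' is final, so e cannot be unlinked in place. Instead, clone the
            // entries from the head up to (but not including) e and link the clones
            // to everything that follows e.
            HashEntry<K, V> newFirst = e.next;
            for (HashEntry<K, V> p = first; p != e; p = p.next)
                newFirst = new HashEntry<K, V>(p.key, p.hash, newFirst, p.value);
            tab[index] = newFirst;
        }
        return oldValue;
    } finally {
        unlock();
    }
}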

Insertion :

A new Entry object is always added to the head of the linked list at a bucket, and the same reasoning applies as in the Removal section: if another thread is traversing a stale chain looking for the newly inserted value, it won't find it there and will retry with synchronization.
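
Under the same assumptions as the removal sketch (a surrounding segment with a table array and lock()/unlock() methods), insertion of a new key looks roughly like this; replacing an existing key and resizing are left out:

V put(K key, int hash, V value) {
    lock();                                    // bucket lock held while writing
    try {
        HashEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K, V> first = tab[index];
        // The new entry is linked in at the head; the existing chain is never
        // modified, so unsynchronized readers see either the old head or the new one.
        tab[index] = new HashEntry<K, V>(key, hash, first, value);
        return null;
    } finally {
        unlock();
    }
}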

Update :

Like the removal operation, the update operation on ConcurrentHashMap also holds the bucket lock for the duration of its execution. This doesn't necessarily block readers, since the retrieve operation doesn't always need a lock. The update operation just looks for the appropriate Entry object in the chain and updates its value field. Since the value field is volatile, the changed value is immediately visible to all threads.
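
Here is a sketch of the update path under the same assumptions as the earlier sketches (in the JDK this is simply the branch of put() that finds an existing key; the separate update() method below is only for illustration). Only the volatile value field is written, with no restructuring of the chain:

V update(K key, int hash, V newValue) {
    lock();                                    // bucket lock held while writing
    try {
        HashEntry<K, V> e = table[hash & (table.length - 1)];
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;
        V oldValue = null;
        if (e != null) {
            oldValue = e.value;
            e.value = newValue;                // volatile write: immediately visible to readers
        }
        return oldValue;
    } finally {
        unlock();
    }
}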

3 comments:

  1. Since the Java ConcurrentHashMap implementation creates a new Entry object for each insertion, won't it create a lot of garbage?

    Replies
    1. Yes, it does create a lot of garbage. Pooling won't help in this case because the Entry objects are internal to the hash map and are not visible to clients. But there are ways to counter this; I can't provide an explanation here due to copyright issues with my employer.

    2. Yeah I do understand.

      By the way, this was a nice article.
