Sunday, 1 December 2013

Hyper threading and CPU core binding

When we touch the threshold to increase the performance through software, we go back to the hardware. With a fixed set of machines and cores, the only thing we can exploit is hyper-threading, and binding a specific process to a particular core and not allowing any other process to execute on that core. We often encounter this scenario in low latency trading systems domain where even micro seconds latency is not acceptable.

The video in the following link explains hyper threading in very simple terms.

Here is a nice article to get a sense of how to bind a process to a CPU core :


Saturday, 26 October 2013

Posix Threads

In this post I would demonstrate the usage of Posix threads through a very simple problem. I would also demonstrate how we can synchronize Posix threads using mutex. Though there are many synchronizing primitives available, I would stick to mutex for simplicity. Please note that the code presented here is only for demonstration purpose and is inefficient.

Before starting on the problem, let me give a brief introduction of Posix threads. Historically, hardware vendors developed their own proprietary implementations of threads. These implementations differed substantially from each other. So, it was very difficult to develop multi-threaded portable applications. So, a standard specification was required to standardize the thread interface and this interface is Posix thread.

Problem :

A string is to be processed by a given number of threads. The processing in this case would be printing of the string. Each thread should do specific amount of work before passing control to the next thread. The work in our case is the printing of characters. The control is to be passed to the next thread in a round robin manner till the string is processed completely.

Program should take 3 command line arguments.
<Program name> <Input String> <Amount of work> <Number of threads>
Here is the desired output for a sample run.
ankit@ankit:~/problems/p1$ ./driver ABCDEFGHILKLMN 2 3
Thread 1 : AB
Thread 2 : CD
Thread 3 : EF
Thread 1 : GH
Thread 2 : IL
Thread 3 : KL
Thread 1 : MN
ankit@ankit:~/problems/p1$

Note that each thread is getting control in a round robin manner. In the beginning, thread 1 printed "AB" since the amount of work specified is 2. Then thread 2 printed "CD", then thread 3 printed "EF". After this, control again came to thread 1.


This program should be complied using -lpthread option. eg: gcc driver.c -lpthread -o driver

This problem throws light on many basic aspects of Posix threads like thread creation, thread synchronization. Ideally the sleeping part in the program is not efficient. Instead we should wait the thread and signal when its turn come. But this is for demonstration purpose.

Sunday, 20 October 2013

Subset Problems

In this post, I would discuss some basic programming problems on sets.

Problem Statement:

Given a list of numbers, find all the possible subsets of the list of numbers. For example, if the set of numbers given is {1,2,3}, the output should be {},{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}.

Solution:

The first solution is recursive is pretty straight forward. Please follow the Java code given below.

The second solution is iterative and is more efficient than the first solution. This solution works based on the principle that for a set of n elements, there would 2^n subsets. The binary representation of an integer i in the range 0 to (2^n - 1) is used to find the corresponding ith subset. For example, if there are 3 elements in the set, we enumerate the binary representation of all the numbers from 0 to 7. Each binary representation represents a subset. The set bit at ith index represents that take ith element from the set. Here are those :
000 -> Empty set
001 -> Only take 1st element
010 -> Only take 2nd element
011 -> Take 1st and 2nd element
100 -> Only take 3rd element
101 -> Take 1st and 3rd element
110 -> Take 2nd and 3rd element
111 -> Take all elements



If we tweek the above problem statement a little. Here is the new problem statement:

Problem Statement:

Given a list of numbers from 0-9, find the greatest number that can be formed from these numbers such that the number is divisible by three

Solution:

We can use the recursive approach above. One more property is needed : If a number is divisible by 3, then the sum of its digits is also divisible by three. Since we have to find the greatest number, we can start with the greatest number in the recursive solution above. Also the list need to be reverse  sorted so that the smallest number is removed first. Here is the code segment :

Sunday, 6 October 2013

Finding Overlapping Rectangles in a given set of Axis aligned rectagles

Describe an algorithm that takes an unsorted array of axis-aligned rectangles and returns any pair of rectangles that overlaps, if there is such a pair. Axis-aligned means that all the rectangle sides are either parallel or perpendicular to the x- and y-axis. You can assume that each rectangle object has two variables in it: the x-y coordinates of the upper-left corner and the bottom-right corner.

Solution :

We can use Sweep Line algorithm for this. Create a sorted array of the x-coordinates of the left and the right edges of all the rectangles. The x-coordinates should be annotated with the rectangles they belong and whether they are the left or the right edge of the rectangle. Now, scan the sorted list from left to right, adding each rectangle to a balanced BST when you encounter the left edge of a rectangle and removing the rectangle from the balanced BST when you encounter the right edge of the rectangle. The solution requires the usage of one dimensional range search tree. A range search tree is a tree which is capable of performing range search queries efficiently. The time complexity of a range search query in a range search tree is O(R + logN), where N is the number of nodes in the tree and R is the number of elements in the result set.

First lets see how a search query works in a one dimensional range search tree:
1DRangeSearchTree(k1, k2, v):
    Input : Keys k1 and k2, and a node v of a binary search tree T
    Output : The elements whose keys are greater than or equal to k1 and less than or equal to k2.
    Algorithm :
        if key(v) >= k1 and key(v) <= k2
            ElementsInLeftSubTree = 1DRangeSearchTree(k1, k2, T.left(v))
            ElementsInRightSubTree = 1DRangeSearchTree(k1, k2, T.right(v))
            return ElementsInLeftSubTree U ElementsInRightSubTree  U {element(v)}
        else if key(v) < k1
            return 1DRangeSearchTree(k1, k2, T.right(v))
        else if key(v) > k2
            return 1DRangeSearchTree(k1, k2, T.left(v))

Now lets see the detailed algorithm of the original problem:
1) Sort all left and right rectange edges, according to their X value, into list L.
2) Create a new empty range search tree T, for Y ordering of rectangle tops/bottoms
3) Create a new empty result set RS of unique rectangle pairs
4) Traverse L in ascending order. For all V in L:
   If V.isRightEdge()
      T.remove(V.rectangle.top)
      T.remove(V.rectangle.bottom)
   else
       For all U in T.getRange(V.rectangle.top, V.rectangle.bottom)
         RS.add(<V.rectangle, U.rectangle>)
      T.add(V.rectangle.top)
      T.add(V.rectangle.bottom)
5) return RS

Time complexity is O(R + N log N) where R is the actual number of intersections.

Please note that this algorithm does not consider two rectangles as overlapping if one rectangle is completely within another rectangle. However this can be corrected with a minor tweek in the search algorithm for 1DRangeSearchTree.

Reference : http://stackoverflow.com/questions/3324880/axisaligned-rectangles-intersection

Wednesday, 2 October 2013

Java ConcurrentHashMap demystified

This post is in continuation of my previous post on Java HashMap. You may want to go through that post if you are not sure of Java HashMap internals.

As mentioned in the previous post, Java HashMap is not thread safe. If you want to use HashMap in a multi-threaded environment, you might look to HashTable or ConcurrentHashMap. HashTable has a bottleneck that it uses map wide lock which greatly slows down the application. ConcurrentHashMap uses several tricks to achieve a high level of concurrency and avoid locking, including using multiple write locks for hash buckets. It also exploits the uncertainties of the JMM to minimize the time that a lock is held or avoid locking at all.

Please note that the major barrier for HashTable is that it uses single map wide lock that is held for the entirety of insertion, retrieval, removal or traversal. ConcurrentHashMap uses a collection of 32 locks each of which guards a subset of hash buckets. This means that maximum of 32 threads can modify the map at one time. 32 is the theoretical concurrency limit and may not be achievable in practice. So, if 31 threads are writing to the map, it is very much possible that 32nd thread trying to modify the map will block.

Retrieval :

Unlike HashTable, ConcurrentHashMap may not acquire a lock while retrieving a value from the map. A very important point to note here is that synchronization offers much more than atomicity. It also guarantees ordering. Lets see how... Java Language Specification permits some memory operations not be made visible to other threads instantly because of the performance benefits of using per processor caches. Since ConcurrentHashMap may not acquire a lock during retrieval, it should be prepared to handle stale values. Instead of locking, the LinkedList used at any bucket location in ConcurrentHashMap is designed such that the implementation can detect when its view of the list has become inconsistent or stale. If it finds so, it acquires the appropriate lock on the bucket and searches again. This is one of the reasons why  ConcurrentHashMap performs badly in single threaded environment. This approach optimizes the lookup in the common case - where most retrievals are successful and retrievals outnumber insertions and removals.

Removal :

Removal operation on ConcurrentHashMap holds the bucket lock for the duration of its execution. To understand the removal, we need to know the data structure of an Entry object in ConcurrentHashMap. So, here it is :
Please note that all elements except the value are final, whereas value is volatile. I assume that the reader is aware of what a volatile variable means in Java. Since the next variable in Entry object is final, we can't add or remove an element to the LInkedList at a bucket location. So, how does removal work ??? When an entry object is removed from the LinkedList, its value field is assigned the value null. Since value field is volatile, it changed value is immediately visible to all threads. Once the value field is assigned the value null, the portion of the LinkedList from head to the removed element is cloned and joined to the remainder of the LinkedList following the removed element. So, if another thread is looking for the removed element in the stale chain, it would see null value and would retry with lock which would ensure that the chain is upto date. Eventually the discarded portion of the stale chain would be garbage collected.

Insertion :

A new Entry object is always added to the head of the LinkedList at a bucket. The same logic would apply as explained in Removal section. If another thread is trying to retrieve the inserted value in a stale chain, it would retry with synchronization since the inserted value won't be present in the stale chain.

Update :

Like removal operation, update operation on ConcurrentHashMap also holds the bucket lock for the duration of its execution.  This doesn't necessarily block the reader from executing since the retrieve operation doesn't always need a lock. Update operation just looks for the appropriate Entry object in the chain and updates its value field. Since the value field is volatile, its changed value is available to all the threads at once.

Sunday, 29 September 2013

Java Hash Maps demystified

Often new programmers use Hash Maps in Java without really knowing how they work. Knowing how Java Hash Maps work would help them to use them judiciously.

How an entry is added into a HashMap :

Lets begin with what happens when we add an entry to a hash map. For that, we use put() method of HashMap class. This method takes a key and a value. When we pass Key and Value object to put() method of Java HashMap, HashMap implementation calls hashCode method on Key object and applies returned hashCode into its hashing function. The hashing function returns a bucket location for this entry. I say entry because it comprises of both Key and Value. An important point to remember is that HashMap in Java stores both Key and Value object. Why it does so, would be clear shortly. Once it gets the bucket location, it adds the entry object to the bucket. A bucket location uses LinkedList to store various entry objects. When we add an entry object to a bucket, it is added at the head of the LinkedList to avoid tail traversing.

Collision resolution :

Java HashMap stores both Key and Value object to resolve collisions. If two Key objects have the same hashCode, they would have the same bucket location because given a hashCode, the hashing function would return the same bucket location every time. At a given bucket location Java HashMap uses LinkedList to store Entry objects (Entry object comprises of Key and Value objects). An important point to note is that if two objects have the same hashCode, they would go into the same bucket location. Since we use LinkedList to store values at a bucket location, having a long LinkedList can defeat the purpose of HashMap. So, we should judiciously implement hashCode() method of the Key object.

How a value is retrieved from a HashMap :

Now let's see how Java HashMap retrieves a Value object. The method provided for this purpose is get() and it takes a Key object as an argument. When we call get() method, the HashMap calculates hashCode of the Key object by calling hashCode method of the Key object. Once it gets the hashCode of the Key object, it calculates the bucket location of the Entry object by using hashing function. Since HashMap stores a LinkedList of Entries at a given bucket location, it uses key.equals() method to identify the correct node in the LinkedList. This explains the reason why we store both Key and Value object.

Rehashing :

What happens when the size of the HashMap exceeds a given threshold ? If it crosses a given threshold known as load-factor (which is 0.75), it resizes itself. Similar to other collection classes like ArrayList, Java HashMap resizes it self by creating a new bucket of size twice the previous size and then starts putting every old element into that new bucket array. This process is called rehashing since it applies hash function to find new bucket location. But wait !!! If you think of rehashing for a while, you would see big problem. There is a potential race condition. What if two threads find that HashMap needs resizing at the same time. In the process of resizing the HashMap, the element in bucket which is stored in LinkedList gets reversed in order because HashMap appends the new element at the head instead of the tail to avoid tail traversing. If race condition happens, you would end up in an infinite loop. That is the reason, Java HashMap is not thread safe. In such scenarios, ConcurrentHashMap should be used. There are more data structures and ways to obtain concurrency apart from ConcurrentHashMap.

Good Hash Map Keys :

Having good keys for a HashMap is very important. If the time taken to generate a hashCode is high, HashMap would perform very badly since every insertion calls hashCode method. Please note that retrieval does not compute hashCode since JVM calculates hashCode of each Key and provides it on demand. Also, if all the objects are mapped to the same bucket, it just as a LinkedList where retrieval would be very expensive. Now let's have a look at one more aspect of a good hash key. When we modify an object's state, JVM sets a flag which indicates that the object is modified and hashCode umust be recomputed. So, the next time when we call hashCode() method of that object,hashCode is recomputed. For this basic reasoning, it recommended to have immutable keys.

Very broadly, a good HashMap key should have the following traits :
1) hashCode() and equals() method of Key object should be fast.
2) hashCode should be well distributed to minimize hash collisions.
3) Key class should be immutable.

Saturday, 21 September 2013

Java String comparison using "==" Vs equals()

Comparing Strings with "==" instead of equals() method is a common mistake made by new Java developers. What makes it interesting is that "String literal" when compared with "==" works perfectly fine, which makes Java developers think that it is the correct way of comparing Strings which in fact is not. Actually, "==" operator returns true if both the operands point to the same String object. When we create a String literal, a new String object is not created every time. First it is searched in the pool if that object is already present. If it is present, that object is returned. If it is not present, a new object is created and added to the pool. So, "==" operator works fine if both the operands are String literals. To avoid subtle bugs, we should always compare Strings using equals() method.

The following Java program makes it clear :