Sunday, 1 December 2013

Hyper threading and CPU core binding

When we hit the limit of what software tuning alone can deliver, we go back to the hardware. With a fixed set of machines and cores, what is left to exploit is hyper-threading, and binding a specific process to a particular core so that no other process is allowed to execute on that core. We often encounter this scenario in the low-latency trading domain, where even a few microseconds of extra latency are not acceptable.

The video in the following link explains hyper threading in very simple terms.

Here is a nice article to get a sense of how to bind a process to a CPU core :
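As a minimal illustration of core binding, assuming a Linux machine with the taskset utility available, a process (here, the running JVM itself) can be pinned to one core; the core number below is purely illustrative.

import java.io.IOException;

public class PinToCore {
    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = ProcessHandle.current().pid();   // PID of this JVM (Java 9+)
        int core = 3;                               // illustrative core number

        // taskset -cp <core> <pid> binds an already-running process to the given core.
        Process p = new ProcessBuilder("taskset", "-cp",
                String.valueOf(core), String.valueOf(pid))
                .inheritIO()
                .start();
        p.waitFor();

        // From here on, the OS scheduler will only run this process on the chosen core.
    }
}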


Saturday, 26 October 2013

Posix Threads

In this post I will demonstrate the usage of Posix threads through a very simple problem. I will also demonstrate how we can synchronize Posix threads using a mutex. Though there are many synchronization primitives available, I will stick to a mutex for simplicity. Please note that the code presented here is only for demonstration purposes and is not efficient.

Before starting on the problem, let me give a brief introduction to Posix threads. Historically, hardware vendors developed their own proprietary implementations of threads, and these implementations differed substantially from each other, which made it very difficult to develop portable multi-threaded applications. A standard specification was therefore needed to standardize the thread interface, and that interface is Posix threads.

Problem :

A string is to be processed by a given number of threads. The processing in this case would be printing of the string. Each thread should do specific amount of work before passing control to the next thread. The work in our case is the printing of characters. The control is to be passed to the next thread in a round robin manner till the string is processed completely.

The program should take 3 command-line arguments:
<Program name> <Input String> <Amount of work> <Number of threads>
Here is the desired output for a sample run.
ankit@ankit:~/problems/p1$ ./driver ABCDEFGHILKLMN 2 3
Thread 1 : AB
Thread 2 : CD
Thread 3 : EF
Thread 1 : GH
Thread 2 : IL
Thread 3 : KL
Thread 1 : MN
ankit@ankit:~/problems/p1$

Note that each thread is getting control in a round robin manner. In the beginning, thread 1 printed "AB" since the amount of work specified is 2. Then thread 2 printed "CD", then thread 3 printed "EF". After this, control again came to thread 1.


This program should be compiled with the -lpthread option, e.g.: gcc driver.c -lpthread -o driver

This problem throws light on many basic aspects of Posix threads, such as thread creation and thread synchronization. The sleeping part of the program is not efficient; ideally each thread should wait on a condition variable and be signalled when its turn comes. But this is for demonstration purposes.

Sunday, 20 October 2013

Subset Problems

In this post, I would discuss some basic programming problems on sets.

Problem Statement:

Given a list of numbers, find all the possible subsets of the list of numbers. For example, if the set of numbers given is {1,2,3}, the output should be {},{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}.

Solution:

The first solution is recursive and pretty straightforward. Please follow the Java code given below.
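A minimal recursive sketch of the idea (class and method names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class Subsets {

    // Collects all subsets of nums[index..] combined with the elements already chosen in 'current'.
    static void subsets(int[] nums, int index, List<Integer> current, List<List<Integer>> result) {
        if (index == nums.length) {
            result.add(new ArrayList<>(current));  // one complete subset
            return;
        }
        // Case 1: exclude nums[index]
        subsets(nums, index + 1, current, result);
        // Case 2: include nums[index]
        current.add(nums[index]);
        subsets(nums, index + 1, current, result);
        current.remove(current.size() - 1);        // backtrack
    }

    public static void main(String[] args) {
        List<List<Integer>> result = new ArrayList<>();
        subsets(new int[]{1, 2, 3}, 0, new ArrayList<>(), result);
        System.out.println(result); // [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
    }
}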

The second solution is iterative and is more efficient than the first solution. It works on the principle that for a set of n elements, there are 2^n subsets. The binary representation of an integer i in the range 0 to (2^n - 1) is used to find the corresponding ith subset. For example, if there are 3 elements in the set, we enumerate the binary representations of all the numbers from 0 to 7. Each binary representation represents a subset: a set bit at the ith index means "take the ith element from the set" (a code sketch follows the list below). Here are those :
000 -> Empty set
001 -> Only take 1st element
010 -> Only take 2nd element
011 -> Take 1st and 2nd element
100 -> Only take 3rd element
101 -> Take 1st and 3rd element
110 -> Take 2nd and 3rd element
111 -> Take all elements
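And a minimal iterative sketch of this bitmask enumeration (names illustrative):

import java.util.ArrayList;
import java.util.List;

public class SubsetsIterative {
    public static void main(String[] args) {
        int[] nums = {1, 2, 3};
        int n = nums.length;
        List<List<Integer>> result = new ArrayList<>();

        // Each mask from 0 to 2^n - 1 encodes one subset: bit i set => take nums[i].
        for (int mask = 0; mask < (1 << n); mask++) {
            List<Integer> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(nums[i]);
                }
            }
            result.add(subset);
        }
        System.out.println(result);
    }
}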



Let us tweak the above problem statement a little. Here is the new problem statement:

Problem Statement:

Given a list of numbers from 0-9, find the greatest number that can be formed from these numbers such that the number is divisible by three

Solution:

We can use the recursive approach above. One more property is needed: a number is divisible by 3 if and only if the sum of its digits is divisible by 3. Since we have to find the greatest number, we can start with the greatest candidate in the recursive solution above. The list also needs to be sorted in descending order so that the smallest digit is removed first. Here is the code segment :
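As a sketch of the idea, the following uses the same digit-sum property but takes a greedy shortcut rather than the full recursive search: drop the fewest (and smallest) digits needed to make the digit sum divisible by 3, then arrange the rest in descending order. The class name and sample input are illustrative.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GreatestDivisibleByThree {

    // Returns the greatest number (as a string of digits) divisible by 3
    // that can be formed from the given digits, or "" if none exists.
    static String greatest(int[] digits) {
        List<Integer> ds = new ArrayList<>();
        int sum = 0;
        for (int d : digits) { ds.add(d); sum += d; }
        Collections.sort(ds);                       // ascending: smallest digits are removed first

        int r = sum % 3;
        if (r != 0) {
            // Try removing one smallest digit with the same remainder...
            if (!removeOne(ds, r)) {
                // ...otherwise remove two smallest digits with the complementary remainder.
                if (!removeOne(ds, 3 - r) || !removeOne(ds, 3 - r)) return "";
            }
        }
        Collections.reverse(ds);                    // descending order forms the greatest number
        StringBuilder sb = new StringBuilder();
        for (int d : ds) sb.append(d);
        return sb.toString();
    }

    static boolean removeOne(List<Integer> ds, int remainder) {
        for (int i = 0; i < ds.size(); i++) {
            if (ds.get(i) % 3 == remainder) { ds.remove(i); return true; }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(greatest(new int[]{4, 3, 0, 1, 5})); // prints 5430
    }
}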

Sunday, 6 October 2013

Finding Overlapping Rectangles in a given set of Axis aligned rectangles

Describe an algorithm that takes an unsorted array of axis-aligned rectangles and returns any pair of rectangles that overlaps, if there is such a pair. Axis-aligned means that all the rectangle sides are either parallel or perpendicular to the x- and y-axis. You can assume that each rectangle object has two variables in it: the x-y coordinates of the upper-left corner and the bottom-right corner.

Solution :

We can use the Sweep Line algorithm for this. Create a sorted array of the x-coordinates of the left and the right edges of all the rectangles. The x-coordinates should be annotated with the rectangles they belong to and whether they are the left or the right edge of the rectangle. Now, scan the sorted list from left to right, adding each rectangle to a balanced BST when you encounter the left edge of a rectangle and removing the rectangle from the balanced BST when you encounter the right edge of the rectangle. The solution requires the usage of a one-dimensional range search tree. A range search tree is a tree which is capable of performing range search queries efficiently. The time complexity of a range search query in a range search tree is O(R + log N), where N is the number of nodes in the tree and R is the number of elements in the result set.

First, let's see how a search query works in a one-dimensional range search tree:
1DRangeSearchTree(k1, k2, v):
    Input : Keys k1 and k2, and a node v of a binary search tree T
    Output : The elements whose keys are greater than or equal to k1 and less than or equal to k2.
    Algorithm :
        if v is an external (null) node
            return {}
        if key(v) >= k1 and key(v) <= k2
            ElementsInLeftSubTree = 1DRangeSearchTree(k1, k2, T.left(v))
            ElementsInRightSubTree = 1DRangeSearchTree(k1, k2, T.right(v))
            return ElementsInLeftSubTree U ElementsInRightSubTree U {element(v)}
        else if key(v) < k1
            return 1DRangeSearchTree(k1, k2, T.right(v))
        else if key(v) > k2
            return 1DRangeSearchTree(k1, k2, T.left(v))

Now let's see the detailed algorithm for the original problem:
1) Sort all left and right rectangle edges, according to their X value, into list L.
2) Create a new empty range search tree T, for Y ordering of rectangle tops/bottoms
3) Create a new empty result set RS of unique rectangle pairs
4) Traverse L in ascending order. For all V in L:
   If V.isRightEdge()
      T.remove(V.rectangle.top)
      T.remove(V.rectangle.bottom)
   else
       For all U in T.getRange(V.rectangle.top, V.rectangle.bottom)
         RS.add(<V.rectangle, U.rectangle>)
      T.add(V.rectangle.top)
      T.add(V.rectangle.bottom)
5) return RS

Time complexity is O(R + N log N) where R is the actual number of intersections.
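As a compact illustration of the sweep, here is a sketch that uses java.util.TreeMap in place of a dedicated range search tree (its subMap view plays the role of the 1D range query). Coordinates are assumed to be integers with top > bottom, and all names are illustrative.

import java.util.*;

public class OverlappingRectangles {

    static class Rect {
        final int id, left, top, right, bottom;   // axis-aligned, with top > bottom
        Rect(int id, int left, int top, int right, int bottom) {
            this.id = id; this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
    }

    // Returns the pairs of rectangle ids whose areas overlap (touching edges count here).
    static Set<String> findOverlaps(List<Rect> rects) {
        // 1) Build the list of left/right edge events, sorted by x (left edges before right edges on ties).
        List<int[]> events = new ArrayList<>();   // {x, isRight(0/1), rectIndex}
        for (int i = 0; i < rects.size(); i++) {
            events.add(new int[]{rects.get(i).left, 0, i});
            events.add(new int[]{rects.get(i).right, 1, i});
        }
        events.sort(Comparator.<int[]>comparingInt(e -> e[0]).thenComparingInt(e -> e[1]));

        // 2) TreeMap keyed by y plays the role of the balanced 1D range search tree.
        TreeMap<Integer, List<Integer>> active = new TreeMap<>();
        Set<String> result = new LinkedHashSet<>();

        for (int[] e : events) {
            Rect r = rects.get(e[2]);
            if (e[1] == 1) {                      // right edge: deactivate the rectangle
                remove(active, r.top, e[2]);
                remove(active, r.bottom, e[2]);
            } else {                              // left edge: range-query [bottom, top], then activate
                for (List<Integer> ids : active.subMap(r.bottom, true, r.top, true).values()) {
                    for (int other : ids) {
                        result.add(Math.min(other, e[2]) + "-" + Math.max(other, e[2]));
                    }
                }
                active.computeIfAbsent(r.top, k -> new ArrayList<>()).add(e[2]);
                active.computeIfAbsent(r.bottom, k -> new ArrayList<>()).add(e[2]);
            }
        }
        return result;
    }

    static void remove(TreeMap<Integer, List<Integer>> active, int y, int id) {
        List<Integer> ids = active.get(y);
        ids.remove(Integer.valueOf(id));
        if (ids.isEmpty()) active.remove(y);
    }

    public static void main(String[] args) {
        List<Rect> rects = Arrays.asList(
                new Rect(0, 0, 4, 3, 0),          // overlaps rectangle 1
                new Rect(1, 2, 5, 6, 2),
                new Rect(2, 7, 3, 9, 1));         // disjoint from both
        System.out.println(findOverlaps(rects));  // prints [0-1]
    }
}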

Please note that this algorithm does not consider two rectangles as overlapping if one rectangle is completely within another rectangle. However, this can be corrected with a minor tweak in the search algorithm for the 1DRangeSearchTree.

Reference : http://stackoverflow.com/questions/3324880/axisaligned-rectangles-intersection

Wednesday, 2 October 2013

Java ConcurrentHashMap demystified

This post is in continuation of my previous post on Java HashMap. You may want to go through that post if you are not sure of Java HashMap internals.

As mentioned in the previous post, Java HashMap is not thread safe. If you want to use a HashMap in a multi-threaded environment, you might look to Hashtable or ConcurrentHashMap. Hashtable has a bottleneck: it uses a single map-wide lock, which greatly slows down the application. ConcurrentHashMap uses several tricks to achieve a high level of concurrency and avoid locking, including using multiple write locks for hash buckets. It also exploits the latitude allowed by the Java Memory Model (JMM) to minimize the time that a lock is held, or to avoid locking at all.

Please note that the major barrier for Hashtable is that it uses a single map-wide lock that is held for the entirety of an insertion, retrieval, removal or traversal. ConcurrentHashMap (in its original, segment-based form) instead uses a collection of 16 locks by default, each of which guards a subset of the hash buckets. This means that a maximum of 16 threads can modify the map at one time. Sixteen is the theoretical concurrency limit and may not be achievable in practice. So, if 16 threads are already writing to the map, it is very much possible that a 17th thread trying to modify the map will block.

Retrieval :

Unlike Hashtable, ConcurrentHashMap may not acquire a lock while retrieving a value from the map. A very important point to note here is that synchronization offers much more than atomicity; it also guarantees ordering. Let's see how. The Java Language Specification permits some memory operations not to be made visible to other threads instantly, because of the performance benefits of using per-processor caches. Since ConcurrentHashMap may not acquire a lock during retrieval, it should be prepared to handle stale values. Instead of locking, the linked list used at any bucket location in ConcurrentHashMap is designed such that the implementation can detect when its view of the list has become inconsistent or stale. If it finds so, it acquires the appropriate lock on the bucket and searches again. This is one of the reasons why ConcurrentHashMap performs worse than HashMap in a single-threaded environment. This approach optimizes the lookup in the common case - where most retrievals are successful and retrievals outnumber insertions and removals.

Removal :

Removal operation on ConcurrentHashMap holds the bucket lock for the duration of its execution. To understand the removal, we need to know the data structure of an Entry object in ConcurrentHashMap. So, here it is :
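Roughly, the entry looks like the following simplified sketch, modelled on the segment-based (pre-Java 8) implementation; field order and naming are approximate (in the JDK it is a static nested class of ConcurrentHashMap).

// Simplified sketch of the per-bucket chain node in the segment-based ConcurrentHashMap.
final class HashEntry<K, V> {
    final K key;                  // final: never changes after construction
    final int hash;               // final: cached hash of the key
    final HashEntry<K, V> next;   // final: the chain cannot be restructured in place
    volatile V value;             // volatile: updated values are immediately visible to readers

    HashEntry(K key, int hash, HashEntry<K, V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}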
Please note that all fields except the value are final, whereas value is volatile. I assume that the reader is aware of what a volatile variable means in Java. Since the next field in the Entry object is final, we cannot insert into or remove from the middle of the linked list at a bucket location in place. So, how does removal work? When an entry object is removed from the linked list, its value field is assigned null. Since the value field is volatile, the changed value is immediately visible to all threads. Once the value field is assigned null, the portion of the linked list from the head to the removed element is cloned and joined to the remainder of the linked list following the removed element. So, if another thread is looking for the removed element in the stale chain, it would see the null value and would retry with the lock held, which ensures that the chain is up to date. Eventually the discarded portion of the stale chain is garbage collected.

Insertion :

A new Entry object is always added to the head of the LinkedList at a bucket. The same logic would apply as explained in Removal section. If another thread is trying to retrieve the inserted value in a stale chain, it would retry with synchronization since the inserted value won't be present in the stale chain.

Update :

Like removal operation, update operation on ConcurrentHashMap also holds the bucket lock for the duration of its execution.  This doesn't necessarily block the reader from executing since the retrieve operation doesn't always need a lock. Update operation just looks for the appropriate Entry object in the chain and updates its value field. Since the value field is volatile, its changed value is available to all the threads at once.

Sunday, 29 September 2013

Java Hash Maps demystified

Often new programmers use Hash Maps in Java without really knowing how they work. Knowing how Java Hash Maps work would help them to use them judiciously.

How an entry is added into a HashMap :

Let's begin with what happens when we add an entry to a hash map. For that, we use the put() method of the HashMap class. This method takes a key and a value. When we pass the Key and Value objects to the put() method of a Java HashMap, the HashMap implementation calls the hashCode method on the Key object and applies the returned hashCode to its hashing function. The hashing function returns a bucket location for this entry. I say entry because it comprises both Key and Value. An important point to remember is that a HashMap in Java stores both the Key and the Value object. Why it does so will become clear shortly. Once it gets the bucket location, it adds the entry object to the bucket. A bucket location uses a LinkedList to store the various entry objects. When we add an entry object to a bucket, it is added at the head of the LinkedList to avoid tail traversing.

Collision resolution :

Java HashMap stores both Key and Value object to resolve collisions. If two Key objects have the same hashCode, they would have the same bucket location because given a hashCode, the hashing function would return the same bucket location every time. At a given bucket location Java HashMap uses LinkedList to store Entry objects (Entry object comprises of Key and Value objects). An important point to note is that if two objects have the same hashCode, they would go into the same bucket location. Since we use LinkedList to store values at a bucket location, having a long LinkedList can defeat the purpose of HashMap. So, we should judiciously implement hashCode() method of the Key object.

How a value is retrieved from a HashMap :

Now let's see how Java HashMap retrieves a Value object. The method provided for this purpose is get() and it takes a Key object as an argument. When we call get() method, the HashMap calculates hashCode of the Key object by calling hashCode method of the Key object. Once it gets the hashCode of the Key object, it calculates the bucket location of the Entry object by using hashing function. Since HashMap stores a LinkedList of Entries at a given bucket location, it uses key.equals() method to identify the correct node in the LinkedList. This explains the reason why we store both Key and Value object.

Rehashing :

What happens when the size of the HashMap exceeds a given threshold? If the number of entries crosses a threshold determined by the load factor (0.75 of the capacity by default), the map resizes itself. Similar to other collection classes like ArrayList, Java HashMap resizes itself by creating a new bucket array of twice the previous size and then putting every old element into that new bucket array. This process is called rehashing, since it applies the hash function to find the new bucket location. But wait! If you think about rehashing for a while, you will see a big problem: there is a potential race condition. What if two threads find that the HashMap needs resizing at the same time? In the process of resizing, the elements in a bucket, which are stored in a LinkedList, get reversed in order, because HashMap appends new elements at the head instead of the tail to avoid tail traversing. If the race condition happens, you can end up with an infinite loop. That is the reason Java HashMap is not thread safe. In such scenarios, ConcurrentHashMap should be used. There are more data structures and ways to obtain concurrency apart from ConcurrentHashMap.

Good Hash Map Keys :

Having good keys for a HashMap is very important. If the time taken to generate a hashCode is high, the HashMap will perform very badly, since every insertion calls the hashCode method, and retrieval calls it as well. Also, if all the objects are mapped to the same bucket, the map degenerates into a LinkedList, where retrieval is very expensive. Now let's look at one more aspect of a good hash key. If we modify a key object's state after it has been inserted, its hashCode may change, and the entry can then no longer be found in the bucket it was originally placed in. For this basic reason, it is recommended to have immutable keys.

Very broadly, a good HashMap key should have the following traits (a small sketch follows the list):
1) hashCode() and equals() method of Key object should be fast.
2) hashCode should be well distributed to minimize hash collisions.
3) Key class should be immutable.
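As an illustration of these traits, here is a minimal sketch of an immutable key class; the class and its fields are made up for the example.

public final class OrderKey {                 // final class: cannot be subclassed and mutated
    private final String symbol;              // immutable fields
    private final long orderId;
    private final int hash;                   // hash precomputed once, since the state never changes

    public OrderKey(String symbol, long orderId) {
        this.symbol = symbol;
        this.orderId = orderId;
        this.hash = 31 * symbol.hashCode() + Long.hashCode(orderId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey other = (OrderKey) o;
        return orderId == other.orderId && symbol.equals(other.symbol);
    }

    @Override
    public int hashCode() {
        return hash;                           // fast and consistent with equals()
    }
}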

Saturday, 21 September 2013

Java String comparison using "==" Vs equals()

Comparing Strings with "==" instead of the equals() method is a common mistake made by new Java developers. What makes it interesting is that comparing String literals with "==" works perfectly fine, which makes Java developers think that it is the correct way of comparing Strings, which in fact it is not. Actually, the "==" operator returns true only if both operands point to the same String object. When we create a String literal, a new String object is not created every time: the string pool is first searched for that object. If it is already present, that object is returned; if it is not, a new object is created and added to the pool. So, the "==" operator happens to work if both operands are String literals. To avoid subtle bugs, we should always compare Strings using the equals() method.

The following Java program makes it clear :
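A small program along these lines (string values are illustrative):

public class StringComparison {
    public static void main(String[] args) {
        String a = "hello";                    // literal: taken from the string pool
        String b = "hello";                    // same pooled object as 'a'
        String c = new String("hello");        // explicitly creates a new object on the heap

        System.out.println(a == b);            // true  : both refer to the same pooled object
        System.out.println(a == c);            // false : different objects, same contents
        System.out.println(a.equals(c));       // true  : equals() compares contents
    }
}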


Friday, 9 August 2013

Designing an efficient and scalable cache

Nearly all applications use some form of caching. Reusing the results of a previous computation can decrease latency and increase throughput at the cost of some additional memory. Caching usually looks simpler than it is. Most naive cache implementations are designed with a single-threaded application in mind, and this design often turns into a scalability bottleneck when the application scales. Here I will design an efficient and scalable cache for a computationally expensive function. I will be using Java for these examples. The code snippets are not complete and are only meant to illustrate the concepts.

Let's start with a naive cache implementation and try to improve it by finding various faults in it. Whenever we think of a cache, HashMap comes to mind. So, let's start with HashMap. Have a look at the following code snippet :
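A minimal sketch of the interface and the first, fully synchronized attempt, using the names from the text; the expensive computation is simulated with a sleep.

import java.util.HashMap;
import java.util.Map;

interface Computable<A, V> {
    V compute(A arg) throws InterruptedException;
}

class ExpensiveFunction implements Computable<String, Integer> {
    @Override
    public Integer compute(String arg) throws InterruptedException {
        Thread.sleep(2000);                    // stands in for a long-running computation
        return arg.length();
    }
}

// First attempt: correct but slow, because the whole method is synchronized.
class CacheImpl1<A, V> implements Computable<A, V> {
    private final Map<A, V> cache = new HashMap<>();
    private final Computable<A, V> delegate;

    CacheImpl1(Computable<A, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {                  // not cached yet: compute and remember it
            result = delegate.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}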

In the above code snippet, Computable declares a function with an argument of type A and return type V. ExpensiveFunction implements this interface and takes a long time to compute. To implement the naive cache, we use a wrapper that records the results of previous computations and encapsulates the caching process. This technique is called memoization. The CacheImpl1 class is a first attempt at this, using a HashMap. The compute method in the wrapper first checks whether the result has already been computed. If it has, it returns the cached result. If not, it computes the result and stores it in the cache for future use.

As we know, HashMap is not thread-safe, so to make the program thread-safe, the above implementation synchronizes the entire compute method. Though this approach ensures thread safety, it has a performance bottleneck: only one thread can execute compute at a time. Suppose one thread is busy executing compute, which is an expensive function; no other thread can execute compute even if the result it needs has already been computed. If multiple threads are waiting for compute, it may actually take more time than without memoization. This is not what we expect from caching.

Let's improve it a little by using ConcurrentHashMap, which is thread safe, so we need not synchronize the entire method. CacheImpl2 in the snippet below follows this approach, but it also has a bottleneck: two threads calling compute may end up computing the same value. Find out how. So, what's the benefit of caching then? The main advantage of caching is precisely to prevent the same value from being calculated multiple times.
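A sketch of this second attempt, reusing the Computable interface from the first sketch:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Second attempt: better concurrency, but two threads may still compute the same value.
class CacheImpl2<A, V> implements Computable<A, V> {
    private final Map<A, V> cache = new ConcurrentHashMap<>();
    private final Computable<A, V> delegate;

    CacheImpl2(Computable<A, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {                  // window: another thread may be computing this too
            result = delegate.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}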

The problem with CacheImpl2 is that if one thread starts an expensive computation, another thread is not aware that the computation is already in progress and may start the same computation. The FutureTask class in Java can be handy here, and it is worth going through this class first. It provides methods to start and cancel a computation, query whether the computation is complete, and retrieve the result of the computation. FutureTask.get() is the answer to our requirement: this method returns the result if it is available, else it blocks till the result is computed and then returns it. CacheImpl3 in the snippet below uses FutureTask.

The CacheImpl3 implementation above looks to be perfect, but it also has a flaw: there is still a case wherein 2 threads can compute the same value. The culprit is the if block in the compute method, which is a non-atomic check-then-act. With a little thought you should be able to think of that scenario. How do we correct this flaw? The putIfAbsent method of ConcurrentHashMap is the answer; it is clear from the name what it does, and if you are still not sure, please go through the documentation. Let's also look at some more corner cases. What if the computation is cancelled or fails? Future attempts to compute the result will then also fail, so the cache implementation should remove the Future from the cache in such cases. Here is the final implementation of the cache :
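A sketch of that final version, again reusing the Computable interface: FutureTask plus putIfAbsent, with the Future removed from the cache on cancellation or failure.

import java.util.concurrent.*;

// Final attempt: at most one computation per key, even under heavy concurrency.
class CacheImpl3<A, V> implements Computable<A, V> {
    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Computable<A, V> delegate;

    CacheImpl3(Computable<A, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> future = cache.get(arg);
            if (future == null) {
                FutureTask<V> task = new FutureTask<>(() -> delegate.compute(arg));
                future = cache.putIfAbsent(arg, task);      // atomic check-then-act
                if (future == null) {                       // we won the race: run the computation
                    future = task;
                    task.run();
                }
            }
            try {
                return future.get();                        // blocks until the result is ready
            } catch (CancellationException e) {
                cache.remove(arg, future);                  // computation was cancelled: retry
            } catch (ExecutionException e) {
                cache.remove(arg, future);                  // don't cache failures
                throw new RuntimeException(e.getCause());
            }
        }
    }
}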

Thursday, 8 August 2013

Implication of Open Interest in Futures Markets

What is Open Interest ?

The bookish definition says that Open Interest is the total number of outstanding contracts that are held by market participants at the end of the day. In other words, Open Interest is the total number of future contracts or option contracts that have not yet been exercised, expired or filled. Open Interest makes sense only for Futures market.

Open Interest measures the flow of money into the futures market and hence is used by traders to confirm trends and trend reversals in futures market. The open interest position that is reported each day represents the increase or decrease in the number of contracts for that day and hence is shown as a positive or negative number.

Open Interest Calculation :

A contract has both a buyer and a seller. The open interest for a day is the increase or decrease in the number of contracts for that day. For example, if 2 parties in a trade are initiating a new position i.e. 1 new buyer and 1 new seller are entering into the market, open interest would increase by one contract. On the other hand, if 2 parties in a trade are closing an existing position, open interest would decrease by one contract. There is one more possibility wherein one old trader passes its position to a new trader. In this case, open interest does not change.

Benefits of monitoring Open Interest :

In short, increasing open interest means that money is flowing into the market. Due to this, the present trend will continue: if the current trend is upward, it will remain upward; if it is downward, it will remain downward. Declining open interest means that the current market trend is coming to an end. If the price jumps significantly but open interest remains flat, it is an early warning of the end of a bull market, and vice-versa. Hence, an increase or decrease in prices while open interest remains the same is an early indication of a trend reversal.

Further reading:
Investopedia has more details on this.

Saturday, 27 July 2013

Interesting Programming Puzzle - Separate Engineers and Managers

There are n people, of which more than half are engineers and the rest are managers. An engineer always tells the truth. A manager may or may not speak the truth. You can ask any ith person whether the jth person is an engineer.
Find out a way to separate out engineers and managers in the most efficient way.

Solution :

Here's an n-1 query solution
1. Maintain three sets of people: UNSEEN, STACK, and DISCARD.
2. Initialize the process by picking one arbitrarily to be the STACK, everything else is UNSEEN.
3. Repeat the following step until UNSEEN is empty:
        a. Pick an UNSEEN element x, remove it from UNSEEN.
        b. Ask the top of the STACK y about x. If y says "manager" pop y off the stack and DISCARD both x and y. If it says "engineer" add x to the top of the STACK.

After all elements have been processed in this way (n-1 comparisons), the top of the stack must be an engineer. Once we get an engineer, we can separate out engineers and managers by asking the engineer since an engineer always speaks truth.
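A compact Java sketch of this stack-based procedure; the oracle interface is an assumption standing in for the ability to question person i about person j, and the sample data in main is illustrative.

import java.util.ArrayDeque;
import java.util.Deque;

public class EngineersAndManagers {

    // Oracle: person i's answer to "is person j an engineer?".
    // Engineers always answer truthfully; managers may answer arbitrarily.
    interface Oracle {
        boolean saysEngineer(int i, int j);
    }

    // Returns the index of a guaranteed engineer, using at most n-1 questions.
    static int findEngineer(int n, Oracle oracle) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);                                  // person 0 starts the stack
        for (int x = 1; x < n; x++) {                   // UNSEEN = 1 .. n-1
            if (stack.isEmpty()) {
                stack.push(x);
            } else if (oracle.saysEngineer(stack.peek(), x)) {
                stack.push(x);                          // "engineer": keep x on the stack
            } else {
                stack.pop();                            // "manager": discard both x and the top
            }
        }
        return stack.peek();                            // top of the stack is an engineer
    }

    public static void main(String[] args) {
        // Illustrative setup: persons 0, 1 and 3 are engineers; 2 and 4 are managers who always lie.
        boolean[] engineer = {true, true, false, true, false};
        Oracle oracle = (i, j) -> engineer[i] ? engineer[j] : !engineer[j];
        System.out.println(findEngineer(engineer.length, oracle));  // prints the index of an engineer
    }
}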

Why does this work? 

First observe that whenever we discard a pair, at least one of them is a manager (if you don't understand this, check by taking the various cases). So among the rest of them (STACK and UNSEEN) a majority must still be engineers. So at the end, when UNSEEN is empty, there must be an engineer in the stack, and therefore the top of the stack must be an engineer.

This can be improved to n-2 simply by stopping one step earlier. When there is one UNSEEN person left: if the stack is empty, that UNSEEN person is an engineer; otherwise, the top of the stack must be an engineer. If n is even and n >= 4, as a first step we can just throw out one person and apply this algorithm to the remaining n-1, obtaining n-3 comparisons. This gives the optimal algorithm.

Friday, 21 June 2013

Birthday Paradox

This is one of the very interesting problems in Probability theory. So here it goes. Birthday Paradox concerns  the probability, that in a set of n randomly chosen people, some pair of them will have the same birthday.

Consider the following example: Assuming for a moment that birthdays are evenly distributed throughout the year, if you're sitting in a room with forty people in it, what are the chances that two of those people have the same birthday? For simplicity's sake, we'll ignore leap years. A reasonable, intelligent person might point out that the odds don't reach 100% until there are 366 people in the room (the number of days in a year + 1)... and forty is about 11% of 366... so such a person might conclude that the odds of two people in forty sharing a birthday are about 11%. In reality, due to Math's convoluted reasoning, the odds are about 90%. This phenomenon is known as the Birthday Paradox.

It turns out that this phenomenon is very useful in cryptography and hashing algorithms. The birthday paradox is strange, counter-intuitive, and completely true. It's only a "paradox" because our brains can't handle the compounding power of exponents. We expect probabilities to be linear and only consider the scenarios we're involved in (both faulty assumptions, by the way). So, without digressing, let's dive into the main part.

Let's work with some numbers. Suppose there are 23 people in a room. What are the chances that 2 people share a birthday?
We could list all the possible ways 23 people can have birthdays and count the ways where at least 2 people share a birthday. After that, calculating the required probability is a cakewalk. But since the number of ways is huge, this is hard to do. So let's flip the problem: instead of finding all the ways where at least 2 people share a birthday, let's find all the ways where no 2 people share a birthday. Then we can easily compute the probability that at least 2 people share a birthday by subtracting from 1.

With 23 people, we have 253 combinations. 
(23 X 22) / 2 = 253
The chance of 2 people having different birthdays is :
1 - (1/365)  = (364 / 365) = 0.997260  --> This should be pretty easy to understand
Makes sense. 364 out of 365 days are Ok.

Now, all the 253 pairs should be different. We use exponents to find the probability. So, the probability that all the 253 pairs have different birthdays is 
(364 / 365) ^ 253 = 0.4995

Now the chance that, out of 23 people in a room, at least 2 people share a birthday is 1 - 0.4995 = 0.5005 = 50.05%

The generalized formula is :
P(n) ≈ 1 - (364 / 365)^C(n,2), where n is the count of people in the room (this pairwise treatment is an approximation, but a close one).

There is another way to look at this problem which may seem easier for some people:
Let's say you have a big wall calendar with all 365 days on it. You walk in and put a big X on your birthday. The next person who walks in has only a 364 possible open days available, so the probability of the two dates not colliding is 364/365. The next person has only 363 open days, so the probability of not colliding is 363/365. If you multiply the probabilities for all 23 people not colliding, then you get:
364/365 × 363/365 × … × (365 - 23 + 1)/365 = chance of no collisions
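A small sketch that computes both the pairwise approximation and this exact product for n = 23:

public class BirthdayParadox {
    public static void main(String[] args) {
        int n = 23;

        // Pairwise approximation: P(shared) ≈ 1 - (364/365)^C(n,2)
        double pairs = n * (n - 1) / 2.0;               // 253 pairs for n = 23
        double approx = 1 - Math.pow(364.0 / 365.0, pairs);

        // Exact calculation: each new person must avoid all previously used birthdays.
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            noCollision *= (365.0 - i) / 365.0;
        }
        double exact = 1 - noCollision;

        System.out.printf("approximate: %.4f, exact: %.4f%n", approx, exact);  // ~0.5005 and ~0.5073
    }
}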

Sunday, 14 April 2013

Catastrophic Backtracking

Lets start off with a small puzzle.

Here is the code :
What would be the output of the above code ? Here are the options :
a) 99
b) 100
c) Throw an exception
d) None of the above

As is clear from the code, the regular expression will match no odd-length string of "a"s. Most people would think that the answer is 99, since we have 99 even-length strings of a's, but oops! Wait a minute!

The correct answer is option d. Strange, isn't it?
Well, to add more to your confusion, the above program would run for more than 10^15 years before printing anything. How is this possible?

This is due to "Catastrophic BackTracking" which is caused by our regex. Lets take a close look at it : (aa|aab?)+

Let's say we are matching a string that does not belong to the language of the pattern, for example "aaaaa". Here is the backtracking tree for this :


As seen in the image above, the regex engine traverses the whole backtracking tree. The time complexity of the above code is exponential in the length of the string: O(2^(n/2)) to be precise.
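A minimal sketch that demonstrates the blow-up on odd-length strings of 'a'; the lengths chosen are illustrative, and on a typical machine the time roughly doubles with every two extra characters.

import java.util.regex.Pattern;

public class CatastrophicBacktracking {
    public static void main(String[] args) {
        Pattern bad = Pattern.compile("(aa|aab?)+");

        // Odd-length strings of 'a' cannot match, but discovering that takes exponential time.
        for (int len = 39; len <= 51; len += 2) {
            String input = "a".repeat(len);              // Java 11+; use a StringBuilder on older JDKs
            long start = System.nanoTime();
            boolean matches = bad.matcher(input).matches();
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("length " + len + " -> " + matches + " in " + millis + " ms");
        }
    }
}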


How can we fix it ?

The solution is simple. While nesting repetition operators, make sure that "there is only one way to match the same match".
In our code, there are multiple ways to match a string. For example, the string "aa" is matched by both the "aa" and the "aab?" alternatives. So, let's change our code to use a regular expression which adheres to the above principle :


The regex"(aab?)+" matches exactly the same strings as "(aa|aab?)+", but doesn't exhibit "catastrophic backtracking". The above code does produce an output and it prints 99.

Saturday, 6 April 2013

Java Garbage Collection

Garbage Collection is the process of looking at the heap memory, identifying which objects are in use and which are not, and deleting the unused objects. When we say that an object is in use, we mean that some part of our program still has a pointer to that object. An unused object on the other hand, is no longer referenced by any part of our program. Hence the memory used by the unused object can be reclaimed. This is what garbage collection does.

Java does not explicitly allocate and deallocate memory in the program code the way C or C++ does. Some developers set object references to null or call System.gc() to try to free memory explicitly. Explicitly nulling object references is usually pointless, since modern Java compilers are smart enough to determine when an object goes out of scope in "almost all" cases. Calling System.gc(), however, can affect system performance drastically and should not be done. In Java, the developer does not explicitly free memory in the program code; the garbage collector takes care of it. The Java garbage collector is based on the following 2 hypotheses, which are called the weak generational hypothesis:

  1. Most objects become unreachable soon.
  2. References from old objects to new objects only exist in small numbers.
In HotSpot JVM, the heap space is divided into 3 regions :
  1. Young Generation : Most of the newly created objects are located in this region. Since according to the above hypotheses most objects become unreachable soon, most objects die here. When we say objects die here, we mean to say that they are collected by minor GC. The objects which survive this generation are transferred to the old generation.
  2. Old Generation : The objects which survive the young generation are copied here. It is "generally" larger than the young generation. Since it is bigger in size, GC is less frequent than in the young generation. It should be noted that the size of the generations is configurable and is tuned based on the application requirement, but most of the times old generation is bigger than the young generation. When objects die in the old generation we say that a major GC has occurred.
  3. Permanent Generation : The permanent generation is for methods and classes. A GC can occur in this region as well, and it is counted as a major GC. This generation is populated by the JVM at runtime based on the classes in use by the application. In addition, Java SE library classes and methods may be stored here.
The following figure makes it clear.

Figure 1: GC Area & Data Flow.

Why Java uses different generations in the Heap ?

The JVM uses a Mark and Sweep algorithm for garbage collection. The first step in this process is called marking: the garbage collector identifies which objects are in use and which are not. This can be a time-consuming process, since we need to scan all the objects. The second step is the deletion step: unreferenced objects are removed, leaving behind referenced objects and pointers to free space. The following figures explain the 2 steps :
To improve performance, in addition to removing the unreferenced objects, we compact the remaining referenced objects by moving them together. This makes new memory allocations easier and faster. Having to mark and compact all the objects in the JVM is inefficient: as more and more objects are allocated, the list of objects grows, leading to longer garbage collection times. Instead, we can make use of the weak generational hypothesis explained previously. That is why Java uses generational garbage collection.

Composition of Young Generation :

The young generation is composed of 3 spaces : one Eden space and two Survivor spaces. The majority of the newly created objects are created in the Eden space. After one GC in the Eden space, surviving objects are moved to one of the survivor spaces. Once this survivor space is full, surviving objects are moved to the other survivor space. And surviving objects from this survivor space are moved to the old generation.

Some Garbage Collectors in JVM :

Some types of garbage collectors:
  1. Serial GC
  2. Parallel GC
  3. Concurrent Mark & Sweep (CMS) GC

Serial GC : 

This GC first marks the surviving objects in the old generation. It then sweeps the heap, leaving only the surviving objects behind, and finally compacts the surviving objects into one region. Serial GC causes a stop-the-world pause, i.e. the application becomes inactive during Serial GC.

Parallel GC : 

This GC also causes stop-the-world pauses. While Serial GC uses only one thread for GC, Parallel GC uses multiple threads and is hence more efficient than Serial GC. It makes sense to use this GC when we have multiple cores.

Concurrent Mark & Sweep GC : 

This GC uses only one thread. This GC performs the following steps :
  • Initial Mark
  • Concurrent Marking
  • Remark
  • Concurrent Sweeping
As is clear from the figure below, the Concurrent Mark Sweep GC stops the world only during the initial mark and remark phases. During the concurrent marking and sweeping phases, this GC runs along with the application threads. It can also be combined with a parallel young-generation collector in the HotSpot JVM. This GC is used when the response time is very crucial.

Figure 5: Serial GC & CMS GC.


JVM options for different GC algorithms :

  • Serial generational collector (-XX:+UseSerialGC).
  • Parallel for young space, serial for old space generational collector (-XX:+UseParallelGC).
  • Parallel young and old space generational collector (-XX:+UseParallelOldGC).
  • Concurrent mark sweep with serial young space collector (-XX:+UseConcMarkSweepGC -XX:-UseParNewGC).
  • Concurrent mark sweep with parallel young space collector (-XX:+UseConcMarkSweepGC -XX:+UseParNewGC).
  • G1 garbage collector (-XX:+UseG1GC).

Saturday, 2 March 2013

Does Explicit nulling help garbage collection

Here I would be discussing whether explicit nulling of variables helps garbage collection in any significant way. For those who are not up to date with HotSpot GC, let me throw some light on that. The heap is divided into several spaces: Eden space, 2 survivor spaces and an old generation space. The Eden space and the 2 survivor spaces make up the young generation. When an object is created, it is created in the Eden space. Most objects die there (become out of scope) and are collected by the garbage collector. If an object  remains in the Eden space till the next minor GC runs, it is promoted to the 1st survivor space and then to the next survivor space. If an object becomes too old or the young generation becomes full, objects get promoted to the old generation space.

Now, lets get back on the topic: Does nulling variables explicitly help garbage collector ?
The answer is it depends. If the Eden space is full and an object is being created then if sufficient space could be created in Eden, then no more work is required. But if sufficient space can't be freed in Eden then existing objects in Eden space have to be promoted to the 1st survivor space before new objects can be created. There might be some objects in the Eden space that the application doesn't need now. But those objects have still not gone out of scope and hence can't be garbage collected. Explicitly nulling the variables holding the references to those objects would make a difference and those objects can now be garbage collected.

One thing that is to be noted here is that compiler can make object references null even if you don't make them null explicitly. Only in cases where the compiler can't prove that the object is no longer used, it doesn't make them null. Explicitly nulling can help GC in rare cases such as the one discussed above. All the following conditions should be true if you are explicitly nulling object references :
  • The variable is the only weak reference to the object.
  • The variable will stay in scope for an extended period of time.
  • The compiler is not able to prove that the object is no longer used but you are able to prove that the object is no longer needed.
In general, explicit nulling is not required because the Java compiler and JVM are smart enough to work this out. One very important use case for explicit clearing is when you have a map holding entries that you no longer need: the objects to which those map entries point are not eligible for garbage collection until the entries are removed or nulled, so clearing them explicitly does help. But something like the following won't help, since the variable would anyhow go out of scope and the object would be garbage collected in the next GC run. We should never do this :
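The kind of code being warned against looks something like this sketch (names illustrative), where a local reference is nulled even though it is about to go out of scope anyway:

public class UnnecessaryNulling {
    static int sumOfSquares(int n) {
        int[] values = new int[n];                 // local array, only used inside this method
        for (int i = 0; i < n; i++) {
            values[i] = i * i;
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        values = null;                             // pointless: 'values' goes out of scope anyway
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));      // 285
    }
}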

Bit Manipulation : Find 2 numbers which are repeated odd number of times in an array in which all other numbers are repeated even number of times

Today I would be sharing a problem which can be very efficiently solved using bit manipulations.

Problem :
Given an array of integers in which every number is repeated an even number of times, except two numbers which are repeated an odd number of times. The range of the integers is not given. Find the two numbers which are repeated an odd number of times.

Solution :
It can be solved in O(n) time and O(1) space. The basic concept here is x ^ x = 0. Here's the algorithm :

1) XOR all the elements in the array. This would nullify all the elements which are repeated even number of times. So, we would get the XOR of elements which are repeated odd number of times.

2) Now divide the set of numbers into 2 sets such that numbers which are repeated odd number of times go into different sets. To divide the set in this way is quite easy given that the set bits in the XOR of two numbers are the bits in which the two numbers differ. From step 1, we got the XOR of the numbers which are repeated odd number of times. Divide all the elements of the array into 2 sets depending on any of the set bits in the XOR output of step 1.

3) XOR all the elements within each set and we would get the elements which are repeated odd number of times since the elements which are repeated even number of times would have nullified  each other.

Here is the code snippet :
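A minimal sketch of the idea; the array contents are illustrative and the variable names match the text that follows.

public class TwoOddOccurringNumbers {
    public static void main(String[] args) {
        int[] arr = {4, 7, 4, 9, 7, 9, 3, 4, 4, 8};   // 3 and 8 appear an odd number of times

        // Step 1: XOR of the whole array = XOR of the two odd-occurring numbers.
        int xorAll = 0;
        for (int x : arr) {
            xorAll ^= x;
        }

        // Step 2: pick any set bit of xorAll; the two numbers differ in that bit.
        int lowestSetBit = xorAll & (-xorAll);

        // Step 3: XOR within each of the two groups separated by that bit.
        int result1 = 0, result2 = 0;
        for (int x : arr) {
            if ((x & lowestSetBit) != 0) {
                result1 ^= x;
            } else {
                result2 ^= x;
            }
        }
        System.out.println(result1 + " " + result2);   // prints 3 and 8 in some order
    }
}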
Now result1 and result2 would hold the numbers which are repeated odd number of times.

There is a trick to isolate the lowest set bit of a number. Suppose x is a number and we wish to find the lowest bit which is set. We can do so by :
lowestSetBit = x & (-x);   // equivalently, x & ~(x - 1)
This is exactly the rule employed in step 2 of the snippet above.



Friday, 22 February 2013

David Shaw's Supercomputer - Anton

Things have been quiet on the blog, but not because there has not been a lot to say. In fact, there has been so much happening that I have not had the idle cycles to write about it.

Recently got a chance to meet one of the D.E.Shaw Research scientists and got an insight into what they are doing. Prior to forming D.E. Shaw Research, David Shaw founded D.E. Shaw & Co., a hedge fund which is one of the most successful quantitative analysis shops. Since 2001, David Shaw has been doing full time research as part of D.E.Shaw Research. D.E. Shaw Research develops both algorithms and customized machine architectures for molecular dynamic simulations such as protein folding. The result is the Anton Supercomputer, a heavily customized machine with 512 specialized computing nodes specifically designed for particle interaction simulations.

Basically, each of the 512 nodes consists of a heavily pipelined special-purpose ASIC that is designed to compute the particle force interactions (using an algorithm called the NT method), along with a general-purpose processor that supports a limited instruction set. Communication is heavily optimized to reduce the amount of data exchanged between nodes. The processors are connected into a 3D toroidal hypercube and each processor "owns" a set of particles corresponding to a particular cube of space. One of these is to be hosted at the Pittsburgh Supercomputing Center.

The ability to perform long, accurate Molecular Dynamics (MD) simulations involving proteins and other biological macro-molecules could in principle provide answers to some of the most important outstanding questions in the fields of biology, chemistry and medicine.

Sunday, 27 January 2013

Facebook Graph Search

As Facebook continues to roll out its new feature, more and more users are getting their hands on this new search tool. Facebook stalking would now be limited only by a user's creativity with search queries and how they use this feature. Many people are scared by the thought that their stuff would go public, but it should be very clear that no private information is exposed by Graph Search. People are already sharing stuff on Facebook; it's just that it is now easily accessible. Let's say we want to search for the people who like cricket. One could have easily visited pages related to cricket and looked at the people who liked those pages. With the introduction of Graph Search, we don't have to make that extra effort; we can just search. So, Graph Search only searches information which is already public; it does not breach the privacy of a person. I am not trying to suggest that anyone who has a Facebook account and has not locked down his or her privacy settings is ignorant. Here is what Facebook says :
"With Graph Search, you can look up anything shared with you on Facebook, and others can find stuff you've shared with them, including content set to Public. That means different people see different results."

Facebook's new Graph Search feature owes part of its thanks to Bing. Facebook and Microsoft have worked together for a number of years now, with Microsoft investing $240 million in Facebook back in 2007. Now when users search for people, places and things using Facebook's search bar, the results are shown alongside web results courtesy of Bing.

Of course, the same was the case when Google came and we had to learn to adjust to the fact that "internet never forgets" - the information that you posted online years ago can come back to haunt you. But Facebook has taken that to new level of intimacy because much of the stuff here seems so ephemeral : a like, a follow - things that you might not even remember doing.

Google is looking on in jealousy, comparing the number of Google+ users to Facebook's gargantuan total and shedding a few tears. The reigning champ of search engines, Google did its best to socialize search with Google+. But Bing managed to make searching more social by tapping into Facebook. No matter how much Google denies it, seeing how many of your friends "Like" something on the web still carries much more weight than a "+1".


Official Facebook Graph Search Introduction page : http://newsroom.fb.com/News/562/Introducing-Graph-Search-Beta


Sunday, 6 January 2013

Allocation and De-allocation - Java Vs C/C++



Java allocations are cheaper than C/C++ allocations, an assertion that many developers have difficulty even imagining. The 1.0 and 1.1 JDKs used a mark-sweep garbage collector, which did compaction on only some collections. The result was that the heap might be fragmented after a garbage collection. This was comparable to C/C++, where the allocator uses heuristics such as first-fit or best-fit to manage the free heap space.

In the HotSpot JVM, a generational collector was introduced, and the heap space was divided into generations. In the young generation, garbage collection is more frequent than in the older generation. The young generation uses a copying collector, due to which the free space in the heap is always contiguous, so that allocation of a new object can be done through a simple pointer addition. This makes object allocation in Java significantly cheaper than in C. In C, the memory allocator will use first-fit or best-fit: the first-fit heuristic walks through the free space to find the first free memory chunk that can meet the requirement, while best-fit walks through the heap to find the best available free memory chunk that meets the requirement. In the HotSpot JVM, by contrast, the allocator does not have to walk through the heap to allocate a chunk.

Let's talk about deallocation now. Copying collectors do not visit dead objects, so in applications with a large number of temporary objects, garbage collection is quite cheap with a copying collector: it just copies the live objects to a survivor space and reclaims the entire heap in one sweep. So, both allocation and deallocation costs per object went down in JDK 1.2. Sun estimated allocation costs at approximately 10 machine instructions.

The JIT compiler performs additional optimizations that can significantly reduce the cost of allocation. It can allocate an object on the stack instead of on the heap, a very powerful enhancement that gives Java some of the cheap stack allocation that C enjoys. In Java source code we cannot create objects on the stack, but using the stack saves a lot of allocation and deallocation time, so objects with a very short lifetime can be placed on the stack instead of in the heap, and the JIT compiler does this automatically when it can prove the object does not outlive the method. Allocation costs in Java are still going to go down with more optimization at the compiler level. So, there is no reason to compromise maintainability for the sake of avoiding a few extra allocations.

One question that arises is: when a lot of threads are calling the allocator for chunks of memory in the heap, isn't that a scalability issue? The answer is no. This is because JVMs use thread-local allocation buffers. Each thread is given a thread-local area of the heap, of the order of 1K. When a thread asks for a chunk of memory, the JVM tries to satisfy the request from the thread-local buffer. Only if the thread-local buffer can't satisfy the demand does the thread go to the global allocator, and this case is rare in most applications. So in most cases we don't lock the global heap; each thread uses its thread-local buffer. Hence, allocation is no longer a scalability issue.