Friday 9 August 2013

Designing an efficient and scalable cache

Nearly all applications use some form of caching. Reusing the results of previous computation can decrease latency and increase throughput at the cost of some additional memory. Caching usually looks simpler than it is: most naive cache implementations are designed with a single-threaded application in mind, and that design often turns into a scalability bottleneck as the application scales. In this post I will design an efficient and scalable cache for a computationally expensive function, using Java for the examples. The code snippets are not complete and are only meant to illustrate the concepts.

Let's start with a naive cache implementation and improve it step by step by finding its faults. Whenever we think of a cache, HashMap comes to mind, so let's start with HashMap. Have a look at the following code snippet:
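The original snippet is not reproduced here, so below is a minimal sketch of what it may have looked like, reconstructed from the description that follows. The String and BigInteger types used by ExpensiveFunction are illustrative assumptions; only the structure matters.

import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// A function from A to V that may take a long time to compute.
interface Computable<A, V> {
    V compute(A arg) throws InterruptedException;
}

// Stand-in for any expensive computation.
class ExpensiveFunction implements Computable<String, BigInteger> {
    public BigInteger compute(String arg) throws InterruptedException {
        // ... imagine a long-running computation here ...
        return new BigInteger(arg);
    }
}

// First attempt: a HashMap guarded by synchronizing the whole compute method.
class CacheImpl1<A, V> implements Computable<A, V> {
    private final Map<A, V> cache = new HashMap<A, V>();
    private final Computable<A, V> c;

    CacheImpl1(Computable<A, V> c) {
        this.c = c;
    }

    public synchronized V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {           // not cached yet
            result = c.compute(arg);    // expensive call
            cache.put(arg, result);     // remember it for next time
        }
        return result;
    }
}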

In the above code snippet, Computable declares a function with an argument of type A and a return type V. ExpensiveFunction implements this interface and takes a long time to compute its result. To implement a naive cache, we wrap the computation in a class that records the results of previous computations and encapsulates the caching process; this technique is called memoization. The CacheImpl1 class is a first attempt at this, using HashMap. The compute method in the wrapper first checks whether the result has already been computed. If it has, it simply returns the cached result. If not, it computes the result and stores it in the cache for future use.

As we know, HashMap is not thread-safe, so to make the program thread-safe, the implementation above synchronizes the entire compute method. This approach ensures thread safety, but it creates a performance bottleneck: only one thread can execute compute at a time. Suppose one thread is busy executing compute, which is an expensive function; no other thread can execute compute even if the result it needs has already been computed. If multiple threads are waiting for compute, it may actually take more time than without memoization, which is the opposite of what we expect from caching.

Let's improve it a little by using ConcurrentHashMap, which is thread-safe, so we no longer need to synchronize the entire method. CacheImpl2 in the snippet below follows this approach, but it still has a flaw: two threads calling compute may end up computing the same value. Can you see how? That defeats the whole point of caching, which is to prevent the same value from being calculated multiple times.
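A sketch of CacheImpl2; apart from swapping HashMap for ConcurrentHashMap and dropping the synchronized keyword, it mirrors CacheImpl1.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Second attempt: ConcurrentHashMap, so compute no longer needs to be synchronized.
class CacheImpl2<A, V> implements Computable<A, V> {
    private final Map<A, V> cache = new ConcurrentHashMap<A, V>();
    private final Computable<A, V> c;

    CacheImpl2(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {
            // Two threads can both see null here for the same argument
            // and both run the expensive computation.
            result = c.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}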

The problem with CacheImpl2 is that if one thread starts an expensive computation, another thread is not aware that the computation is already in progress and may start the same computation. The FutureTask class in Java comes in handy here, and it is worth going through it first. This class provides methods to start and cancel a computation, query whether the computation is complete, and retrieve the result of the computation. FutureTask.get() is exactly what we need: it returns the result if it is available, and otherwise blocks until the result is computed and then returns it. CacheImpl3 in the snippet below uses FutureTask.
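A sketch of CacheImpl3, assuming the underlying computation is wrapped in an anonymous Callable and the FutureTask is run on the calling thread.

import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Third attempt: cache a Future<V> instead of the value itself, so that a thread
// arriving while the computation is in progress simply blocks in get().
class CacheImpl3<A, V> implements Computable<A, V> {
    private final Map<A, Future<V>> cache = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    CacheImpl3(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(final A arg) throws InterruptedException {
        Future<V> f = cache.get(arg);
        if (f == null) {                       // non-atomic check-then-act
            FutureTask<V> ft = new FutureTask<V>(new Callable<V>() {
                public V call() throws InterruptedException {
                    return c.compute(arg);
                }
            });
            cache.put(arg, ft);
            f = ft;
            ft.run();                          // the computation happens here
        }
        try {
            return f.get();                    // blocks until the result is ready
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}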
The CacheImpl3 implementation above looks almost perfect, but it too has a flaw: there is still a window in which two threads can compute the same value. The culprit is the if block in the compute method, which is a non-atomic check-then-act; with a little thought you should be able to construct that scenario. How do we fix it? The putIfAbsent method of ConcurrentHashMap is the answer. The method name makes its purpose clear, but if you are not sure, please go through the documentation. There are a couple more corner cases: what if the computation is cancelled or fails? Future attempts to compute the same result would then also fail, so the cache implementation should remove the Future from the cache in those cases. Here is the final implementation of the cache:
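A sketch of that final version; the class name CacheImpl4 is mine, chosen only for continuity with the earlier attempts.

import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Final version: putIfAbsent closes the check-then-act race, and a cancelled or
// failed computation is removed from the cache so it can be retried later.
class CacheImpl4<A, V> implements Computable<A, V> {
    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    CacheImpl4(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                FutureTask<V> ft = new FutureTask<V>(new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                });
                f = cache.putIfAbsent(arg, ft);   // atomic: only one thread wins
                if (f == null) {
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f);             // cancelled: forget it and retry
            } catch (ExecutionException e) {
                cache.remove(arg, f);             // failed: don't cache the failure
                throw new RuntimeException(e.getCause());
            }
        }
    }
}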

Thursday 8 August 2013

Implication of Open Interest in Futures Markets

What is Open Interest?

The bookish definition says that Open Interest is the total number of outstanding contracts held by market participants at the end of the day. In other words, Open Interest is the total number of futures or options contracts that have not yet been exercised, expired or filled. Open Interest is most meaningful for the futures market, which is the focus here.

Open Interest measures the flow of money into the futures market and is therefore used by traders to confirm trends and trend reversals. The change in open interest reported each day represents the increase or decrease in the number of outstanding contracts for that day and is therefore shown as a positive or negative number.

Open Interest Calculation:

A contract always has both a buyer and a seller. The daily change in open interest is the net increase or decrease in the number of outstanding contracts for that day, as the example below shows. If both parties in a trade are initiating a new position, i.e. one new buyer and one new seller are entering the market, open interest increases by one contract. On the other hand, if both parties in a trade are closing an existing position, open interest decreases by one contract. There is one more possibility, wherein an existing trader passes its position to a new trader; in this case open interest does not change.
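For example (the traders and numbers here are hypothetical), suppose open interest currently stands at 100 contracts. If trader A buys one contract from trader B and both are opening new positions, open interest rises to 101. If trader C then sells one contract to trader D and both are closing existing positions, open interest falls back to 100. If trader E, who already holds a position, simply transfers it to a new trader F, the contract changes hands and open interest stays at 100.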

Benefits of monitoring Open Interest:

In short, increasing open interest means that money is flowing into the market, so the present trend is likely to continue: if the current trend is upward it should remain upward, and if it is downward it should remain downward. Declining open interest suggests that the current trend is coming to an end. If the price jumps significantly but open interest stays flat, it is an early warning of the end of a bull market, and vice versa. Hence, an increase or decrease in price while open interest remains the same is an early indication of a trend reversal.

Further reading:
Investopedia has more details on this.