Saturday, 2 March 2013

Bit Manipulation : Find 2 numbers which are repeated an odd number of times in an array in which all other numbers are repeated an even number of times

Today I will share a problem that can be solved very efficiently using bit manipulation.

Problem :
Given an array of integers in which every number appears an even number of times except for two numbers, each of which appears an odd number of times. The range of the integers is not given. Find the two numbers that appear an odd number of times.

Solution :
It can be solved in O(n) time and O(1) space. The basic identities here are x ^ x = 0 and x ^ 0 = x. Here's the algorithm:

1) XOR all the elements in the array. This nullifies all the elements that are repeated an even number of times, so we are left with the XOR of the two elements that are repeated an odd number of times.

2) Now divide the elements into 2 sets such that the two numbers which are repeated an odd number of times go into different sets. This is easy, because the set bits in the XOR of two numbers are exactly the bits in which the two numbers differ. From step 1 we have the XOR of the two odd-count numbers, so pick any set bit of that XOR and divide the elements of the array into 2 sets depending on whether that bit is set. The two odd-count numbers land in different sets, while every even-count number lands in the same set as all of its copies.

3) XOR all the elements within each set. Each set now yields one of the numbers that is repeated an odd number of times, since the elements repeated an even number of times have nullified each other.

Here is a minimal Java sketch of the algorithm (the class and method names are just illustrative):
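public class OddOccurrences {

    // Returns the two numbers that appear an odd number of times in arr.
    static int[] findOddOccurring(int[] arr) {
        // Step 1: XOR of all elements = XOR of the two odd-count numbers.
        int xorAll = 0;
        for (int v : arr) {
            xorAll ^= v;
        }

        // Step 2: pick any set bit of xorAll; the two numbers differ at that bit.
        int diffBit = xorAll & (-xorAll);   // isolates the lowest set bit

        // Step 3: partition by that bit and XOR each partition separately.
        int result1 = 0;
        int result2 = 0;
        for (int v : arr) {
            if ((v & diffBit) != 0) {
                result1 ^= v;               // elements with the chosen bit set
            } else {
                result2 ^= v;               // elements with the chosen bit clear
            }
        }
        return new int[] { result1, result2 };
    }

    public static void main(String[] args) {
        // 3 and 5 appear an odd number of times; everything else an even number.
        int[] arr = { 2, 3, 7, 2, 11, 7, 3, 11, 3, 5 };
        int[] result = findOddOccurring(arr);
        System.out.println(result[0] + " " + result[1]);   // prints: 3 5
    }
}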
Now result1 and result2 hold the numbers that are repeated an odd number of times.

There is a trick to isolate the lowest set bit of a number. Suppose x is a number and we wish to extract its lowest set bit. Since x & (x - 1) clears the lowest set bit, we can isolate that bit with:
int lowestSetBit = x & (-x);   // equivalently, x & ~(x - 1)
This is exactly the rule used to pick the dividing bit (diffBit) in the snippet above.



Friday, 22 February 2013

David Shaw's Supercomputer - Anton

Things have been quiet on the blog, but not because there has been nothing to say. In fact, so much has been happening that I have not had the idle cycles to write about it.

I recently got a chance to meet one of the D.E. Shaw Research scientists and got an insight into what they are doing. Prior to forming D.E. Shaw Research, David Shaw founded D.E. Shaw & Co., a hedge fund that is one of the most successful quantitative analysis shops. Since 2001, David Shaw has been doing full-time research as part of D.E. Shaw Research, which develops both algorithms and customized machine architectures for molecular dynamics simulations such as protein folding. The result is the Anton supercomputer, a heavily customized machine with 512 specialized computing nodes designed specifically for particle-interaction simulations.

Basically, each of the 512 nodes consists of a heavily pipelined special-purpose ASIC that is designed to compute the particle force interactions (using an algorithm called the NT method), along with a general-purpose processor that supports a limited instruction set. Communication is heavily optimized to reduce the amount of data exchanged between nodes. The processors are connected in a 3D torus, and each processor "owns" the set of particles corresponding to a particular cube of space. One of these machines is to be hosted at the Pittsburgh Supercomputing Center.

The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important outstanding questions in the fields of biology, chemistry and medicine.

Sunday, 27 January 2013

Facebook Graph Search

As Facebook continues to roll out its new feature, more and more users are getting their hands on this new search tool. Facebook stalking is now limited only by a user's creativity with search queries. Many people are scared by the thought that their stuff will go public, but it should be very clear that no private information is exposed by Graph Search. People are already sharing this stuff on Facebook; it is just that it is now easily accessible. Let's say we want to search for people who like cricket. One could already visit pages related to cricket and look at the people who liked them; with Graph Search we don't have to make that extra effort, we can just search. So Graph Search only surfaces information that is already visible to you, and it does not breach anyone's privacy. That said, anyone who has a Facebook account and has not locked down his or her privacy settings should realize that their public activity is now much easier to find. Here is what Facebook says:
"With Graph Search, you can look up anything shared with you on Facebook, and others can find stuff you've shared with them, including content set to Public. That means different people see different results."

Facebook's new Graph Search feature owes part of its existence to Bing. Facebook and Microsoft have worked together for a number of years now, with Microsoft investing $240 million in Facebook back in 2007. Now, when users search for people, places and things using Facebook's search bar, the results are shown alongside web results courtesy of Bing.

Of course, the same was the case when Google arrived and we had to adjust to the fact that the "internet never forgets" - information that you posted online years ago can come back to haunt you. But Facebook has taken that to a new level of intimacy, because much of the stuff here seems so ephemeral: a like, a follow - things that you might not even remember doing.

Google is looking on in jealousy, comparing the number of Google+ users to Facebook's gargantuan total and shedding a few tears. The reigning champ of search engines did its best to socialize search with Google+, but Bing managed to make searching more social by tapping into Facebook. No matter how much Google denies it, seeing how many of your friends "Like" something on the web still carries much more weight than a "+1".


Official Facebook Graph Search Introduction page : http://newsroom.fb.com/News/562/Introducing-Graph-Search-Beta


Sunday, 6 January 2013

Allocation and De-allocation - Java Vs C/C++



Java allocations are cheaper than C/C++ allocations, an assertion that many developers have difficulty even imagining. The 1.0 and 1.1 JDKs used a mark-sweep garbage collector that compacted the heap on only some collections, so the heap might be fragmented after a collection. In this respect they were comparable to C/C++, where the allocator uses heuristics such as first-fit or best-fit to manage the free heap space.

The HotSpot JVM introduced a generational collector. The heap is divided into generations, and garbage collection runs more frequently in the young generation than in the older generations. The young generation uses a copying collector, so its free space is always contiguous and a new object can be allocated with a simple pointer addition. This makes object allocation in Java significantly cheaper than in C. A C allocator typically uses first-fit or best-fit: the first-fit heuristic walks the free list to find the first chunk that can satisfy the request, while best-fit walks the free list to find the best-fitting chunk. The HotSpot allocator, in contrast, does not have to walk the heap at all to allocate a chunk.
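To make the "simple pointer addition" concrete, here is a toy Java sketch of the idea; this is only an illustration of bump-pointer allocation, not HotSpot's actual allocator.

// Toy bump-pointer allocator: reserving n bytes is just a bounds check plus one addition.
class BumpAllocator {
    private final byte[] space;
    private int top = 0;                 // next free offset

    BumpAllocator(int capacity) {
        space = new byte[capacity];
    }

    // Returns the offset of the reserved block, or -1 if the space is exhausted
    // (a real collector would trigger a GC and try again at that point).
    int allocate(int nBytes) {
        if (top + nBytes > space.length) {
            return -1;
        }
        int offset = top;
        top += nBytes;                   // the "pointer addition"
        return offset;
    }
}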

Let's talk about deallocation now. Copying collectors do not visit dead objects, so in applications with a large number of temporary objects garbage collection is quite cheap: the collector simply copies the live objects to a survivor space and reclaims the rest of the space in one sweep. So both the allocation and deallocation costs per object went down in JDK 1.2. Sun puts the cost of allocation at roughly 10 machine instructions.

The JIT compiler performs additional optimizations that can reduce the cost of allocation even further: it can effectively allocate an object on the stack instead of on the heap. This is a very powerful enhancement that borrows one of the strengths of C. In Java source code we cannot create objects on the stack, but stack allocation avoids most of the allocation and deallocation work, so objects with a very short lifetime that never escape their method can be placed on the stack instead of the heap, and the JIT compiler does this automatically. Allocation costs in Java will keep going down with further compiler-level optimization, so there is no reason to compromise maintainability just to avoid a few extra allocations.
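Whether HotSpot actually eliminates a given heap allocation depends on its escape analysis (enabled by default in modern JVMs via -XX:+DoEscapeAnalysis), but a short-lived object that never leaves its method, as in the sketch below, is the kind of candidate the JIT looks for. The class names here are purely illustrative.

// A short-lived temporary that never escapes distance(): the JIT's escape
// analysis may eliminate the heap allocation entirely (scalar replacement).
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

class EscapeAnalysisDemo {
    static double distance(double x1, double y1, double x2, double y2) {
        Point d = new Point(x2 - x1, y2 - y1);    // never stored or returned
        return Math.sqrt(d.x * d.x + d.y * d.y);  // only its fields are read
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += distance(0, 0, i, i);          // millions of "allocations"
        }
        System.out.println(sum);
    }
}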

One question that comes to mind: when a lot of threads are asking the allocator for chunks of heap memory, isn't that a scalability issue? The answer is no, because JVMs use thread-local allocation buffers. Each thread is given a small thread-local region of the heap (on the order of kilobytes), and when a thread asks for memory, the JVM tries to satisfy the request from that thread-local buffer. Only when the buffer cannot satisfy the request does the thread go to the global allocator, and that case is rare in most applications. So in most cases we never lock the global heap; each thread allocates from its own buffer, and allocation is not a scalability bottleneck.
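As a rough illustration, the toy program below has several threads hammering the allocator concurrently; with thread-local allocation buffers (the HotSpot default, controlled by -XX:+UseTLAB), they do not fight over a global heap lock.

// Allocation-heavy worker threads: each thread bump-allocates from its own
// thread-local buffer, so the threads do not contend on a shared allocator.
public class TlabDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            long checksum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                checksum += new int[8].length;    // lots of small, short-lived objects
            }
            System.out.println(Thread.currentThread().getName() + ": " + checksum);
        };

        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(worker, "worker-" + i);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}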

Sunday, 23 December 2012

Magic of Floating Point Representation in the Digital World

Many great engineering and scientific advances of recent decades would not have been possible without the floating-point capabilities of digital computers. Still, some results of floating-point calculations look pretty strange, even to people with years of mathematical experience. I will attempt to explain the causes of some of these strange results and give some suggestions where appropriate.
Let's take a look at the following example:



groovy> System.out.println(100.87d * 0.01d)
1.0087000000000002    --> Oops ! What is happening here ?
groovy>

We all know that 100.87 multiplied by 0.01 is 1.0087, so where is this junk coming from? Let's look at the following statements to add more confusion:

groovy> System.out.println(100.87f * 0.01f)
1.0087000049196178
groovy> 
groovy> System.out.println(100.87 * 0.01)
1.0087
groovy>

To answer these questions, let's go back to basics: how are floating-point numbers represented in modern computer systems, and how is arithmetic done on them? Floating-point numbers are represented in a binary format that follows the IEEE 754 standard for floating-point arithmetic.
The format for single precision numbers uses 32 bits divided in the following way,
     seeeeeeeefffffffffffffffffffffff

     s = sign bit, 1 bit
     e = exponent, 8 bits  (E_min=-126, E_max=127, bias=127)
     f = fraction, 23 bits
The format for double precision numbers uses 64 bits divided in the following way,
     seeeeeeeeeeeffffffffffffffffffffffffffffffffffffffffffffffffffff

     s = sign bit, 1 bit
     e = exponent, 11 bits  (E_min=-1022, E_max=1023, bias=1023)
     f = fraction, 52 bits
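You can inspect these fields for yourself with a small Java sketch; Float.floatToIntBits and Double.doubleToLongBits return the raw IEEE 754 bit patterns.

public class BitLayout {
    public static void main(String[] args) {
        // 32-bit float: 1 sign bit, 8 exponent bits, 23 fraction bits.
        long floatBits = Float.floatToIntBits(100.87f) & 0xFFFFFFFFL;
        System.out.println("float  100.87 -> " + toPaddedBinary(floatBits, 32));

        // 64-bit double: 1 sign bit, 11 exponent bits, 52 fraction bits.
        long doubleBits = Double.doubleToLongBits(100.87d);
        System.out.println("double 100.87 -> " + toPaddedBinary(doubleBits, 64));
    }

    // Left-pads the binary string with zeros to the given width.
    static String toPaddedBinary(long bits, int width) {
        StringBuilder sb = new StringBuilder(Long.toBinaryString(bits));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}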

The problem above arises because some numbers cannot be represented exactly in these formats.

(100.87)10 = (1100100.11011110101110000101000111101011100001010001111010111000010100.........)2

When we write 100.87d, the value 100.87 is stored in a double, which has 64-bit precision, and when we write 100.87f it is stored in a float, which has 32-bit precision. Even though 100.87 is an ordinary decimal number, it cannot be represented exactly in either binary format. This poses a big challenge to programmers in the financial domain, where accuracy is very important: if you normalize a price of 100.87 with a scale of 0.01, you introduce noise, as seen above. What programmers usually do is round to some number of significant digits based on the requirement, or use a decimal type.
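A small Java sketch makes both the problem and the usual cure visible: new BigDecimal(double) prints the exact value of the binary approximation, while the BigDecimal String constructor keeps the decimal value exact.

import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        // The exact values of the double approximations:
        System.out.println(new BigDecimal(100.87d)); // not exactly 100.87
        System.out.println(new BigDecimal(0.01d));   // slightly larger than 0.01

        // Exact decimal arithmetic, as typically used for money:
        BigDecimal price = new BigDecimal("100.87");
        BigDecimal scale = new BigDecimal("0.01");
        System.out.println(price.multiply(scale));   // prints 1.0087 exactly
    }
}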


A key feature of the IEEE standard is that it requires correctly rounded arithmetic operations. Very often, the result of an arithmetic operation on two floating point numbers is not a floating point number. This is most obviously the case for multiplication and division; for example, 1 and 10 are both floating point numbers but 1/10 is not, regardless of whether the single or double format is in use. It is also true of addition and subtraction.

Let x and y be floating point numbers, let +,−,* ,/ denote the four standard arithmetic operations, and let (+),(-), (*), (/) denote the corresponding operations as they are actually implemented on the computer. Thus, x + y may not be a floating point number, but x (+) y is the floating point number which is the computed approximation of x + y. When the result of a floating point operation is not a floating point number, the IEEE standard requires
that the computed result is the rounded value of the exact result. It is worth stating this requirement carefully. The rule is as follows: if x and y are floating point numbers, then
x (+) y = round(x + y);
x (-) y = round(x − y);
x (*) y = round(x * y);
and
x (/) y = round(x / y)
where round is the operation of rounding to the nearest floating point number in the single or double format, whichever is in use.
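A quick Java illustration of this single-rounding rule: each literal below is already round(0.1) or round(0.2), and their sum is rounded once more.

import java.math.BigDecimal;

public class RoundingDemo {
    public static void main(String[] args) {
        // Neither 0.1 nor 0.2 is a floating point number, and neither is their
        // exact sum, so the printed result is round(round(0.1) + round(0.2)):
        System.out.println(0.1 + 0.2);            // 0.30000000000000004

        // The exact value of the double nearest to 0.1:
        System.out.println(new BigDecimal(0.1));  // 0.1000000000000000055511151231257827...
    }
}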

Now let's explain the behavior seen in the cases presented at the beginning.
groovy> System.out.println(100.87d * 0.01d)
1.0087000000000002
groovy> System.out.println(100.87f * 0.01f)
1.0087000049196178
groovy> 

100.87 is not exactly representable in binary, and the same is true of 0.01; what we actually store is an approximation of each. The rule is that if a number is not exactly representable, it is approximated by the nearest representable value. So when we multiply 100.87d by 0.01d (or their float counterparts) above, we are really multiplying approximations of these numbers, and that is where the junk comes from. This is expected. But what about the case below? What is happening here?
groovy> System.out.println(100.87 * 0.01)
1.0087
groovy>

In this case no binary floating point is involved at all: in Groovy, decimal literals without a d or f suffix, such as 100.87 and 0.01, are BigDecimal values, so the multiplication is performed in exact decimal arithmetic and prints 1.0087. A related effect shows up when the result of a float computation is narrowed back to a float. Analyze the following cases:
groovy> System.out.println((float)(100.87f * 0.01f))
1.0087
groovy> System.out.println(100.87f * 0.01f)
1.0087000049196178
groovy>

In Groovy, arithmetic on float operands is carried out in double precision, so the second statement prints the double product of the float approximations of 100.87 and 0.01, noise and all. In the first statement that product is cast back to a float before printing, and the shortest decimal string that identifies the resulting float happens to be 1.0087, so the noise disappears from the output even though the stored value is still not exactly 1.0087.

Saturday, 8 December 2012

HipHop for PHP

HipHop for PHP began as a source code transformer for PHP and has grown into a set of PHP execution engines. HipHop started as a project at Facebook Inc. and was later made open source. To date, Facebook has achieved a 6x reduction in CPU utilization for its site using HipHop compared to running standard PHP under Apache.

One of the design goals of HipHop was to let complex logic keep being written in PHP. Since PHP is an interpreted language, it is bound to be slow, so companies like Facebook with large PHP codebases would otherwise have to write their performance-critical functionality as extensions in C or C++. That would shrink the pool of engineers able to work on the complete codebase. By continuing to use PHP with HipHop, Facebook is able to maintain a high number of engineers who can work on the whole codebase.

HipHop has evolved over the years. It started as 'HPHPc', which translates the PHP code to C++ and then passes it to gcc, which compiles it into one monolithic binary representing the site's entire code tree. This gave significant performance gains, but it was horrible for development: one of the most important reasons to use an interpreted language like PHP is to avoid recompiling for every small change.

'HPHPi' came next as an interpreter-based counterpart of HPHPc intended for development. It removed the recompile step when code changed, but it could behave differently from the compiled binary, giving different behavior on development and production machines, which is risky.

The HipHop Virtual Machine (HHVM) was created with the intent of replacing both HPHPc and HPHPi. Unlike its predecessors, HHVM does not call gcc: it transforms PHP code to bytecode, just as regular PHP does, and it differs from regular PHP in its optimizer and JIT (just-in-time) compiler.

The following article gives a complete picture of HipHop VM :
https://www.facebook.com/note.php?note_id=10150415177928920

Sunday, 2 December 2012

Algorithmic Trading - Part 1

Algorithmic trading is the use of electronic platforms for placing trading orders, with an algorithm deciding the timing, price and size of the orders. High Frequency Trading (HFT) is a special class of algorithmic trading in which trading systems make trading decisions based on the information they receive electronically, before human traders are capable of processing the information they observe.

Who is suited for Algorithmic Trading ?

The most important reason to go for algorithmic trading is to control market impact while placing big orders and thus seek favorable prices. Anyone who places big orders needs to worry about market impact; as a rule of thumb, if the size of an order exceeds the combined quantity at the best 5 levels of the book, market impact matters. Big institutions like pension fund managers, investment banks and hedge funds are the ones who place algo orders.

Let's see how algorithmic trading minimizes market impact. All orders can be classified as either market orders or limit orders. Market orders ask the exchange to execute at the best possible price: if you want to buy, the exchange finds the sellers willing to sell at the lowest price and executes against them. Limit orders, on the other hand, specify a limit price beyond which you are not willing to trade; they are not necessarily executed right away and can stay in the book for some time. These live but unexecuted orders form what is called the order book. Say I place a market order to buy 1 million Yahoo shares. The exchange will match against the best seller, then the next best, and so on until I get 1 million shares, which means my execution price is going to be poor. If instead I place a limit order, the exchange will execute whatever it can but will stop as soon as there are no more sell-side orders at a favorable price, and the remainder of my order becomes part of the order book. Once everyone, including other algorithms, sees such a massive order sitting on the buy side, the price is bound to rise sharply.

However, algorithmic trading has generally been a sell-side product: institutional brokers are the ones who actually have the systems for placing and managing algorithmic orders. Institutional traders place algo orders with brokers, who submit the orders either programmatically from automated systems or manually through user interfaces. Institutional brokers and investors have invested a lot of time and money in algo trading systems over the years.

Lets look at one of the simplest algorithmic strategies :

TWAP (Time Weighted Average Price) trading strategy :

TWAP breaks a big order into smaller chunks that are executed at fixed intervals. But a simple algorithm, or even a sharp trader, can spot this pattern by watching the order book, so nearly all production TWAP algorithms involve some type of randomization. This randomization can be applied to the sizes of the child orders, to the intervals between them, or both (see the sketch after the next paragraph).

The basic rule is to start with an initial order size and keep increasing or decreasing it. The first question that comes to mind is when to increase and when to decrease the size. The answer depends on whether the market is mean-reverting or trending in one direction. If you are a buyer and the price keeps dropping in a trend, it makes sense to wait for some time, as the price is going to become more attractive. On the other hand, if the market is mean-reverting, the move is not going to last forever, so we need to increase the size whenever the price becomes favorable.
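Here is a toy Java sketch of a randomized TWAP slicer. The symbol, sizes, interval bounds and the sendOrder stub are all illustrative; this is a sketch of the idea, not a production strategy.

import java.util.Random;
import java.util.concurrent.TimeUnit;

// Toy randomized TWAP slicer: splits a parent order into child orders of
// randomized size, sent at randomized intervals.
public class TwapSlicer {
    private static final Random RNG = new Random();

    // Stub standing in for whatever order-entry API would actually be used.
    static void sendOrder(String symbol, int quantity) {
        System.out.println("BUY " + quantity + " " + symbol);
    }

    static void execute(String symbol, int totalQuantity, int baseChildSize,
                        long baseIntervalMillis) throws InterruptedException {
        int remaining = totalQuantity;
        while (remaining > 0) {
            // Randomize the child size around the base size (roughly +/- 50%).
            int size = baseChildSize / 2 + RNG.nextInt(baseChildSize);
            size = Math.min(size, remaining);
            sendOrder(symbol, size);
            remaining -= size;

            // Randomize the wait between children (50% to 150% of the base interval).
            long wait = baseIntervalMillis / 2 + (long) (RNG.nextDouble() * baseIntervalMillis);
            TimeUnit.MILLISECONDS.sleep(wait);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Buy 10,000 shares in ~500-share slices roughly a second apart.
        execute("YHOO", 10_000, 500, 1_000);
    }
}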

The next question is how to predict the market. Truly speaking, nobody can predict the market; we just try to be right more often than we are wrong. This is where high frequency trading comes into the picture: it helps by analyzing market trends very quickly.