Saturday 15 March 2014

Java 8 : Difference between List.sort and Collections.sort

One of the most anticipated programming language updates in many years is here. Java 8 would be officially released on March 18, 2014. It offers a lot of new features which are worth exploring. One of them that we would be discussing here is List.sort. The addition of List.sort(Comparator) in Java 8 is fantastic. List.sort unsurprisingly allows you to sort a list, but the neat thing about List.sort is that for most of the common list implementations (such as ArrayList), it would sort much more quickly than Collections.sort. To understand this, first we need to know how Collections.sort works internally.

Collections.sort is a stable sort which means that equal elements won't be reordered after sorting. So, quicksort and heapsort are out of question since they are not stable. The most efficient stable sorting algorithm is merge sort. So, Java internally uses a modified version of merge sort. The exact algorithm was derived from Tim Sort which was developed by Tim Peter for Python. Tim sort is a combination of insertion sort and merge sort. If the number of elements is less than a certain value, insertion sort is used. This is because in case of smaller lists, the overhead of making recursive calls in case of merge sort is greater than the performance benefit obtained by using merge sort. Hence, it makes sense to use insertion sort for smaller lists.

Collections.sort implementation first dumps the specified list into a primitive array. Then it sorts this primitive array and puts back the sorted array elements back into the list in sorted order. Now the question is why do we need to copy the list into a new primitive array ? This is because Collections can be backed up by a primitive array (as in ArrayList) or can be backed up by a linked list (as in LinkedList). Copying into a new primitive array is done to avoid n^2 * log(n) performance that would result if we try to sort linked list in place.

But if we see carefully, this copying into a new primitive array is not required in case of ArrayList and some other list implementations since these list implementations are already backed up by a primitive array. This is exactly what List.sort takes advantage of. The default implementation of List.sort still does what Collections.sort does, but concrete implementing classes are free to optimize. For example, ArrayList.sort invokes Arrays.sort on the ArrayList's internal array. Hence the overhead of allocating and deallocating a new primitive array is gone. Also overhead of copying back the sorted elements from the array is gone. This is a big performance gain. Needless to say, we are saving space as well since we are not allocating space for a new primitive array.

Performance isn't the only potential gain from these new methods. Sorting a Collections.synchronizedList is an atomic operation using List.sort. You can iterate over all the elements of a List as an atomic operation using List.forEach. This was not possible in Java 7.

No comments:

Post a Comment