Friday 1 June 2012

GC overhead limit exceeded error




Recently I was struggling with a very unusual error in one of our batch jobs in the production environment. The error read:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Here is a snapshot of the error when I ran the batch from the command prompt:

I researched this error, and in this post I would like to share my findings:

This message means that the garbage collector is taking an excessive amount of time (by default, more than 98% of the process's total time) while recovering very little memory on each run (by default, less than 2% of the heap). In effect, your program has stopped making progress and is busy running garbage collection all the time. To prevent the application from soaking up CPU time without getting anything done, the JVM throws this Error so that you have a chance to diagnose the problem.
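To get a rough feel for how much time a process spends in garbage collection, the standard java.lang.management beans can be queried. The sketch below is only an approximation: it reports the cumulative GC time as a share of JVM uptime, whereas the JVM's own check looks at recent collections, and the class and method names (GcOverheadProbe, totalGcMillis, gcFraction) are made up for this illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadProbe {

    /** Total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of elapsed time spent in GC, clamped to the range 0..1. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double fraction = gcFraction(totalGcMillis(), uptime);
        System.out.printf("GC time so far: %.2f%% of JVM uptime%n", fraction * 100);
    }
}
```

A value creeping towards 100% while the heap stays almost full is the situation the error message describes.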
The rare cases where I've seen this happen were ones where some code was creating tons of temporary objects and tons of weakly referenced objects in an already very memory-constrained environment. The check is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. You can turn it off with the command-line option -XX:-UseGCOverheadLimit. Keep in mind that this only disables the check; if the heap really is too small, the JVM will keep collecting and may eventually fail with a plain "Java heap space" OutOfMemoryError instead.
In my case the volume of data was huge. We had deployed a batch in production that was implemented using a stateful framework, but it had to be rewritten as stateless code, which took around a month. During that time the batch didn’t process any records, so a backlog of records had piled up. When we deployed the stateless code, it failed with this error. I had initially allocated a maximum of 1 GB of memory to the batch process. I then removed that limit, but the error persisted. So I had to turn the check off with the command-line option mentioned above to get rid of the error.
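When passing the option on the command line, it can be handy to confirm from inside the process that the JVM really received it. The following is a minimal sketch, not the code from my batch: the class name OverheadLimitFlagCheck and the launch command in the comment (including BatchMain) are hypothetical, and the "last occurrence wins" rule is my assumption about how HotSpot treats a repeated flag.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class OverheadLimitFlagCheck {

    static final String DISABLE = "-XX:-UseGCOverheadLimit";
    static final String ENABLE = "-XX:+UseGCOverheadLimit";

    /** True if the given JVM arguments switch the check off (assumes the last occurrence wins). */
    static boolean overheadLimitDisabled(List<String> jvmArgs) {
        boolean disabled = false; // the check is on by default
        for (String arg : jvmArgs) {
            if (arg.equals(DISABLE)) {
                disabled = true;
            } else if (arg.equals(ENABLE)) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Example launch (BatchMain is a hypothetical class name):
        //   java -Xmx1g -XX:-UseGCOverheadLimit BatchMain
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC overhead limit disabled: " + overheadLimitDisabled(jvmArgs));
    }
}
```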

The obvious question that comes to mind is what happens to the Java process when an OutOfMemoryError occurs.
An OutOfMemoryError is handled like any other exception:
·         If it is caught, then nothing more happens.
·         If it is not caught, then either the thread's or the thread group's uncaught exception handler handles it. This pretty much always leads to the thread being stopped.
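The two cases above can be sketched as follows. The error is thrown by hand here purely for illustration (a real one would come from the JVM), and the class and method names are invented for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OomHandlingDemo {

    /** Case 1: the error is caught, so it stops propagating and the thread carries on. */
    static boolean caughtInPlace() {
        try {
            throw new OutOfMemoryError("simulated");
        } catch (OutOfMemoryError e) {
            return true;
        }
    }

    /** Case 2: nothing catches it, so the thread's uncaught exception handler receives it. */
    static Throwable seenByHandler() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        });
        worker.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
        worker.start();
        try {
            worker.join(); // the handler runs on the worker before it terminates
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("caught in place: " + caughtInPlace());
        System.out.println("handler received: " + seenByHandler());
    }
}
```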
However, there are two factors that do not really apply to other exceptions:
·         OutOfMemoryError is an Error and not an Exception. You should generally not try to catch an Error (with very few exceptions), and it is not usually done, so the chances of it being handled anywhere are rather low.
·         When an OutOfMemoryError happens and no objects become eligible for GC as a result, you will still have little memory left, and chances are that you will run into the exact same problem again later on.
If the thread this happens to is the only non-daemon thread (often, but not necessarily, the main thread that executes the main method), then that thread getting killed results in the whole JVM shutting down (which is often perceived as "a crash").
So the error will probably kill the thread, and if the memory issue is not solved, the same thing can happen to more and more threads.
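A small sketch of that behaviour: a worker thread dies from a hand-thrown OutOfMemoryError while the main thread, which is also a non-daemon thread, keeps running, so the JVM stays up. The class name and the thread name "batch-worker" are made up for this illustration.

```java
public class ThreadDeathDemo {

    /** Starts a worker that dies from a simulated OutOfMemoryError; returns whether it is still alive afterwards. */
    static boolean workerAliveAfterOom() {
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "batch-worker");
        // Short message instead of the default stack trace.
        worker.setUncaughtExceptionHandler((thread, error) ->
                System.err.println(thread.getName() + " died: " + error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker alive: " + workerAliveAfterOom());
        // main is still running, so the JVM does not shut down until main returns.
        System.out.println("main thread continues");
    }
}
```

Had the error been thrown on the main thread with no other non-daemon thread running, the JVM would have exited instead.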

In general, OutOfMemoryError should be considered unrecoverable. The state of the application after such an error has been raised is unpredictable, so there is little point in expending effort to handle it. Operations performed after the JVM throws this error may execute, but more likely they will just cause another error to be thrown.