That's right, the dreaded
java.lang.OutOfMemoryError: unable to create new native thread. I fixed one of these the other day, and I will show you how I did it. The cause of such errors is usually a Thread Leak. A Thread Leak is where some ass hat is creating threads which never die. Another reason for such errors is the limit an Operating System can place on the number of threads per process. If you hit this limit, you get the same error.
A Thread Leak is when your application is creating threads that are not dying afterwards. This is a very important concept, so make sure you get this into your thick skull (as my English teacher used to say). Once you have one of these fuckers, it's important to find out where it's coming from. Later on I will give some tips you can apply now to help with identifying the source of such leaks.
A source of such leaks can be the JDK Executors. Once an executor has been created and has run at least 1 Runnable, its threads are kept alive. The javadoc for a fixed-number-of-threads executor states:
The threads in the pool will exist until it (the Executor) is explicitly shutdown.
A very stupid thing to do would be to create many executors, and assume that they will be cleaned up. Here is one source for a thread leak. That’ll be 5 dollaz.
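To make that concrete, here is a minimal sketch of the leak. The class name `ExecutorLeakDemo` and the thread name `leaky-worker` are made up for illustration, and the workers are marked as daemon threads only so the demo JVM can still exit. Real pools from `Executors.newFixedThreadPool` use non-daemon threads by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ExecutorLeakDemo {

    // Hypothetical name, used only so the demo can find its own threads.
    static final String NAME = "leaky-worker";

    // Counts live threads whose name equals NAME.
    static long liveWorkers() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().equals(NAME))
                .count();
    }

    // Creates n single-thread pools, runs one task on each, and does NOT shut them down.
    static List<ExecutorService> leak(int n) {
        List<ExecutorService> pools = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
                Thread t = new Thread(r, NAME);
                t.setDaemon(true); // assumption for the demo only, so the JVM can exit
                return t;
            });
            pool.execute(() -> { }); // one trivial task is enough to start the worker
            pools.add(pool);
        }
        return pools;
    }

    public static void main(String[] args) {
        List<ExecutorService> pools = leak(10);
        System.out.println("idle workers still alive: " + liveWorkers());
        pools.forEach(ExecutorService::shutdown); // the fix: shut down what you create
    }
}
```

Each pool keeps its worker parked and waiting for more work, even though there will never be any. Calling `shutdown()` when you are done with a pool is what lets those threads die.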
Just get this into your head: Do not go crazy with creating threads. And know when some action will cause threads to be created.
Limits, motherfucker! Limits!! Your machine has limits. You can’t just keep creating threads willy nilly. There are limits to the number of threads a process can create on Linux. Speak to your trusty sysadmins about these limits. Use the Google and you will come across good resources like this.
Speak with your sysadmins to up the limits and make sure you have a standard setup that you apply to machines before you put them in production.
Now that you have had this exception, it's not just a case of increasing limits and restarting your process. You most probably have a thread leak, so if you just increase the limit and restart your process, you will end up in the same situation as before. Therefore, you need to fix it!
Tips to help with the bug-hunting
Here are some tips you should consider in your application:
In Java you can give threads names. This is great for finding such errors. Make sure that all of your threads have names that you gave them. Using these names you can find out where the threads are created. Another tip is to always save a thread dump of your application at regular intervals. You will see why later on.
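Naming a thread is a one-liner: `Thread` has a constructor that takes the name as its second argument. A small sketch, where `NamedThreadDemo` and the name `order-importer-1` are just placeholders of mine:

```java
final class NamedThreadDemo {

    // Starts a thread with an explicit, human-readable name.
    static Thread startNamed(String name, Runnable work) {
        Thread t = new Thread(work, name); // Thread(Runnable, String) sets the name
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startNamed("order-importer-1",
                () -> System.out.println("running in " + Thread.currentThread().getName()));
        t.join();
    }
}
```

Pick a name that tells you which piece of code created the thread. That is the whole point.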
A great way to introduce a thread-leak is to create a ThreadPool and never clean it up. Or to create a ThreadPool per resource (like Socket, DB Connection). As something like a connection is long-living, creating a ThreadPool per resource is a bad idea. Imagine 1 Queued-ThreadPool per monkey. And you have 5000 monkeys online. You will have 5000 extra threads. THINK before creating the ThreadPool:
- how often will stuff need to be run through it
- does it make sense to create 1 per Whatever or will it be ok to have a static ThreadPool
Also make sure that your thread pools have meaningful names for their threads.
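For pools, the names come from the `ThreadFactory` you pass to the `Executors` factory methods. Here is a rough sketch of one. `NamedThreadFactory` and the prefix `monkey-worker` are my own inventions, so adapt them to whatever your pool actually does:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactory implements ThreadFactory {

    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    // Names threads "<prefix>-1", "<prefix>-2", ... in creation order.
    @Override
    public Thread newThread(Runnable r) {
        return new Thread(r, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) {
        // One shared pool for all monkeys, instead of one pool per monkey.
        ExecutorService shared =
                Executors.newFixedThreadPool(2, new NamedThreadFactory("monkey-worker"));
        shared.execute(() -> System.out.println(Thread.currentThread().getName()));
        shared.shutdown(); // clean up the pool when the application is done with it
    }
}
```

With this in place, every thread from that pool shows up in a thread dump under a prefix you can grep for.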
Taking a Thread-Dump can cause your threads to pause for a short amount of time. So don’t do this too often. Once a day or so should be enough. Save these somewhere so that you can always compare the threads on one day with another. If on a particular day you have a leak, and provided you have well named threads, you should be able to identify which threads are the ones causing the problem and find them in your codebase.
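If you would rather take the dump from inside the application than with an external tool, `Thread.getAllStackTraces()` gives you every live thread and its stack. The sketch below is one possible way to write that to a file. `ThreadDumper` is a made-up helper, and the output format is my own, not the format an external dump tool produces.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class ThreadDumper {

    // Renders all live threads as text: a quoted name line, then the stack frames.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Writes the dump to the given file, replacing any existing content.
    static void dumpTo(Path file) throws IOException {
        Files.write(file, dump().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Call `dumpTo` with a file name that includes the date, so that the dumps from different days sit next to each other.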
Finding the root-cause of such an OutOfMemory
If you end up having such an error, the things to do are:
- Create a stack dump (save to a file)
- Create a Heap Dump
Once you have those two, start with the Stack Dump. Use your grep skillz to get the thread names. Try to find the ones that are created most often. Can you find the code that is creating them? If yes, then fix that shit, fool!
This is where it helps to have previous stack dumps. Because you can again use your grep & diff skillz to get the threads between two days and compare which ones are the ones causing problems. Once again, all of this is useless if you don’t name your threads. So ALWAYS name your threads. And also make sure threads created by ThreadPools are named. Usually using these methods you can find where the thread leak is.
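If grep is not your thing, the same counting can be sketched in a few lines of Java. This assumes a dump where each thread's header line starts with the thread name in double quotes. `ThreadNameCounter` is a hypothetical helper, and stripping the trailing number to get a prefix is just my guess at a useful grouping.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreadNameCounter {

    // Assumption: thread header lines begin with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");

    // Groups thread names by prefix (trailing number removed) and counts each group.
    static Map<String, Integer> countByPrefix(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                // "monkey-worker-17" becomes "monkey-worker"; "main" stays "main"
                String prefix = m.group(1).replaceAll("[-_#\\s]*\\d+$", "");
                counts.merge(prefix, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Run it over the lines of two dumps from different days (for example via `Files.readAllLines`) and compare the counts. A prefix whose count keeps growing is your prime suspect.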
If the above does not help, start analyzing the heap dump. Do an OQL query looking for threads and their names. Find out which threads are there too often. Then, using incoming references, find out which objects hold onto them. Follow these references till you see one of your classes. This way you will eventually find the class that is creating those threads.
How you fix it is an exercise left to the reader.