In any non-trivial application of significant size, you will need Executors for background processing, asynchronous processing, task splitting, etc. This is where the JDK Executor framework comes in.
After recently digging into the JDK Executor framework, and deeper still into Thread Pools, I learned 2 things that I would like to share:
- defaults suck
- almost all executors you get from the JDK's Executors factory class are different flavors (aka configurations) of a Thread Pool Executor
Knowing this will allow you to properly configure resources (aka Threads) for task execution. Misconfiguration here can lead to delays, or to heavy resource usage that makes other processes on your machine suffer. Also try to know beforehand what the Executor will be used for, how important its tasks are, and what delays (if any) are acceptable. Know your Thread Pool, Fool!
Thread Pool Executors
Most people think using a Thread Pool Executor means your task will be done ASAP. WRONG! Most people don’t know that a JDK Thread Pool Executor has a queue (GASP!) and, if misconfigured, your Thread Pool Executor can start behaving like a pseudo Queued Executor (aka Single Threaded Executor) where each task waits on the one before it. Therefore you need to understand the ins & outs of the Thread Pool Executor.
Each Thread Pool Executor has the following:
A Queue
Believe it or not, each JDK Thread Pool Executor has a Queue of tasks that the pool of threads is working to empty. The Queue implementation can be one of several types: fixed size, zero size or unlimited size. The Javadoc calls these “Bounded queues”, “Direct hand off” and “Unbounded queues”. Once the pool’s minimum number of threads is running (more on threads below), each task you give to the Thread Pool is first offered to its Queue. The type of queue is important because the Thread Pool is designed to react to not being able to queue a task. A “Direct hand off” is a queue with no capacity: adding to it fails unless an idle thread is already waiting to take the task. A “Bounded queue” has a fixed size; when it is full, adding further items fails. An “Unbounded queue” is a queue with a very large (effectively unlimited) size, so every item is accepted. A failure to add a task to the queue is not necessarily an error but a feature. We will come to this soon.
Defaults Suck: Executors.newFixedThreadPool and Executors.newSingleThreadExecutor use an Unbounded Queue. An Unbounded Queue combined with a fast in-rate and a slow out-rate of tasks will most probably lead to an OutOfMemoryError in the JVM.
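To see how the three queue flavors react to a failed add, here is a minimal sketch. It uses the JDK classes that the ThreadPoolExecutor Javadoc names for each strategy: SynchronousQueue for the direct hand off, ArrayBlockingQueue for the bounded queue and LinkedBlockingQueue for the unbounded queue. It calls offer(), the non-blocking call a ThreadPoolExecutor itself uses when queuing a task. The class and method names are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

class QueueFlavors {
    // Offers the same no-op task n times and counts how many offers the queue accepts.
    static int acceptedOffers(BlockingQueue<Runnable> queue, int n) {
        Runnable noop = () -> { };
        int accepted = 0;
        for (int i = 0; i < n; i++) {
            // offer() never blocks: it returns false when the queue refuses the task,
            // which is exactly the "failure" a ThreadPoolExecutor reacts to
            if (queue.offer(noop)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Direct hand off: zero capacity, and no consumer thread is waiting here
        System.out.println("direct hand off: " + acceptedOffers(new SynchronousQueue<>(), 5));
        // Bounded queue: fixed capacity of 2
        System.out.println("bounded (2):     " + acceptedOffers(new ArrayBlockingQueue<>(2), 5));
        // Unbounded queue: LinkedBlockingQueue defaults to a capacity of Integer.MAX_VALUE
        System.out.println("unbounded:       " + acceptedOffers(new LinkedBlockingQueue<>(), 5));
    }
}
```

Run as-is, it reports 0, 2 and 5 accepted offers respectively. The unbounded queue would keep saying yes long after your heap has run out.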
A Pool of 1 or more Threads
Now we come to the Threads. Each Thread Pool Executor has 1 or more threads working to empty the Queue. So a Thread Pool Executor with only 1 Thread is a Queued Executor: everything is done in the background by that 1 Thread. Having only 1 Thread do the work is great for synchronization, because you don’t need any; everything is done in 1 Thread anyway. As soon as you have more than 1 Thread, you have a Pooled Executor with stuff being done by multiple threads. That is great for efficiency because you are getting more work done, but bad for synchronization: now you need to take care of thread safety.
Each Thread Pool can be configured with a min and max number of threads. If max threads > min threads, a new Thread will be created each time a task fails to be queued into the above-mentioned queue, until the max is reached. In the JDK, the min & max number of threads are called the ‘Core Pool Size’ and the ‘Maximum Pool Size’ respectively. Threads beyond the Core Pool Size also have a keep-alive time (core threads only get one if you call allowCoreThreadTimeOut(true)). If such a Thread is idle for longer than the keep-alive time, it will be removed from the pool. This makes sense because Threads are resources. If they are not being used, they are released so that maybe someone/something else can create & use them.
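Here is a minimal sketch of the min/max behavior. It assumes a pool with Core Pool Size 1, Maximum Pool Size 3 and a bounded queue of capacity 1; these numbers are picked only to make the growth visible. Every task blocks on a latch, so no thread becomes free while we count. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowth {
    // Submits `tasks` tasks that stay busy until released and returns
    // {threads in pool, tasks waiting in queue}. With this configuration,
    // more than 4 tasks would be rejected (default AbortPolicy).
    static int[] poolAndQueueAfter(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1,                             // Core Pool Size ("min threads")
                3,                             // Maximum Pool Size ("max threads")
                30L, TimeUnit.SECONDS,         // keep-alive for idle threads above the core size
                new ArrayBlockingQueue<>(1));  // bounded queue with room for 1 waiting task
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep this thread busy until we let go
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();     // threads exit once the queue is drained
        }
    }

    public static void main(String[] args) {
        int[] r = poolAndQueueAfter(4);
        System.out.println("threads=" + r[0] + ", queued=" + r[1]);
    }
}
```

With 4 tasks this prints threads=3, queued=1. The first task starts the core thread and the second fits in the queue. The third and fourth each fail to be queued, so each gets a new thread.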
We can now also see that a Threaded Executor (an executor which handles each task in a new or idle thread) is an executor with a “Direct hand off” queue. Each task’s queuing fails unless an idle Thread is already waiting for it, so the Executor either uses an existing idle Thread or creates a new one. Great for when you need to get shit done ASAP. Not so great when you have too much shit: this can lead to too many threads being created, which may crash the JVM. Defaults Suck!
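Executors.newCachedThreadPool() is such a Threaded Executor. In the JDK source it is built as a ThreadPoolExecutor with a Core Pool Size of 0, a Maximum Pool Size of Integer.MAX_VALUE, a 60 second keep-alive and a SynchronousQueue as the Direct hand off. The sketch below rebuilds that configuration by hand and counts the threads it creates while every task is still busy. The class and method names are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadedExecutor {
    // Same configuration Executors.newCachedThreadPool() uses: no core threads,
    // effectively unlimited max threads, 60s keep-alive, direct hand-off queue.
    static ThreadPoolExecutor cachedStyle() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>());
    }

    // Counts how many threads the pool creates for n tasks that are all busy at once.
    static int threadsFor(int n) {
        ThreadPoolExecutor pool = cachedStyle();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < n; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // no thread ever becomes idle, so no thread is reused
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(threadsFor(20)); // one new thread per concurrently busy task
    }
}
```

With 20 busy tasks it ends up with 20 threads. Nothing in this configuration stops it at 2,000 or 20,000. Passing a finite maximum instead of Integer.MAX_VALUE caps the growth, at the price of rejected tasks once the cap is reached.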
The Rejection Handler
This is the handler that is used for tasks that cannot be run. For example, say you have a Thread Pool Executor with 2 threads and a Direct hand off queue. You give it 2 long-running tasks that keep both threads busy for, say, 5 minutes. In that time you get a third task to execute. Since all threads are busy and this task cannot be queued, it is passed to the Rejection Handler. The JDK provides several strategies: AbortPolicy (the default, which throws a RejectedExecutionException), CallerRunsPolicy, DiscardPolicy and DiscardOldestPolicy. See the Javadoc of ThreadPoolExecutor.
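The 2-thread example above can be reproduced with a short sketch. A latch stands in for the 5 minutes of work, and the default AbortPolicy is set explicitly so it is visible. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionDemo {
    // 2 threads, Direct hand off queue, 2 long-running tasks; returns true if the 3rd task is rejected.
    static boolean thirdTaskRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.AbortPolicy()); // the JDK default: throw RejectedExecutionException
        CountDownLatch fiveMinutesOfWork = new CountDownLatch(1);
        Runnable longTask = () -> {
            try {
                fiveMinutesOfWork.await(); // stands in for the long-running work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(longTask);
            pool.execute(longTask);
            try {
                pool.execute(() -> System.out.println("third task ran"));
                return false;
            } catch (RejectedExecutionException e) {
                return true; // no idle thread, no queue slot, no room to grow
            }
        } finally {
            fiveMinutesOfWork.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("third task rejected: " + thirdTaskRejected());
    }
}
```

Swapping AbortPolicy for new ThreadPoolExecutor.CallerRunsPolicy() would instead run the third task on the submitting thread. That slows the submitter down rather than dropping the task.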
How to not fuck it up
Now that we know how to set up a Thread Pool Executor, here are some ways in which someone can fuck it up:
Few Threads & a big Queue size
Having few threads and a relatively large queue is one potential misconfiguration. It can lead to the dreaded “pooled executor acting as if it were a queued executor”. For example, let’s say we configure a Thread Pool with min threads 1, max threads 3 and a queue of size 1000. Now imagine we feed it 3 tasks that take a LONG time to compute. As the executor has 1 initial thread, that thread picks up the first task and starts working on it. The next 2 tasks are put into the queue. As queuing did not fail, no new threads are created. Therefore the items in the queue are waiting for the thread to complete the first task. But this task takes a long time, so they have to wait. This leads to delays, which, depending on the context, may not be acceptable in your application. So we now have a Thread Pool Executor behaving as if it’s a single-threaded executor. This is very misleading from a code point of view. In the code you see a task given to a thread pool, so you would expect it to be done relatively quickly. But it’s not, because someone fucked up the configuration. Identifying such delays is a pain.
With such a configuration, any time you have N consecutive tasks where N is the min (core) number of threads in your pool, and these N tasks take a relatively long time to compute, subsequent tasks get delayed until the queue fills up.
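The scenario above can be reproduced with a small sketch, using a latch as a stand-in for “a LONG time”. The class and method names are hypothetical. It compares the text’s configuration (core 1, max 3, queue 1000) with one where the Core Pool Size is raised to the maximum. That is one possible fix, not the only one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueuedInDisguise {
    // Feeds `tasks` long-running (blocked) tasks into a pool and returns {threads, waitingInQueue}.
    static int[] snapshot(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a task that takes a LONG time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] few = snapshot(1, 3, 1000, 3);   // the configuration from the text
        int[] fixed = snapshot(3, 3, 1000, 3); // core raised to max
        System.out.println("core 1, max 3: threads=" + few[0] + ", waiting=" + few[1]);
        System.out.println("core 3, max 3: threads=" + fixed[0] + ", waiting=" + fixed[1]);
    }
}
```

The first configuration runs 1 task and leaves 2 waiting in the queue, even though max is 3. The second runs all 3 at once. The extra threads of the first pool only appear after 1000 tasks have piled up.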
Many Threads and small queue
Having many threads and a small queue is generally a good configuration. But under some circumstances it will also fail, namely when the in-rate is larger than the out-rate for long enough. The queue fills up and all threads are busy. When that happens, tasks get rejected; in other words, they will not be run (with the default AbortPolicy, the submitter gets a RejectedExecutionException).
With such a configuration, any time you have N consecutive tasks where N is the max number of threads in your pool, and these N tasks take a relatively long time to compute, subsequent tasks may be dropped as soon as the small queue is full as well.
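Here is a minimal sketch of that failure mode. It assumes 4 core threads, a maximum of 8, a queue of 2 and a burst of 20 tasks that all stay busy; all of these numbers are hypothetical. The Rejection Handler here only counts rejections so the sketch can report them. The JDK default, AbortPolicy, would throw a RejectedExecutionException instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BurstRejection {
    // Submits `burst` long-running tasks at once and returns how many were rejected.
    static int rejectedDuringBurst(int core, int max, int queueCapacity, int burst) {
        AtomicInteger rejected = new AtomicInteger();
        // Count rejected tasks instead of throwing, so we can report them
        RejectedExecutionHandler countRejections = (task, executor) -> rejected.incrementAndGet();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity), countRejections);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < burst; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // every task is still busy when the next one arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return rejected.get();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedDuringBurst(4, 8, 2, 20));
    }
}
```

While every task is still busy, a pool can hold at most Maximum Pool Size + queue capacity tasks: 8 + 2 = 10 here. The other 10 tasks of the burst go to the Rejection Handler.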
Some tips & examples
Now I know what you are thinking: how do I know what queue size and what number of threads to choose so that I don’t fuck it up? Well, the answer is: it depends. It depends on:
- the requirements: each task or set of tasks has its own requirements. Find out what they are.
- estimates of the in & out rates, and of how often and for how long the in-rate > out-rate condition occurs.
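Here is a back-of-the-envelope way to turn those estimates into numbers. It assumes steady rates and that every thread is busy all the time, which is a simplification. The out-rate is the number of threads divided by the average task duration. Whenever the in-rate exceeds it, the queue grows by the difference every second. The figures in main (4 threads, 0.5 s per task, a burst of 20 tasks/s for one minute) are made up for the example.

```java
class RateEstimate {
    // Tasks per second a pool can complete if every thread is always busy.
    static double outRate(int threads, double secondsPerTask) {
        return threads / secondsPerTask;
    }

    // Tasks that pile up in the queue while in-rate > out-rate for burstSeconds.
    static double backlog(double inRatePerSecond, double outRatePerSecond, double burstSeconds) {
        return Math.max(0.0, (inRatePerSecond - outRatePerSecond) * burstSeconds);
    }

    public static void main(String[] args) {
        double out = outRate(4, 0.5);           // 4 threads, 0.5 s per task
        double queued = backlog(20.0, out, 60); // 20 tasks/s arriving for one minute
        System.out.println("out-rate " + out + " tasks/s, backlog " + queued + " tasks");
    }
}
```

This gives an out-rate of 8 tasks/s and a backlog of 720 tasks after that minute. A queue with fewer than 720 slots would fill up during the burst unless more threads kick in. Enough threads to reach 20 tasks/s (10 at 0.5 s per task) would avoid the backlog altogether.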
Let’s look at some examples and how I would guesstimate. In general I try to give the JVM a bit more resources than it needs, because I want to be sure that my JVM can handle my estimated load & some peaks.
A Thread pool to handle Login attempts
This is easy: Threaded. Let me explain why. Assume I have 2000 users on my system that need to log in. I know that the login process is generally quick, but from a user/quality point of view it should be completed ASAP. I know that over the day I will have approx. 3000 login attempts max (also counting failed logins, etc.). I also know that my logins happen mostly during the day, with some after lunchtime. So I know that, unless someone tries to brute-force a password, I would not be wasting resources with a Threaded Executor. I would use a thread if I have one available, or create a new one to handle a login request. Let the users into the system ASAP.
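To put numbers on the “not wasting resources” argument, Little’s law (average tasks in progress = arrival rate × time per task) gives the average number of busy threads. The 8-hour day and the 200 ms login time are assumptions for this sketch, not figures from the text.

```java
class LoginLoad {
    // Little's law: average number of tasks in progress = arrival rate * time per task.
    static double averageBusyThreads(double tasksPerSecond, double secondsPerTask) {
        return tasksPerSecond * secondsPerTask;
    }

    public static void main(String[] args) {
        double loginSeconds = 0.2;                  // assumed duration of one login
        double spreadOverDay = 3000.0 / (8 * 3600); // 3000 logins over an assumed 8-hour day
        double oneBusyHour = 3000.0 / 3600;         // worst case: all of them within one hour
        System.out.println("spread over the day: " + averageBusyThreads(spreadOverDay, loginSeconds));
        System.out.println("all in one hour:     " + averageBusyThreads(oneBusyHour, loginSeconds));
    }
}
```

Even if all 3000 attempts landed in a single hour, that averages out to about 0.17 busy threads. A Threaded Executor would therefore rarely hold more than a handful of threads. A brute-force attack is exactly the case that breaks this estimate.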
A Thread Pool to handle changes in account balances
This one is not easy. But let’s try anyway. Let’s say I run a big-ass bank. All my accounts are in a DB, and over the day I get many transaction messages, from which I need to update the account balances in the DB. Let’s say I have 2000 customers in my bank, and each customer does 20 transactions per day on average. That makes 40k transactions over the day. Here I would use a thread pool with a relatively large queue size. I would also make sure that I can replay transactions in case my queue runs full. Then I would choose a number of threads X that does not overload the DB.
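Here is a sketch of that setup, assuming each balance update is submitted as a Runnable. It uses a fixed number of threads X (dbSafeThreads, a hypothetical parameter you would size against your DB), a relatively large bounded queue, and a Rejection Handler that parks rejected updates in a replay store instead of dropping them. The in-memory replayStore is only a placeholder. For real replay you would record the transaction message somewhere durable, because a Runnable in memory is lost when the JVM dies.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BalanceUpdates {
    // core == max: at most dbSafeThreads updates hit the DB at once from this pool.
    // Rejected updates are parked in replayStore (hypothetical placeholder) instead of being lost.
    static ThreadPoolExecutor newBalancePool(int dbSafeThreads, int queueCapacity, Queue<Runnable> replayStore) {
        RejectedExecutionHandler parkForReplay = (task, executor) -> replayStore.add(task);
        return new ThreadPoolExecutor(dbSafeThreads, dbSafeThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), parkForReplay);
    }

    // Simulates a burst of slow balance updates and returns how many were parked for replay.
    static int parkedDuringBurst(int dbSafeThreads, int queueCapacity, int updates) {
        Queue<Runnable> replayStore = new ConcurrentLinkedQueue<>();
        ThreadPoolExecutor pool = newBalancePool(dbSafeThreads, queueCapacity, replayStore);
        CountDownLatch dbIsSlow = new CountDownLatch(1);
        try {
            for (int i = 0; i < updates; i++) {
                pool.execute(() -> {
                    try {
                        dbIsSlow.await(); // stands in for a slow UPDATE against the accounts table
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return replayStore.size();
        } finally {
            dbIsSlow.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 2 updates running, 2 queued, the remaining 2 parked for replay
        System.out.println(parkedDuringBurst(2, 2, 6));
    }
}
```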
I can’t tell how this Executor will be used, but this shit gots to get done!
I would set up monitoring of the Executor and expose it via JMX, so I can find out how it is actually being used. If I wanted to be fancy, I would implement self-healing: by monitoring the Executor and using a custom Rejection Handler, I could give the Executor more Threads to work with. This is doable with the JDK Thread Pool Executors, because ThreadPoolExecutor lets you change its pool sizes at runtime (setCorePoolSize, setMaximumPoolSize).
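Here is a minimal sketch of the self-healing idea, assuming a hard upper limit you are still comfortable with (hardLimit, a hypothetical parameter). A custom RejectedExecutionHandler raises the Maximum Pool Size via setMaximumPoolSize and resubmits the task. A stats method collects the ThreadPoolExecutor getters you would want to watch. Registering those numbers as a JMX MBean is left out to keep the sketch short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SelfHealingPool {
    // Hypothetical self-healing handler: on rejection, raise the Maximum Pool Size by one
    // (never beyond hardLimit) and resubmit the task; at the limit, reject like AbortPolicy.
    static final class GrowOnReject implements RejectedExecutionHandler {
        private final int hardLimit;

        GrowOnReject(int hardLimit) {
            this.hardLimit = hardLimit;
        }

        @Override
        public synchronized void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            int max = executor.getMaximumPoolSize();
            if (executor.isShutdown() || max >= hardLimit) {
                throw new RejectedExecutionException("pool saturated at " + max + " threads");
            }
            executor.setMaximumPoolSize(max + 1);
            executor.execute(task); // retry: a higher maximum lets the pool start one more thread
        }
    }

    // The kind of numbers you would expose via JMX to learn how the Executor is really used.
    static String stats(ThreadPoolExecutor pool) {
        return "threads=" + pool.getPoolSize()
                + " active=" + pool.getActiveCount()
                + " queued=" + pool.getQueue().size()
                + " max=" + pool.getMaximumPoolSize()
                + " largest=" + pool.getLargestPoolSize()
                + " completed=" + pool.getCompletedTaskCount();
    }

    // Starts with 1 thread and a 1-slot queue, submits `tasks` busy tasks and
    // returns {threads, maximum pool size} after the handler has (maybe) grown the pool.
    static int[] sizesAfter(int tasks, int hardLimit) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1), new GrowOnReject(hardLimit));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            System.out.println(stats(pool));
            return new int[] { pool.getPoolSize(), pool.getMaximumPoolSize() };
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        sizesAfter(4, 3); // grows from 1 to 3 threads instead of rejecting tasks 3 and 4
    }
}
```

A handler like this only moves the limit. Once hardLimit is reached it rejects like AbortPolicy, and nothing here shrinks the maximum again when the load drops. That part, and alerting on the monitored numbers, is still up to you.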
So my advice would be to think about the use case, guess possible in/out rates and then decide on a configuration. Defaults from the JDK are only good enough for a toy/sample project.