Recently, my team and I discovered a minor fault in our application's request mechanism. After a few checks, we traced the root cause to the way a BlockingQueue was used. Since Java offers several commonly used BlockingQueue implementations, such as ArrayBlockingQueue, LinkedBlockingQueue, and SynchronousQueue, I thought it would be useful to other programmers if I wrote a quick article about the issue we found and how we solved it.
The Fault & Diagnosis
The symptoms of the fault were simple, yet problematic - the thread pool that processes application requests was full, so additional requests could not be processed. After dumping the threads, we saw that the thread stacks were blocked in the log writing area. Thread stacks blocked in the log writing area were not a new problem, but previously we had attributed them to something else. Therefore, based on past experience, we assumed the problem was not in the log writing area.
After trying many troubleshooting approaches and growing frustrated at not finding the root cause, we went back to the source code to look for clues. We discovered that the code contained a log lock. Following this lock in the thread stacks, we could see that threads were blocked in ArrayBlockingQueue.put. On further inspection, we found that the queue was a BlockingQueue with a capacity of 1024. This means that once the queue holds 1024 objects, any subsequent put call blocks until space becomes available.
The programmer who wrote the code had anticipated that a full BlockingQueue would need special handling. Here is the code in question:
if (blockingQueue.remainingCapacity() < 1) {
    //todo
}
blockingQueue.put
Here, there are two main parts to the problem:
1. After the full-queue check, execution goes straight on to 'put' instead of into an 'else' branch, so put is called even when the queue is full.
2. The handling for a full queue is still just a //todo...
The above code shows that this particular programmer was not familiar with the BlockingQueue interface. There is no need to check the remaining capacity first, and in concurrent code such a check can already be stale by the time put runs. A better way is to call blockingQueue.offer, which returns immediately. If it returns 'false', you can run the relevant exception handling.
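As a minimal sketch of the offer-based approach, the class below wraps a bounded queue and reports a full queue to the caller instead of blocking. The class name LogBuffer, its methods, and the dropped-line counter are hypothetical and are not taken from the original code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around a bounded log queue (names are assumptions).
class LogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    LogBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue without blocking. offer() returns false when the
     * queue is full, so the request thread is never stalled here.
     */
    boolean append(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) {
            // The "queue is full" handling lives here instead of a //todo.
            dropped.incrementAndGet();
        }
        return accepted;
    }

    long droppedCount() {
        return dropped.get();
    }

    int size() {
        return queue.size();
    }
}
```

With a capacity of 2, the first two calls to append return true and the third returns false straight away. What to do with a rejected line (count it, log it elsewhere, throw) depends on how important the data is.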
Types of BlockingQueues
A BlockingQueue is a data structure commonly used in the producer/consumer pattern. The most commonly used types are ArrayBlockingQueue, LinkedBlockingQueue, and SynchronousQueue.
The main difference between ArrayBlockingQueue and LinkedBlockingQueue lies in how the queued objects are stored. One is backed by an array and the other by a linked list. Another difference is given in the comments in the source code:
Linked queues typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.
SynchronousQueue is a special BlockingQueue that holds no elements. When a thread calls offer, the call fails unless another thread is currently waiting in take or poll. Likewise, a poll fails (returns null) unless another thread is currently trying to insert an element, while take blocks until one does. This special mode is well suited to queues with strict response-time requirements and to thread pools without a fixed number of threads.
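The hand-off behaviour can be sketched as follows. The class and method names are hypothetical, and the five-second timeout is an arbitrary choice that simply gives the consumer thread time to start.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are assumptions for illustration.
class SynchronousQueueDemo {

    /** No consumer is waiting, so the non-blocking offer fails at once. */
    static boolean offerWithoutConsumer() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        return queue.offer("request");
    }

    /** No producer is waiting, so the non-blocking poll returns null. */
    static String pollWithoutProducer() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        return queue.poll();
    }

    /** A consumer blocked in take() lets a timed offer hand the item over. */
    static boolean offerWithConsumer() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        try {
            // Waits up to 5 seconds for the consumer to arrive.
            return queue.offer("request", 5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The first two methods show why a SynchronousQueue never accumulates a backlog: without a partner thread on the other side, nothing is stored.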
Problem Summary
For online business scenarios, there must be a timeout mechanism everywhere code can block on concurrency or external access. I don't know how many times I have seen a missing timeout turn out to be the root cause of a serious online business fault.
Online businesses emphasize the fast processing and completion of a request. Therefore, fail fast is the most important principle in online business system design and coding. According to this principle, the most obvious mistake in the code above is the use of put rather than offer with a timeout. For less important scenarios, offer can be used directly without a timeout, and a result of 'false' should lead directly to throwing or logging an exception.
Concerning the BlockingQueue scenario, in addition to a timeout mechanism, the queue length must be limited. Otherwise, a queue such as LinkedBlockingQueue defaults to a capacity of Integer.MAX_VALUE. In that case, if the code has a bug, the queue can grow until memory is exhausted.
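A minimal sketch of offer with a timeout might look like this. The helper name TimedEnqueue.enqueue is hypothetical, and the right timeout value depends on the response-time budget of the request.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the name and signature are assumptions.
class TimedEnqueue {

    /**
     * Waits at most timeoutMillis for free space. Returns false if the
     * queue stayed full for the whole wait or the thread was interrupted.
     */
    static <T> boolean enqueue(BlockingQueue<T> queue, T item, long timeoutMillis) {
        try {
            return queue.offer(item, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Fills a one-slot queue, then shows that the second offer times out. */
    static boolean[] demo() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        boolean first = enqueue(queue, "a", 50);
        boolean second = enqueue(queue, "b", 50);
        return new boolean[] { first, second };
    }
}
```

Unlike put, the caller is held up for a bounded time at worst, after which it can fail the request or record the loss.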
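The default capacity can be checked directly. This small sketch (the class name QueueCapacity is hypothetical) compares a LinkedBlockingQueue created without a capacity to one created with an explicit bound; the 1024 in the usage note mirrors the queue length mentioned earlier.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; names are assumptions for illustration.
class QueueCapacity {

    /** Free slots of an empty queue built with the no-argument constructor. */
    static int defaultCapacity() {
        return new LinkedBlockingQueue<String>().remainingCapacity();
    }

    /** Free slots of an empty queue built with an explicit bound. */
    static int boundedCapacity(int capacity) {
        return new LinkedBlockingQueue<String>(capacity).remainingCapacity();
    }
}
```

Here defaultCapacity() returns Integer.MAX_VALUE, while boundedCapacity(1024) returns 1024. ArrayBlockingQueue avoids the question entirely because its constructors always require a capacity.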
When talking about BlockingQueue, we should also mention the place where it is used most - the thread pool. Java's ThreadPoolExecutor takes a BlockingQueue parameter. If ArrayBlockingQueue or LinkedBlockingQueue is used here and the thread pool's corePoolSize and maximumPoolSize are different, then, once corePoolSize threads are busy, the thread pool first offers each new task to the BlockingQueue. If the offer succeeds, submission ends there; additional threads up to maximumPoolSize are created only when the queue is full.
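The queue-first behaviour can be observed with a small experiment. This is a sketch under assumed values (one core thread, a maximum of two threads, a queue of length one); the class name QueueFirstDemo is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; pool sizes are assumptions for illustration.
class QueueFirstDemo {

    /** Returns {poolSize, queueSize} observed after each of three submissions. */
    static int[][] observe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 30, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        // Every task blocks until released, so no worker becomes free.
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[][] seen = new int[3][];
        try {
            for (int i = 0; i < 3; i++) {
                pool.execute(blocked);
                seen[i] = new int[] { pool.getPoolSize(), pool.getQueue().size() };
            }
        } finally {
            release.countDown(); // let the tasks finish
            pool.shutdown();
        }
        return seen;
    }
}
```

The first task starts the core thread, the second is queued while the pool stays at one thread, and only the third, which finds the queue full, causes a second thread to be created.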
This scenario, however, does not always suit the needs of online businesses.
Online businesses operate in a fast-paced environment and need requests processed quickly rather than placed in queues. In fact, it is best for online businesses if requests are not stacked in a queue at all. Stacking requests this way can easily lead to an avalanche. Directly rejecting requests that exceed the system's processing capacity and returning an error message instead is a relatively simple yet effective method. However, it is only a limiting mechanism.
Remember when writing highly concurrent and distributed code that it is not all about system design; it's important to pay attention to all the small details in your code. That's how you can avoid problems like the one mentioned above.
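One way to get this reject-instead-of-queue behaviour, sketched here under assumptions, is to pair ThreadPoolExecutor with a SynchronousQueue and the built-in AbortPolicy. The class name FailFastPool and the pool size are hypothetical, and this is not necessarily the setup our application uses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names and sizes are assumptions.
class FailFastPool {

    /** A pool with no backlog: a task either gets a thread or is rejected. */
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false instead of queuing when every thread is busy. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // caller can answer with an error message
        }
    }

    /** One thread, two blocking tasks: the second one is rejected. */
    static boolean[] demo() {
        ThreadPoolExecutor pool = newPool(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            boolean first = trySubmit(pool, blocked);
            boolean second = trySubmit(pool, blocked);
            return new boolean[] { first, second };
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

In demo(), the first task occupies the only thread and the second is rejected immediately, which is the point where a real service would return its error response.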