(For more resources related to this topic, see here.)

Threading performance issues

Threading performance issues are the issues related to concurrency, as follows:

Lack of threading or excessive threading
Threads blocking up to starvation (usually from competing on shared resources)
Deadlock until the complete application hangs (threads waiting for each other)

Memory performance issues

Memory performance issues are the issues that are related to application memory management, as follows:

Memory leakage: This issue is an explicit leakage or implicit leakage as seen in improper hashing
Improper caching: This issue is due to over caching, inadequate size of the object, or missing essential caching
Insufficient memory allocation: This issue is due to missing JVM memory tuning

Algorithmic performance issues

Implementing the application logic requires two important parameters that are related to each other; correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows:

Costive algorithmic logic
Unnecessary logic

Work as designed performance issues

The work as designed performance issue is a group of issues related to the application design. The application behaves exactly as designed but if the design has issues, it will lead to performance issues. Some examples of performance issues are as follows:

Using synchronous when asynchronous should be used
Neglecting remoteness, that is, using remote calls as if they are local calls
Improper loading technique, that is, eager versus lazy loading techniques
Selection of the size of the object
Excessive serialization layers
Web services granularity
Too much synchronization
Non-scalable architecture, especially in the integration layer or middleware
Saturated hardware on a shared infrastructure

Interfacing performance issues

Whenever the application is dealing with resources, we may face the following interfacing issues that could impact our application performance:

Using an old driver/library
Missing frequent database housekeeping
Database issues, such as, missing database indexes
Low performing JMS or integration service bus
Logging issues (excessive logging or not following the best practices while logging)
Network component issues, that is, load balancer, proxy, firewall, and so on

Miscellaneous performance issues

Miscellaneous performance issues include different performance issues, as follows:

Inconsistent performance of application components, for example, having slow components can cause the whole application to slow down
Introduced performance issues to delay the processing speed
Improper configuration tuning of different components, for example, JVM, application server, and so on
Application-specific performance issues, such as excessive validations, apply many business rules, and so on

Fake performance issues

Fake performance issues could be a temporary issue or not even an issue. The famous examples are as follows:

Networking temporary issues
Scheduled running jobs (detected from the associated pattern)
Software automatic updates (it must be disabled in production)
Non-reproducible issues

In the following sections, we will go through some of the listed issues.

Threading performance issues

Multithreading has the advantage of maximizing the hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application.

For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources or they will be blocked, waiting for each other. We will discuss some of the common threading issues in the next section.

Blocking threads

A common issue where threads are blocked is waiting to obtain the monitor(s) of certain shared resources (objects), that is, holding by other threads. If most of the application server threads are consumed in a certain blocked status, the application becomes gradually unresponsive to user requests.

In the Weblogic application server, if a thread keeps executing for more than a configurable period of time (not idle), it gets promoted to the Stuck thread. The more the threads are in the stuck status, the more the server status becomes critical. Configuring the stuck thread parameters is part of the Weblogic performance tuning.

Performance symptoms

The following symptoms are the performance symptoms that usually appear in cases of thread blocking:

Slow application response (increased single request latency and pending user requests)
Application server logs might show some stuck threads.
The server's healthy status becomes critical on monitoring tools (application server console or different monitoring tools)
Frequent application server restarts either manually or automatically
Thread dump shows a lot of threads in the blocked status waiting for different resources
Application profiling shows a lot of thread blocking

An example of thread blocking

To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes for execution by adding the following additional lines:

long start= new Date().getTime();
..
..
long duration= new Date().getTime()-start;
System.err.println("total time = "+duration);

Now, try to execute the code with a different number of the thread pool size. We can try using the thread pool size as 50 and 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than 50 threads!

Let's now compare the NetBeans profiling results of both the executions to understand the reason behind this unexpected difference.

The following screenshot shows the profiling of 50 threads; we can see a lot of blocking for the monitor in the column and the percentage of Monitor to the left waiting around at 75 percent:

common-performance-issues-img-0

To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor and check all the available options, and then click on Run.

The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, less threads compete on these resources:

common-performance-issues-img-1

Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results.

Another factor that also affects the selection of the pool size, especially when the thread execution takes long time, is the context switching overhead. This overhead requires the selection of the optimal pool size, usually related to the number of available processors for our application.

Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters). The context switching includes suspension of the current executing process, storing its current data, picking up the next process for execution according to its priority, and restoring its data.

Although it's supported on the hardware level and is faster, most operating systems do this on the level of software context switching to improve the performance. The main reason behind this is the ability of the software context switching to selectively choose the required registers to save.

Thread deadlock

When many threads hold the monitor for objects that they need, this will result in a deadlock unless the implementation uses the new explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held.

The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot:

common-performance-issues-img-2

Usually, this happens if the order of obtaining the locks is not planned. For example, if we need to have a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest or the largest bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need to have a much more mature way to handle this by using the Lock interface or some other technique.

Memory performance issues

In spite of all this great effort put into the allocated and free memory in an optimized way, we still see memory issues in Java Enterprise applications mainly due to the way people are dealing with memory in these applications.

We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching.

Memory leakage

Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly the design/coding issues where the object is no longer required but it remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to object size and involved scenarios), it may lead to an out of memory error.

The most common example of memory leakage is adding objects to the static collections (or an instance collection of long living objects, such as a servlet) and forgetting to clean collections totally or partially.

Performance symptoms

The following symptoms are some of the expected performance symptoms during a memory leakage in our application:

The application uses heap memory increased by time
The response slows down gradually due to memory congestion
OutOfMemoryError occurs frequently in the logs and sometimes an application server restart is required
Aggressive execution of garbage collection activities
Heap dump shows a lot of objects retained (from the leakage types)
A sudden increase of memory paging as reported by the operating system monitoring tools

An example of memory leakage

We have a sample application ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage. The following screenshot shows the product catalog browser page:

common-performance-issues-img-3

One of the bad issues is the usage of the servlet instance (or static members), as it causes a lot of issues in multiple threads and has a common location for unnoticed memory leakages.

We have added the following instance variable as a leakage location:

private final HashMap<String, HashMap> cachingAllUsersCollection = 
  new HashMap();

We will add some collections to the preceding code to cause memory leakage. We also used the caching in the session scope, which causes implicit leakage. The session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch.

Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should only be restricted to the user-specific data. Also, forgetting to remove data that is not required from a session makes the memory utilization worse. Refer to the following code:

@Stateful
public class CacheSessionBean

Instead of using a singleton class here or stateless bean with a static member, we used the Stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using View Objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on.

It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti.

We can use our knowledge in Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and details of adding some products to the basket:

common-performance-issues-img-4

Executing the test plan using many concurrent users over many iterations will show the bad behavior of our application, where the used memory is increased by time. There is no justification as the catalog is the same for all users and there's no specific user data, except for the IDs of the selected products. Actually, it needs to be saved inside the user session, which won't take any remarkable memory space.

In our example, we intend to save a lot of objects in the session, implement a wrong session level, cache, and implement meaningless servlet level caching. All this will contribute to memory leakage. This gradual increase in the memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!):

common-performance-issues-img-5

Improper data caching

Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues.

The main concerns in the cache implementation are caching refresh rate, caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain the cached object's lock, and so on.

Improper caching issue types

The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections.

No caching (disabled caching)

Disabled caching will definitely cause a big load over the interfacing resources (for example, database) by hitting it in with almost every interaction. This should be avoided while designing an enterprise application; otherwise; the application won't be usable.

Fortunately, this has less impact than using wrong caching implementation!

Most of the application components such as database, JPA, and application servers already have an out-of-the-box caching support.

Too small caching size

Too small caching size is a common performance issue, where the cache size is initially determined but doesn't get reviewed with the increase of the application data. The cache sizing is affected by many factors such as the memory size. If it allows more caching and the type of the data, lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required under a very strict locking mechanism.

Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on.

As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. Also, the proper replacement policy contributes here; if we are working—as in our example—on an online product catalog, we may use the least recently used policy so all the old products will be removed, which makes sense as the users usually look for the new products.

Monitoring of the caching utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the missed cache ratio is high, a cache resizing is required.

Missed cache hits are very costive as they hit the cache first and then the resource itself (for example, database) to get the required object, and then add this loaded object into the cache again by releasing another object (if the cache is 100 percent), according to the used cache replacement policy.

Too big caching size

Too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some memory. This will increase the garbage collection pause time and decrease the cache throughput.

If the cache throughput is decreased, the latency to get objects from the cache will increase causing the cache retrieval cost to be high to the level it might be slower than hitting the actual resources (for example, database).

Using the wrong caching policy

Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it.

Performance symptoms

According to the cache issue type and different cache configurations, we will see the following symptoms:

Decreased cache hit rate (and increased cache missed ratio)
Increased cache loading because of the improper size
Increased cache latency with a huge caching size
Spiky pattern in the performance testing response time, in case the cache size is not correct, causes continuous invalidation and reloading of the cached objects

An example of improper caching techniques

In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, global cache is wrong, local cache is improper, and no cache invalidation is implemented. So, we can have stale objects inside the cache.

Cache invalidation is the process of refreshing or updating the existing object inside the cache or simply removing it from the cache. So in the next load, it reflects its recent values. This is to keep the cached objects always updated.

Cache hit rate is the rate or ratio in which cache hits match (finds) the required cached object. It is the main measure for cache effectiveness together with the retrieval cost.

Cache miss rate is the rate or ratio at which the cache hits the required object that is not found in the cache.

Last access time is the timestamp of the last access (successful hit) to the cached objects.

Caching replacement policies or algorithms are algorithms implemented by a cache to replace the existing cached objects with other new objects when there are no rooms available for any additional objects. This follows missed cache hits for these objects. Some examples of these policies are as follows:

First-in-first-out (FIFO): In this policy, the cached object is aged and the oldest object is removed in favor of the new added ones.
Least frequently used (LFU): In this policy, the cache picks the less frequently used object to free the memory, which means the cache will record statistics against each cached object.
Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information like the last access time of all cached objects.
Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits the application where items are no longer needed after the access, such as used exam vouchers.
Aging policy: Every object in the cache will have an age limit, and once it exceeds this limit, it will be removed from the cache in the simple type. In the advanced type, it will also consider the invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on.

It is important for us to understand that caching is not our magic bullet and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to real application needs.