Introduction to JBoss Clustering

Clustering plays an important role in Enterprise applications as it lets you split the load of your application across several nodes, granting robustness to your applications. As we discussed earlier, for optimal results it's better to limit the size of your JVM to a maximum of 2-2.5GB, otherwise the dynamics of the garbage collector will decrease your application's performance.

Combining relatively smaller Java heaps with a solid clustering configuration can lead to a better, scalable configuration plus significant hardware savings.

The only drawback to scaling out your applications is an increased complexity in the programming model, which needs to be correctly understood by aspiring architects.

JBoss AS comes out of the box with clustering support. There is no all-in-one library that deals with clustering but rather a set of libraries, which cover different kinds of aspects. The following picture shows how these libraries are arranged:

introduction-jboss-clustering-img-0

The backbone of JBoss Clustering is the JGroups library, which provides the communication between members of the cluster. Built upon JGroups we meet two building blocks, the JBoss Cache framework and the HAPartition service.

JBoss Cache handles the consistency of your application across the cluster by means of a replicated and transactional cache.

On the other hand, HAPartition is an abstraction built on top of a JGroups Channel that provides support for making and receiving RPC invocations from one or more cluster members. For example HA-JNDI (High Availability JNDI) or HA Singleton (High Availability Singleton) both use HAPartition to share a single Channel and multiplex RPC invocations over it, eliminating the configuration complexity and runtime overhead of having each service create its own Channel. If you need more information about the HAPartition service you can consult the JBoss AS documentation https://developer.jboss.org/wiki/jBossAS5ClusteringGuide.

In the next section we will learn more about the JGroups library and how to configure it to reach the best performance for clustering communication.

Configuring JGroups transport

Clustering requires communication between nodes to synchronize the state of running applications or to notify changes in the cluster definition.

JGroups (http://jgroups.org/manual/html/index.html) is a reliable group communication toolkit written entirely in Java. It is based on IP multicast, but extends by providing reliability and group membership.

Member processes of a group can be located on the same host, within the same Local Area Network (LAN), or across a Wide Area Network (WAN). A member can be in turn part of multiple groups.

The following picture illustrates a detailed view of JGroups architecture:

introduction-jboss-clustering-img-1

A JGroups process consists basically of three parts, namely the Channel, Building blocks, and the Protocol stack. The Channel is a simple socket-like interface used by application programmers to build reliable group communication applications. Building blocks are an abstraction interface layered on top of Channels, which can be used instead of Channels whenever a higher-level interface is required. Finally we have the Protocol stack, which implements the properties specified for a given channel.

In theory, you could configure every service to bind to a different Channel. However this would require a complex thread infrastructure with too many thread context switches. For this reason, JBoss AS is configured by default to use a single Channel to multiplex all the traffic across the cluster.

The Protocol stack contains a number of layers in a bi-directional list. All messages sent and received over the channel have to pass through all protocols. Every layer may modify, reorder, pass or drop a message, or add a header to a message. A fragmentation layer might break up a message into several smaller messages, adding a header with an ID to each fragment, and re-assemble the fragments on the receiver's side.

The composition of the Protocol stack (that is, its layers) is determined by the creator of the channel: an XML file defines the layers to be used (and the parameters for each layer).

Knowledge about the Protocol stack is not necessary when just using Channels in an application. However, when an application wishes to ignore the default properties for a Protocol stack, and configure their own stack, then knowledge about what the individual layers are supposed to do is needed.

In JBoss AS, the configuration of the Protocol stack is located in the file, <server> deployclusterjgroups-channelfactory.sarMETA-INFjgroupschannelfactory- stacks.xml.

The file is quite large to fit here, however, in a nutshell, it contains the following basic elements:

introduction-jboss-clustering-img-2

The first part of the file includes the UDP transport configuration. UDP is the default protocol for JGroups and uses multicast (or, if not available, multiple unicast messages) to send and receive messages.

A multicast UDP socket can send and receive datagrams from multiple clients. The interesting and useful feature of multicast is that a client can contact multiple servers with a single packet, without knowing the specific IP address of any of the hosts.

Next to the UDP transport configuration, three protocol stacks are defined:

udp: The default IP multicast based stack, with flow control

udp-async: The protocol stack optimized for high-volume asynchronous RPCs

udp-sync: The stack optimized for low-volume synchronous RPCs

Thereafter, the TCP transport configuration is defined . TCP stacks are typically used when IP multicasting cannot be used in a network (for example, because it is disabled) or because you want to create a network over a WAN (that's conceivably possible but sharing data across remote geographical sites is a scary option from the performance point of view).

You can opt for two TCP protocol stacks:

tcp: Addresses the default TCP Protocol stack which is best suited to high-volume asynchronous calls.

tcp-async: Addresses the TCP Protocol stack which can be used for low-volume synchronous calls.

If you need to switch to TCP stack, you can simply include the following in your command line args that you pass to JBoss:
-Djboss.default.jgroups.stack=tcp Since you are not using multicast in your TCP communication, this requires configuring the addresses/ports of all the possible nodes in the cluster. You can do this by using the property -Djgroups.tcpping. initial_hosts. For example:
-Djgroups.tcpping.initial_hosts=host1[7600],host2[7600]

Ultimately, the configuration file contains two stacks which can be used for optimising JBoss Messaging Control Channel (jbm-control) and Data Channel (jbm-data).