Infinispan Cluster Manager
This is a cluster manager implementation for Vert.x that uses Infinispan.
This implementation is packaged inside:
<dependency>
<groupId>io.vertx</groupId>
<artifactId>vertx-infinispan</artifactId>
<version>3.9.2</version>
</dependency>
In Vert.x a cluster manager is used for various functions including:
-
Discovery and group membership of Vert.x nodes in a cluster
-
Maintaining cluster wide topic subscriber lists (so we know which nodes are interested in which event bus addresses)
-
Distributed Map support
-
Distributed Locks
-
Distributed Counters
Cluster managers do not handle the event bus inter-node transport, this is done directly by Vert.x with TCP connections.
Using this cluster manager
If you are using Vert.x from the command line, the jar corresponding to this cluster manager (it will be named vertx-infinispan-3.9.2.jar
should be in the lib
directory of the Vert.x installation.
If you want clustering with this cluster manager in your Vert.x Maven or Gradle project then just add a dependency to
the artifact: io.vertx:vertx-infinispan:3.9.2
in your project.
If the jar is on your classpath as above then Vert.x will automatically detect this and use it as the cluster manager. Please make sure you don’t have any other cluster managers on your classpath or Vert.x might choose the wrong one.
You can also specify the cluster manager programmatically if you are embedding Vert.x by specifying it on the options when you are creating your Vert.x instance, for example:
ClusterManager mgr = new InfinispanClusterManager();
VertxOptions options = new VertxOptions().setClusterManager(mgr);
Vertx.clusteredVertx(options, res -> {
if (res.succeeded()) {
Vertx vertx = res.result();
} else {
// failed!
}
});
Configuring this cluster manager
The default cluster manager configuration can be modified with infinispan.xml
and/or jgroups.xml
files.
The former configures the data grid, the latter group management and member discovery.
You can place one or both of them on your classpath. If you want to embed your custom file in a fat jar, it must be located at the root of the fat jar. If it’s an external file, the directory containing the file must be added to the classpath. For example, if you are using the launcher class from Vert.x, the classpath enhancement can be done as follows:
# If infinispan.xml and/or jgroups.xml files are in the current directory:
java -jar my-app.jar -cp . -cluster
# If infinispan.xml and/or jgroups.xml files are in the conf directory:
java -jar my-app.jar -cp conf -cluster
Another way to override the configuration is by providing the file locations via system properties:
vertx.infinispan.config
and/or vertx.jgroups.config
.
# Use a cluster configuration located in an external file
java -Dvertx.infinispan.config=./config/my-infinispan.xml -jar ... -cluster
# Or use a custom configuration from the classpath
java -Dvertx.infinispan.config=my/package/config/my-infinispan.xml -jar ... -cluster
The cluster manager will search for the file in classpath first, and fallback to the filesystem.
The system properties, when present, override any infinispan.xml
or jgroups.xml
on the classpath.
The xml files are Infinispan and JGroups configuration files and are described in detail in the documentation on the Infinispan and JGroups web-sites.
Important
|
if a jgroups.xml file is on the classpath or if you set the vertx.jgroups.config system property,
it will override any JGroups stack-file path defined in the Infinispan configuration file.
|
The default JGroups configuration uses multicast for discovery and TCP for group management. Make sure multicast is enabled on your network for this to work.
For full documentation on how to configure the transport differently or use a different transport please consult the Infinispan / JGroups documentations.
Using an existing Infinispan Cache Manager
You can pass an existing DefaultCacheManager
in the cluster manager to reuse an existing cache manager:
ClusterManager mgr = new InfinispanClusterManager(cacheManager);
VertxOptions options = new VertxOptions().setClusterManager(mgr);
Vertx.clusteredVertx(options, res -> {
if (res.succeeded()) {
Vertx vertx = res.result();
} else {
// failed!
}
});
In this case, vert.x is not the cache manager owner and so do not shut it down on close.
Notice that the custom Infinispan instance need to be configured with:
<cache-container default-cache="distributed-cache">
<distributed-cache name="distributed-cache"/>
<distributed-cache name="__vertx.subs"/>
<replicated-cache name="__vertx.haInfo"/>
<distributed-cache-configuration name="__vertx.distributed.cache.configuration"/>
</cache-container>
Configuring for Kubernetes
On Kubernetes, JGroups should be configured to use the KUBE_PING
protocol.
First, add the org.infinispan:infinispan-cloud:9.4.10.Final
dependency to your project.
With Maven it looks like:
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-cloud</artifactId>
<version>9.4.10.Final</version>
</dependency>
Then, set the vertx.jgroups.config
system property to default-configs/default-jgroups-kubernetes.xml
.
-Dvertx.jgroups.config=default-configs/default-jgroups-kubernetes.xml
This JGroups stack file is located in the infinispan-cloud
JAR and preconfigured for Kubernetes.
Also, set the project namespace as the scope for discovery.
ENV KUBERNETES_NAMESPACE my-project
Optionnaly, to create separate clusters in the same namespace, add a labels selector:
ENV KUBERNETES_LABELS my-label=my-value
Then, force usage of IPv4 in the JVM with a system property.
-Djava.net.preferIPv4Stack=true
Eventually, the setup needs a service account.
oc policy add-role-to-user view system:serviceaccount:$(oc project -q):default -n $(oc project -q)
Further configuration details are available on the Kubernetes discovery protocol for JGroups repository.
Rolling updates
During rolling udpates, the Infinispan team recommends to replace pods one by one.
To do so, we must configure Kubernetes to:
-
never start more than one new pod at once
-
forbid more than one unavailable pod during the process
spec:
strategy:
type: Rolling
rollingParams:
updatePeriodSeconds: 10
intervalSeconds: 20
timeoutSeconds: 600
maxUnavailable: 1 (1)
maxSurge: 1 (2)
-
the maximum number of pods that can be unavailable during the update process
-
the maximum number of pods that can be created over the desired number of pods
Also, pod readiness probe must take cluster health into account. Please refer to the cluster administration section for details on how to implement a readiness probe with Vert.x Health Checks.
Configuring for Docker Compose
Make sure to start the Java Virtual Machines with those system properties:
-Djava.net.preferIPv4Stack=true -Djgroups.tcp.address=NON_LOOPBACK
This will make JGroups pick the interface of the virtual private network created by Docker.
Trouble shooting clustering
If the default multicast discovery configuration is not working here are some common causes:
Multicast not enabled on the machine.
It is quite common in particular on OSX machines for multicast to be disabled by default. Please google for information on how to enable this.
Using wrong network interface
If you have more than one network interface on your machine (and this can also be the case if you are running VPN software on your machine), then JGroups may be using the wrong one.
To tell JGroups to use a specific interface you can provide the IP address of the interface in the bind_addr
element of the configuration. For example:
<TCP bind_addr="192.168.1.20"
...
/>
<MPING bind_addr="192.168.1.20"
...
/>
Alternatively, if you want to stick with the bundled jgroups.xml
file, you can set the jgroups.tcp.address
system property:
-Djgroups.tcp.address=192.168.1.20
When running Vert.x is in clustered mode, you should also make sure that Vert.x knows about the correct interface.
When running at the command line this is done by specifying the cluster-host
option:
vertx run myverticle.js -cluster -cluster-host your-ip-address
Where your-ip-address
is the same IP address you specified in the JGroups configuration.
If using Vert.x programmatically you can specify this using
setClusterHost
.
Using a VPN
This is a variation of the above case. VPN software often works by creating a virtual network interface which often doesn’t support multicast. If you have a VPN running and you do not specify the correct interface to use in both the JGroups configuration and to Vert.x then the VPN interface may be chosen instead of the correct interface.
So, if you have a VPN running you may have to configure both JGroups and Vert.x to use the correct interface as described in the previous section.
When multicast is not available
In some cases you may not be able to use multicast discovery as it might not be available in your environment. In that case
you should configure another protocol, e.g. TCPPING
to use TCP sockets, or S3_PING
when running on Amazon EC2.
For more information on available JGroups discovery protocols and how to configure them please consult the JGroups documentation.
Problems with IPv6
If you have troubles configuring an IPv6 host, force the use of IPv4 with the java.net.preferIPv4Stack
system property.
-Djava.net.preferIPv4Stack=true
Enabling logging
When trouble-shooting clustering issues with it’s often useful to get some logging output from Infinispan and JGroups
to see if it’s forming a cluster properly. You can do this (when using the default JUL logging) by adding a file
called vertx-default-jul-logging.properties
on your classpath. This is a standard java.util.logging (JUL)
configuration file. Inside it set:
org.infinispan.level=INFO org.jgroups.level=INFO
and also
java.util.logging.ConsoleHandler.level=INFO java.util.logging.FileHandler.level=INFO
Infinispan logging
Infinispan relies on JBoss logging. JBoss Logging is a logging bridge providing integration with numerous logging frameworks.
Add the logging JARs of you choice to the classpath and JBoss Logging will pick them up automatically.
If you have multiple logging backends on your classpath, you can force selection with the org.jboss.logging.provider
system property.
For exeample:
-Dorg.jboss.logging.provider=log4j2
See this JBoss Logging guide for more details.
JGroups logging
JGroups uses JDK logging by default. log4j and log4j2 are supported if the corresponding JARs are found on the classpath.
Please refer to the JGroups logging documentation if you need more details or want to implement your own logging backend implementation.
SharedData extensions
AsyncMap content streams
The InfinispanAsyncMap
API allows to retrieve keys, values and entries as streams.
This can be useful if you need to go through the content of a large map for bulk processing.
InfinispanAsyncMap<K, V> infinispanAsyncMap = InfinispanAsyncMap.unwrap(asyncMap);
ReadStream<K> keyStream = infinispanAsyncMap.keyStream();
ReadStream<V> valueStream = infinispanAsyncMap.valueStream();
ReadStream<Map.Entry<K, V>> entryReadStream = infinispanAsyncMap.entryStream();
Cluster administration
The Infinispan cluster manager works by turning Vert.x nodes into members of an Infinispan cluster. As a consequence, Vert.x cluster manager administration should follow the Infinispan management guidelines.
First, let’s take a step back and introduce rebalancing and split-brain syndrome.
Rebalancing
Each Vert.x node holds pieces of the clustering data: eventbus subscriptions, async map entries, clustered counters… etc.
When a member joins or leaves the cluster, Infinispan rebalances cache entries on the new set of members. In other words, it moves data around to accomodate the new cluster topology. This process may take some time, depending on the amount of clustered data and number of nodes.
Split-brain syndrome
In a perfect world, there would be no network equipment failures. Reality is, though, that sooner or later your cluster will be divided into smaller groups, unable to see each others.
Infinispan is capable of merging the nodes back into a single cluster. But just as with rebalancing, this process may take some time. Before the cluster is fully functional again, some eventbus consumers might not be able to get messages. Or high-availability may not be able to redeploy a failing verticle.
Note
|
It is difficult (if possible at all) to make a difference between a network partition and:
|
Recommendations
Considering the common clustering issues discussed above, it is recommended to stick to the following good practices.
Graceful shutdown
Avoid stopping members forcefully (e.g, kill -9
a node).
Of course process crashes are inevitable, but a graceful shutdown helps to get the remaining nodes in a stable state faster.
One node after the other
When rolling a new version of your app, scaling-up or down your cluster, add or remove nodes one after the other.
Stopping nodes one by one prevents the cluster from thinking a network partition occured. Adding them one by one allows for clean, incremental rebalancing operations.
The cluster healthiness can be verified with Vert.x Health Checks:
Handler<Promise<Status>> procedure = ClusterHealthCheck.createProcedure(vertx, true);
HealthChecks checks = HealthChecks.create(vertx).register("cluster-health", procedure);
After creation, the health check can be exposed over HTTP with a Vert.x Web router handler:
Router router = Router.router(vertx);
router.get("/readiness").handler(HealthCheckHandler.createWithHealthChecks(checks));