10.8:documentation:caching

Caching

Caching is used to take load off the database in situations where the system repeatedly runs queries that are expected to return the same results. In CzechIdM, for instance, it is used for configuration, scripts and similar data. There are several things to take into consideration when dealing with caching in CzechIdM.
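A minimal sketch of the idea, assuming a simple cache-aside pattern in plain Java: the hypothetical FakeDatabase stands in for a repeatedly executed query, and the value is loaded only on the first request. This illustrates the concept only; it is not how CzechIdM implements its caches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a database table that is queried repeatedly.
class FakeDatabase {
    private final Map<String, String> rows = new HashMap<>();
    int queryCount = 0; // how many times the "database" was actually queried

    FakeDatabase() {
        rows.put("example.key", "example-value");
    }

    String query(String key) {
        queryCount++;
        return rows.get(key);
    }
}

// Cache-aside: look in the cache first, query the source only on a miss.
class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    CacheAside(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        // computeIfAbsent invokes the loader only when the key is not cached yet
        return cache.computeIfAbsent(key, loader);
    }
}
```

Creating the cache with new CacheAside(db::query) and calling get("example.key") twice queries the fake database only once.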

Currently active caches are displayed in Configuration → Modules → Cache. This page lists all caches that have been initialized since the application started. Note that caches are initialized lazily (when they are first needed), so not all expected caches may be displayed in this table.
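To illustrate why the table may be incomplete, here is a sketch of lazy initialization, assuming a hypothetical LazyCacheRegistry class: a cache only appears in the listing once something has requested it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry illustrating lazy cache creation (not CzechIdM code).
class LazyCacheRegistry {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    Map<Object, Object> getCache(String name) {
        // the cache is created on first access only
        return caches.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }

    // what an overview page could list: only caches created so far, sorted by name
    Set<String> initializedCacheNames() {
        return new TreeSet<>(caches.keySet());
    }
}
```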

Cache eviction is the process of deleting all entries in a given cache.

Most operations that need to invalidate a cache do so automatically, so the administrator does not need to worry about it. For example, when you change the CzechIdM configuration from the UI, the corresponding cached value is updated too (if it exists). There are basically only two situations that require manual cache eviction:

  • After direct modification of cached data in the database (for example after ETL operation)
  • When cached data gets too big
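The first situation can be sketched as follows, assuming a hypothetical StaleCacheDemo class with a simple cache-aside read path: a change made through the application also updates the cache, while a direct database write leaves a stale value in the cache until it is evicted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stale cached data after a direct database change.
class StaleCacheDemo {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    // cache-aside read: fall back to the database only on a cache miss
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    // update through the application: the cached value is kept in sync
    void updateViaApplication(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    // direct modification bypassing the application (e.g. an ETL job): cache untouched
    void updateDirectly(String key, String value) {
        database.put(key, value);
    }

    // manual eviction: delete all entries so the next read goes to the database
    void evictAll() {
        cache.clear();
    }
}
```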

You can either evict specific caches, or you can evict all currently initialized caches from the same page on which all caches are displayed (Configuration → Modules → Cache).

You may be surprised that some caches (mainly the configuration cache) are not empty after eviction. This is because they are reinitialized right away, so this behavior is expected.
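A sketch of one way this can happen, assuming a hypothetical SelfReloadingCache that repopulates itself from its source immediately after being cleared. CzechIdM may reinitialize its caches differently (for example, because they are accessed again right away).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache that reloads all entries from its source after eviction.
class SelfReloadingCache {
    private final Map<String, String> entries = new HashMap<>();
    private final Supplier<Map<String, String>> source;

    SelfReloadingCache(Supplier<Map<String, String>> source) {
        this.source = source;
        reload();
    }

    private void reload() {
        entries.putAll(source.get());
    }

    void evict() {
        entries.clear(); // eviction deletes every entry...
        reload();        // ...and the cache is immediately repopulated from the source
    }

    int size() {
        return entries.size();
    }
}
```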

CzechIdM comes with a pre-configured Ehcache 3.8.1 and can connect to a Terracotta server to enable distributed caching. If you need to use a different cache provider, some implementation is required, mainly configuring the CacheManager that CzechIdM will use. CzechIdM uses a JCache (JSR-107, https://www.jcp.org/en/jsr/detail?id=107) compatible CacheManager, so if you choose a cache provider compatible with this JSR, you only need to provide the provider-specific configuration; everything else is taken care of for you (more on that later).

Some caches in CzechIdM must be shared for CzechIdM to work properly in a multi-instance environment, for example the configuration cache and the provisioning brake cache. Other caches can be stored locally, and some must be stored locally. This is particularly the case for caches used by instance-specific task executors, such as system synchronization. If such a cache were shared among multiple instances, you might experience unexpected behavior.

To prevent this, some caches are always stored locally only, even if you configure distributed caching via the Terracotta server (or any other server). There is also a technical reason to cache some data only locally, which is discussed later.
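The placement rule can be sketched like this, assuming a hypothetical CachePlacement class and an explicit set of local-only cache names (the cache names used in the test are made up):

```java
import java.util.Set;

// Hypothetical decision: where should a cache with the given name live?
class CachePlacement {
    enum Store { LOCAL, SHARED }

    private final Set<String> localOnly;
    private final boolean clusterConfigured;

    CachePlacement(Set<String> localOnly, boolean clusterConfigured) {
        this.localOnly = localOnly;
        this.clusterConfigured = clusterConfigured;
    }

    Store storeFor(String cacheName) {
        // no clustered server, or instance-specific data (e.g. synchronization state): keep it local
        if (!clusterConfigured || localOnly.contains(cacheName)) {
            return Store.LOCAL;
        }
        return Store.SHARED;
    }
}
```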

By default, if you start CzechIdM with no special configuration, it starts with an on-heap, local-only Ehcache with the default cache size (2000 entries per cache). You may need to configure the on-heap cache size if you experience performance issues caused by Ehcache overflowing to disk, which may be slow. In general, however, you won't need to tweak this setting, because the default on-heap cache size is big enough for ordinary use of CzechIdM. This is also why there is no configuration property for setting a specific cache size: each cache is configured to work properly by default. If you still need to change it, you can find out how in our cache developer guide.
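For illustration only, here is a size-bounded on-heap map using the 2000-entry default. It uses LRU eviction via java.util.LinkedHashMap, a stand-in chosen for simplicity; Ehcache has its own eviction strategy and this is not its implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative size-bounded cache; LRU is an assumption of this sketch, not Ehcache's policy.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    static final int DEFAULT_MAX_ENTRIES = 2000;
    private final int maxEntries;

    BoundedCache() {
        this(DEFAULT_MAX_ENTRIES);
    }

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: the least recently used entry is the eldest
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // drop the eldest entry as soon as the limit is exceeded
        return size() > maxEntries;
    }
}
```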

You can download the Ehcache clustering kit from https://github.com/ehcache/ehcache3/releases/download/v3.8.1/ehcache-clustered-3.8.1-kit.zip. After extracting the archive, go to server/conf to adjust the Terracotta server configuration and to server/bin to start the server.

Example CzechIdM configuration for a Terracotta server running on localhost:

cache.terracota.url=localhost:9410
cache.terracota.resource.name=main
cache.terracota.resource.pool.name=resource-pool
# Size in MB
cache.terracota.resource.pool.size=32

and a compatible Terracotta server configuration:

<?xml version="1.0" encoding="UTF-8"?>
<tc-config xmlns="http://www.terracotta.org/config"
           xmlns:ohr="http://www.terracotta.org/config/offheap-resource"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <!--
    This is the default Terracotta server configuration file for the Ehcache kit.

    It defines a single offheap resource of 512MB to be used for caching data.

    It also defines a single server, but you can add another one to benefit from high availability.
  -->

  <plugins>
    <config>
      <ohr:offheap-resources>
        <ohr:resource name="main" unit="MB">512</ohr:resource>
      </ohr:offheap-resources>
    </config>
  </plugins>

  <servers>
    <server host="localhost" name="default-server">
      <!--
        Indicates the location for logs files - %H will resolve to user home directory.
        Note that relative path will be resolved from the location of this configuration file.
       -->
      <logs>%H/terracotta-logs</logs>

      <!--
        This port is used by clients to communicate to the server.
        Its value is actually the default one and is thus omitted.
      -->
      <!--<tsa-port>9410</tsa-port>-->

      <!--
        This port is used for server to server communication.
        Its value is actually the default one and is thus omitted.
      -->
      <!--<tsa-group-port>9430</tsa-group-port>-->
    </server>

    <!--
      Below a sample server definition that will give HA to the cluster if run on a different host.

      Servers know how to communicate between each others because each version of the config file on each host
      will list all servers in it.
    -->
    <!--<server host="otherhost" name="other-server">-->
    <!--<logs>logs</logs>-->
    <!--<tsa-port>9410</tsa-port>-->
    <!--</server>-->

    <!--
      Indicates how much time a server taking over after a failure in an active will wait
      to allow existing clients to reconnect. The time unit is seconds.
    -->
    <client-reconnect-window>120</client-reconnect-window>
  </servers>
  <failover-priority>
    <availability/>
  </failover-priority>
</tc-config>

If you experience errors while running this on Java 8 on Windows, try switching to Java 11 or later. We haven't found any issues with Java 8 on Linux.

This part of the CzechIdM cache documentation explains how to define your own caches and how to use them.

Configuring a cache in CzechIdM means creating a Spring bean of type eu.bcvsolutions.idm.core.api.config.cache.IdMCacheConfiguration. There are currently two implementations of this interface: LocalIdMCacheConfiguration for defining on-heap caches and DistributedIdMCacheConfiguration for defining clustered caches. Each implementation comes with a convenient builder that helps you create a compatible configuration.

Here is an example of creating an on-heap cache:

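import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import eu.bcvsolutions.idm.core.api.config.cache.IdMCacheConfiguration;
// also import LocalIdMCacheConfiguration from the CzechIdM API

@Configuration
public class ExampleCacheConfiguration {

	// Local (on-heap) cache "example:MyCache" with String keys and Object values
	@Bean
	public IdMCacheConfiguration exampleCache() {
		return LocalIdMCacheConfiguration.<String, Object> builder()
				.withName("example:MyCache")
				.withKeyType(String.class)
				.withValueType(Object.class)
				.build();
	}

}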

Distributed caches are created similarly to on-heap caches. Here is an example:

@Configuration
public class ExampleCacheConfiguration {

	@Bean
	public IdMCacheConfiguration exampleCache() {
		return DistributedIdMCacheConfiguration.<String, String> builder()
				.withName("example:MyCache")
				.withKeyType(String.class)
				.withValueType(String.class)
				.build();
	}

}
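Because a distributed cache sends keys and values to the clustered server, they must be Serializable. A quick way to check a type, assuming plain Java serialization, a hypothetical SerializationCheck helper and made-up SerializableValue and PlainValue types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: does the value survive Java serialization?
class SerializationCheck {
    static boolean isSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false; // the object graph contains a type that is not Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Suitable as a distributed cache value.
class SerializableValue implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "example";
}

// Fine for a local cache, but not for a distributed one.
class PlainValue {
    String name = "example";
}
```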

A few other things worth knowing about caches:

  • LocalIdMCacheConfiguration supports all value types; for DistributedIdMCacheConfiguration, both keys and values must be Serializable
  • Even a cache defined as distributed is stored on heap if no clustered server is configured
  • All local caches have a default size of 2000 entries per cache. You can override it by calling the withSize(int) method on a cache configuration
  • Distributed cache size is typically defined on the server, so it is not CzechIdM's concern
  • If you define the property cache.terracota.url, CzechIdM automatically starts with the clustered configuration. Otherwise, the on-heap configuration is used
  • You can disable caching altogether by setting the property spring.cache.type=none
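The last two points can be summarized in a small sketch, assuming a hypothetical CacheModeResolver that reads plain java.util.Properties. Letting spring.cache.type=none take precedence over cache.terracota.url is an assumption of this sketch.

```java
import java.util.Properties;

// Hypothetical resolver mirroring the startup rules listed above.
class CacheModeResolver {
    enum Mode { DISABLED, CLUSTERED, ON_HEAP }

    static Mode resolve(Properties props) {
        // assumption: disabling caching wins over any other setting
        if ("none".equalsIgnoreCase(props.getProperty("spring.cache.type"))) {
            return Mode.DISABLED;
        }
        String url = props.getProperty("cache.terracota.url");
        if (url != null && !url.trim().isEmpty()) {
            return Mode.CLUSTERED; // a Terracotta URL is defined
        }
        return Mode.ON_HEAP; // default: local on-heap caches
    }
}
```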

To use a previously defined cache, you need to do two things:

  • Obtain an IdMCacheManager instance (usually by autowiring)
  • Call its methods cacheValue() and getValue()

Here is a little example:

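import java.util.Optional;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MyService {

	// name of a cache defined by an IdMCacheConfiguration bean, e.g. "example:MyCache" above
	private static final String CACHE_NAME = "example:MyCache";

	@Autowired
	private IdMCacheManager cacheManager;

	// ...

	private void setCachedValue(String key, String value) {
		cacheManager.cacheValue(CACHE_NAME, key, value);
	}

	// ...

	private Optional<Object> getCachedValue(String key) {
		return cacheManager.getValue(CACHE_NAME, key);
	}

}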
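For unit tests without a running cache provider, a stand-in with the same two calls can be handy. The sketch below assumes the signatures shown in the example above; InMemoryCacheStub is hypothetical and does not implement the real IdMCacheManager interface. Because it is backed by ConcurrentHashMap, it does not accept null values.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in offering cacheValue/getValue for tests.
class InMemoryCacheStub {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    void cacheValue(String cacheName, Object key, Object value) {
        // create the named cache on first use, then store the value
        caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<>()).put(key, value);
    }

    Optional<Object> getValue(String cacheName, Object key) {
        Map<Object, Object> cache = caches.get(cacheName);
        // empty Optional when either the cache or the key is missing
        return cache == null ? Optional.empty() : Optional.ofNullable(cache.get(key));
    }
}
```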

If, for some reason, you need to use a cache provider other than Ehcache, you can. You just need to provide CzechIdM with the necessary configuration. CzechIdM supports the JCache standard (JSR-107), so any compatible cache provider can be plugged in, for example like this:

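import java.util.List;

import javax.cache.CacheManager;
import javax.cache.Caching;

@Configuration
@Order(0)
public class MyCacheConfig {

	@Bean
	@Qualifier("jCacheManager")
	@Primary
	CacheManager myCacheManager(@Autowired List<IdMCacheConfiguration> idMCacheConfigurations) {
		// Obtain the JSR-107 CacheManager of the (single) provider found on the classpath
		CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();
		// Create a cache for each entry in idMCacheConfigurations using provider-specific configuration
		// ...
		return cacheManager;
	}

}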

If you need a cache provider that is not compatible with JSR-107, you will have to provide your own implementation of IdMCacheManager.

Having only one Terracotta server in your deployment creates a single point of failure. CzechIdM requires some caches to be shared among all instances; otherwise features such as the provisioning brake or turning processors on and off would not work. Because of these requirements, it is a good idea to have a failover server that CzechIdM can connect to if the primary server fails.

Here is the required Terracotta configuration, with two servers running on localhost on ports 9410 and 9420:

<?xml version="1.0" encoding="UTF-8"?>
<tc-config xmlns="http://www.terracotta.org/config"
           xmlns:ohr="http://www.terracotta.org/config/offheap-resource"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <!--
    This configuration is based on the default Terracotta server configuration file from the Ehcache kit.

    It defines a single offheap resource of 512MB to be used for caching data.

    It defines two servers, an active one and a failover one, to provide high availability.
  -->

  <plugins>
    <config>
      <ohr:offheap-resources>
        <ohr:resource name="main" unit="MB">512</ohr:resource>
      </ohr:offheap-resources>

    </config>
  </plugins>

  <servers>
    <server host="localhost" name="default-server">
      <logs>%H/terracotta-logs</logs>
      <tsa-port>9410</tsa-port>
    </server>

    <server host="localhost" name="other-server">
      <logs>%H/terracotta-logs2</logs>
      <tsa-port>9420</tsa-port>
    </server>

    <!--
      Indicates how much time a server taking over after a failure in an active will wait
      to allow existing clients to reconnect. The time unit is seconds.
    -->
    <client-reconnect-window>120</client-reconnect-window>

  </servers>
  <failover-priority>
    <availability/>
  </failover-priority>

</tc-config>

Then start each server (from server/bin) using these commands:

./start-tc-server.sh -n default-server
./start-tc-server.sh -n other-server

You also need to provide CzechIdM with the address of the failover server:

idm.sec.cache.terracota.url=localhost:9410,localhost:9420
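The value is a comma-separated list of host:port pairs. A sketch of how such a list breaks down into individual endpoints, assuming a hypothetical TerracottaUrlParser (CzechIdM and the Ehcache client do their own parsing, which may differ, for example in how a missing port is handled):

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a "host:port,host:port" server list.
class TerracottaUrlParser {
    static List<InetSocketAddress> parse(String urls) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String part : urls.split(",")) {
            String entry = part.trim();
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + entry);
            }
            String host = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return servers;
    }
}
```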