Tuesday, December 29, 2020

Storing ThingsBoard time series data with Apache Cassandra: improved efficiency

Great news for all ThingsBoard users who store time series data in the Cassandra database!

We are happy to announce a long-awaited improvement to the Cassandra time series DAO: it now caches the partitions that have already been saved for each telemetry key, which speeds up the storage of time series data by up to two times.


Let's take a look at this improvement.


So, what have we done?

Prior to ThingsBoard version 3.2.1, saving a single time series data point caused two insert requests to Cassandra: one for the actual value and one for the partition. Saving partitions is necessary to optimize certain aggregation queries; this is how ThingsBoard “knows” that there is data for a specific device within a particular time frame.


In other words, while saving, for instance, 25 different data points for a certain entity such as a device, we performed 50 requests to the database: 25 requests to save the time series data records (each consisting of an entity UUID, telemetry key, telemetry value, and timestamp) and 25 requests to save the partition records (entity UUID, telemetry key, and a partition timestamp calculated from the data point's timestamp).

Assuming all 25 data points belong to the same partition (e.g., the same month if the TS_KV_PARTITIONING parameter is set to MONTHS), we inserted the same partition record 25 times.
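
To make that concrete, here is a minimal sketch of the pre-3.2.1 save path. The table and column names are illustrative, and the execute helper is a stand-in for a real Cassandra session call, not the actual ThingsBoard code:

```java
// A sketch of the pre-3.2.1 save path. Table and column names are
// illustrative; "execute" stands in for a real Cassandra session call.
import java.util.Arrays;
import java.util.UUID;

public class OldSavePathSketch {

    static void execute(String cql, Object... params) {
        System.out.println(cql + " " + Arrays.toString(params));
    }

    static void save(UUID entityId, String key, long partitionTs, long ts, double value) {
        // Insert #1: the actual data point.
        execute("INSERT INTO ts_kv (entity_id, key, partition, ts, dbl_v) VALUES (?, ?, ?, ?, ?)",
                entityId, key, partitionTs, ts, value);
        // Insert #2: the partition record, issued for EVERY data point,
        // even if the very same partition was just written.
        execute("INSERT INTO ts_kv_partitions (entity_id, key, partition) VALUES (?, ?, ?)",
                entityId, key, partitionTs);
    }

    public static void main(String[] args) {
        UUID device = UUID.randomUUID();
        long partitionTs = 1606780800000L; // start of the month with MONTHS partitioning
        for (int i = 0; i < 25; i++) {
            // 25 data points -> 50 inserts; 24 of the partition inserts are duplicates.
            save(device, "temperature", partitionTs, partitionTs + i * 1000L, 20.0 + i);
        }
    }
}
```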

This definitely required an improvement. 


Starting with ThingsBoard 3.2.1, we introduce a partition cache to avoid saving duplicate partition records.


A new parameter, TS_KV_PARTITIONS_MAX_CACHE_SIZE, controls the maximum number of partition keys that can be cached. Its default value is 100000.
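
Here is a minimal sketch of the idea, assuming a simplified synchronous save path with an LRU map standing in for the real cache; ThingsBoard's actual implementation is asynchronous and differs in detail, and the cache key format below is illustrative:

```java
// A minimal sketch of the partition cache idea, assuming a simplified
// synchronous save path; the real implementation is asynchronous and
// differs in detail. The cache key format here is illustrative.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class PartitionCacheSketch {

    static final int MAX_CACHE_SIZE = 100_000; // TS_KV_PARTITIONS_MAX_CACHE_SIZE default

    // LRU set of partition keys that have already been written to Cassandra.
    static final Map<String, Boolean> savedPartitions =
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_CACHE_SIZE; // evict the least recently used key
                }
            };

    static void savePartitionIfAbsent(UUID entityId, String key, long partitionTs) {
        String cacheKey = entityId + "|" + key + "|" + partitionTs;
        if (savedPartitions.putIfAbsent(cacheKey, Boolean.TRUE) == null) {
            // Cache miss: this partition record has not been saved yet.
            System.out.println("INSERT INTO ts_kv_partitions ... " + cacheKey);
        }
        // Cache hit: the duplicate insert is skipped entirely.
    }

    public static void main(String[] args) {
        UUID device = UUID.randomUUID();
        for (int i = 0; i < 25; i++) {
            // 25 data points in the same partition -> only 1 partition insert.
            savePartitionIfAbsent(device, "temperature", 1606780800000L);
        }
    }
}
```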


How to configure it?


To optimize the cache hit rate, you need to choose the right value for the TS_KV_PARTITIONS_MAX_CACHE_SIZE parameter.

The partition cache consumes approximately 300 bytes per cached record (depending on the average size of the data point key), so caching 1 million partitions requires approximately 300 MB of RAM.

Let's walk through a specific example to better understand what value to set depending on your case:

Suppose we use TS_KV_PARTITIONING = MONTHS.

Let's assume that we have 100,000 devices of the same type, each of which sends 2 data points to the system for storage every second. That is 200,000 requests per second to save time series data records and, as a result, another 200,000 requests per second to save partition records. In other words, without the cache, previous versions of ThingsBoard would perform ~400,000 requests/sec.
In this scenario, the default value TS_KV_PARTITIONS_MAX_CACHE_SIZE = 100000 is not enough for all partitions to be saved and cached: there are 200,000 unique device-and-key combinations, so only half of them fit in the cache and the cache hit rate would be close to 0.5. Half of the entries would be constantly evicted as requests to save time series data records are executed.
That is, by raising the value to TS_KV_PARTITIONS_MAX_CACHE_SIZE = 100,000 (devices) * 2 (unique data point keys per device) + 10,000 (a reserve for saving statistics, etc.) = 210,000, we avoid storing the duplicated partition records.
In general, the size of the cache should be (number of unique data point keys per entity * number of entities that send them) + a reserve.
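
As a quick sanity check of the arithmetic, here is a hypothetical sizing helper; the numbers and the 300-byte estimate come straight from this post, and nothing in it is ThingsBoard API:

```java
// A back-of-the-envelope sizing helper for the example above. The numbers
// follow this post; none of this is ThingsBoard API.
public class CacheSizing {
    public static void main(String[] args) {
        long devices = 100_000;     // entities sending telemetry
        long keysPerDevice = 2;     // unique data point keys per entity
        long reserve = 10_000;      // headroom for statistics, etc.

        long cacheSize = devices * keysPerDevice + reserve;   // 210,000
        long approxRamBytes = cacheSize * 300;                // ~300 bytes per record
        System.out.printf("TS_KV_PARTITIONS_MAX_CACHE_SIZE=%d (~%d MB of RAM)%n",
                cacheSize, approxRamBytes / (1024 * 1024));   // ~60 MB
    }
}
```

For this example the helper yields 210,000 cached partitions, which translates to roughly 60 MB of RAM at 300 bytes per record.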
