Citus as a distributed database plugin hands-on

Preamble

Citus is an excellent columnar storage engine that fits perfectly in a Business Intelligence instance, as we have seen in a previous post (link). Citus also provides a distributed database feature inside PostgreSQL.

Citus is not a new piece of software nor a fork of PostgreSQL: it inserts itself smoothly as a plugin (extension) in a traditional PostgreSQL cluster, so you keep all the power and the full ecosystem of PostgreSQL.

Citus is a sharding extension allowing you to shard your big objects (fact tables in a star schema implementation) across multiple separate PostgreSQL instances (called workers). Queries are then distributed to all the worker nodes directly from the coordinator node (which does not contain any data) where your clients connect:

(architecture diagram: clients connect to the coordinator node, which distributes queries to the worker nodes)

Remark:
Dimension tables can be duplicated entirely on all worker nodes (reference tables) and fact tables can be co-located shard by shard to speed up join operations, as sketched below.
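For illustration, this is roughly how such a schema would be declared once Citus is installed; the table and column names below are invented for the example:

-- dimension table fully replicated on every worker node (reference table):
SELECT create_reference_table('dim_customer');
-- fact tables sharded on the same key are co-located, so joins on that key stay local to each worker:
SELECT create_distributed_table('fact_sales', 'customer_id');
SELECT create_distributed_table('fact_returns', 'customer_id', colocate_with => 'fact_sales');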

My testing has been done with three Virtual Machines under VirtualBox running Oracle Linux Server release 8.7 and PostgreSQL 15.2:

  • server1.domain.com (192.168.56.101): Citus coordinator node
  • server2.domain.com (192.168.56.102): First Citus worker node
  • server3.domain.com (192.168.56.103): Second Citus worker node

As we will see, the solution is great to improve performance and to scale out the big queries of a BI platform. But it is in no way a High Availability solution like MariaDB Galera Cluster: with Citus you have to handle the loss of any node of your cluster yourself. Even the worker nodes are not redundant, as each of them contains a unique subset of the shards of your sharded objects… The generic recommendation from Citus is to use PostgreSQL replication to remove the Single Point Of Failure (SPOF).

Citus distributed database installation

[root@server1 ~]# lvcreate --name lvol01 --size 5g vg00
  Logical volume "lvol01" created.
[root@server1 ~]# mkfs -t xfs /dev/vg00/lvol01
[root@server1 ~]# grep postgres /etc/fstab
/dev/mapper/vg00-lvol01   /postgres                   xfs     defaults        0 0
[root@server1 ~]# mount -a
[root@server1 ~]# useradd --gid dba -m --shell /bin/bash postgres

In the postgres user profile (~/.bash_profile) I have added:

export PATH=/usr/pgsql-15/bin/:$PATH:$HOME/.local/bin:$HOME/bin
export PGDATA=/postgres/15/data
alias psql='psql --port=5433 --host=server1.domain.com --username=postgres'
alias pg_stop='pg_ctl -l logfile stop'
alias pg_start='pg_ctl -l logfile start'
alias pg_restart='pg_ctl -l logfile restart'
[postgres@server1 ~]$ initdb --pgdata=/postgres/15/data

In pg_hba.conf I have set the below entry. I want to be able to connect from any node of my local network with the postgres account and without a password; there is most probably a more secure way to do it, but that is not the purpose of this blog post:

host    all             postgres             192.168.56.0/24            trust
My non-default postgresql.conf parameters:

[postgres@server1 ~]$ grep -E -v '^(\t|\s)*#.*|^$' /postgres/15/data/postgresql.conf
listen_addresses = 'server1.domain.com'         # what IP address(es) to listen on;
port = 5433                             # (change requires restart)
max_connections = 100                   # (change requires restart)
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is usually the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr, jsonlog,
log_directory = 'log'                   # directory where log files are written,
log_filename = 'postgresql-%a.log'      # log file name pattern,
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_truncate_on_rotation = on           # If on, an existing log file with the
log_line_prefix = '%m [%p] '            # special values:
log_timezone = 'CET'
datestyle = 'iso, mdy'
timezone = 'CET'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'
The Citus repository and package must be installed on all nodes (coordinator and workers):

[root@server2 ~]# curl https://install.citusdata.com/community/rpm.sh | sudo bash
[root@server1 ~]# dnf -y install citus112_15

I have added the following to my postgresql.conf file (the postgresql15-contrib.x86_64 package is required for pg_stat_statements):

shared_preload_libraries = 'citus,pg_stat_statements'   # (change requires restart)
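After the restart you can quickly check that the library is loaded and that the extension is available for creation (a small sanity check, not part of the original setup):

postgres=# show shared_preload_libraries;
postgres=# select name, default_version from pg_available_extensions where name like 'citus%';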

I also created an account for myself to be able to use my HeidiSQL graphical tool:

postgres=# create role yjaquier with superuser login password 'secure_password';
CREATE ROLE

Do not forget to add the matching entry in the access file (pg_hba.conf):

host    all             yjaquier             0.0.0.0/0            scram-sha-256
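A change in pg_hba.conf does not require a restart; a reload is enough (for example from a superuser psql session):

postgres=# select pg_reload_conf();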

I wanted to create all my Citus objects in a dedicated database, so I created one:

postgres=# create database citus encoding=UTF8 locale='en_US.UTF-8';
NOTICE:  Citus partially supports CREATE DATABASE for distributed databases
DETAIL:  Citus does not propagate CREATE DATABASE command to workers
HINT:  You can manually create a database and its extensions on workers.
CREATE DATABASE

Please also check the GitHub issue in the references section: the extension is database-scoped, so you must create it in each database you plan to distribute with Citus:

postgres=# \c citus
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
You are now connected to database "citus" as user "postgres".
citus=# create extension if not exists citus;
CREATE EXTENSION
citus=# select * from pg_extension;
  oid  |    extname     | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+----------------+----------+--------------+----------------+------------+-----------+--------------
 13529 | plpgsql        |       10 |           11 | f              | 1.0        |           |
 16387 | citus_columnar |       10 |           11 | f              | 11.2-1     |           |
 16443 | citus          |       10 |           11 | f              | 11.2-1     |           |
(3 rows)
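As the NOTICE above reminds us, nothing is propagated to the workers at this stage, so the same database and extension must also exist on server2 and server3 before they are added to the cluster. A sketch of what to run on each worker (same statements as on the coordinator):

create database citus encoding=UTF8 locale='en_US.UTF-8';
\c citus
create extension if not exists citus;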

Citus distributed database configuration

On the coordinator node (again, connect first to your chosen Citus database):

citus=# SELECT citus_set_coordinator_host('server1.domain.com', 5433);
 citus_set_coordinator_host
----------------------------
 
(1 row)

citus=# SELECT * from citus_add_node('server2.domain.com', 5433);
ERROR: connection to the remote node server2.domain.com:5433 failed with the following error: server does not support SSL, but SSL was required

https://docs.citusdata.com/en/stable/reference/common_errors.html#ssl-error-certificate-verify-failed

As of Citus 8.1 SSL is required by default between the Citus cluster nodes:

citus=# show citus.node_conninfo;
 citus.node_conninfo
---------------------
 sslmode=require
(1 row)
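For a test lab you could presumably avoid the whole certificate setup by relaxing this parameter instead; I did not take that path here, so take it as an untested sketch:

postgres=# alter system set citus.node_conninfo = 'sslmode=prefer';
postgres=# select pg_reload_conf();  -- existing connections may additionally require a restart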

By the way, I noticed that the citus_set_coordinator_host command created everything needed for SSL on my coordinator node (server1):

[postgres@server1 ~]$ ll /postgres/15/data/ser*
-rw------- 1 postgres dba  981 Feb 17 16:53 /postgres/15/data/server.crt
-rw------- 1 postgres dba 1704 Feb 17 16:53 /postgres/15/data/server.key
[postgres@server1 ~]$ cat /postgres/15/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
ssl = 'on'
ssl_ciphers = 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384'

So I had to set it up on the other nodes (I actually did it on all nodes to start from a clean situation):

[postgres@server1 ~]$ openssl req -new -x509 -days 365 -nodes -text -out server1.crt -keyout server1.key -subj "/CN=server1.domain.com"
Generating a RSA private key
.....................................................................................................+++++
................................................+++++
writing new private key to 'server1.key'
-----
[postgres@server1 ~]$ openssl req -new -nodes -text -out root.csr -keyout root.key -subj "/CN=server1.domain.com"
Generating a RSA private key
...........................................................................+++++
.........................................................................................................................................+++++
writing new private key to 'root.key'
-----
[postgres@server1 ~]$ openssl x509 -req -in root.csr -text -days 3650 -extfile /etc/pki/tls/openssl.cnf -extensions v3_ca -signkey root.key -out root.crt
Signature ok
subject=CN = server1.domain.com
Getting Private key
[postgres@server1 ~]$ openssl req -new -nodes -text -out server1.csr -keyout server1.key -subj "/CN=server1.domain.com"
Generating a RSA private key
....................................................................................................................................................+++++
.................+++++
writing new private key to 'server1.key'
-----
[postgres@server1 ~]$ openssl x509 -req -in server1.csr -text -days 365 -CA root.crt -CAkey root.key -CAcreateserial -out server1.crt
Signature ok
subject=CN = server1.domain.com
Getting CA Private Key
[postgres@server1 ~]$ chmod 600 *
[postgres@server1 ~]$ ll
total 28
-rw------- 1 postgres dba 2881 Feb 20 11:38 root.crt
-rw------- 1 postgres dba 3343 Feb 20 11:37 root.csr
-rw------- 1 postgres dba 1708 Feb 20 11:37 root.key
-rw------- 1 postgres dba   41 Feb 20 11:38 root.srl
-rw------- 1 postgres dba 2759 Feb 20 11:38 server1.crt
-rw------- 1 postgres dba 3343 Feb 20 11:38 server1.csr
-rw------- 1 postgres dba 1704 Feb 20 11:38 server1.key

You can also use the below cert script, executed on each node with its own name (for example ./cert server3):

#!/bin/bash
# Throw-away self-signed certificate, then a local root CA, then the server certificate signed by that CA
openssl req -new -x509 -days 365 -nodes -text -out $1.crt -keyout $1.key -subj "/CN=$1.domain.com"
openssl req -new -nodes -text -out root.csr -keyout root.key -subj "/CN=$1.domain.com"
openssl x509 -req -in root.csr -text -days 3650 -extfile /etc/pki/tls/openssl.cnf -extensions v3_ca -signkey root.key -out root.crt
openssl req -new -nodes -text -out $1.csr -keyout $1.key -subj "/CN=$1.domain.com"
openssl x509 -req -in $1.csr -text -days 365 -CA root.crt -CAkey root.key -CAcreateserial -out $1.crt
# PostgreSQL refuses private key files readable by group or others
chmod 600 $1* root*
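The generated certificate and key then have to be installed in the data directory of each node and SSL activated. A minimal sketch for server3, assuming the default ssl_cert_file/ssl_key_file names (server.crt/server.key):

[postgres@server3 ~]$ cp server3.crt $PGDATA/server.crt
[postgres@server3 ~]$ cp server3.key $PGDATA/server.key
[postgres@server3 ~]$ psql --port=5433 --host=server3.domain.com --username=postgres --command="alter system set ssl = 'on'"
[postgres@server3 ~]$ pg_ctl -l logfile restart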

Once SSL is activated you see it appearing when connecting:

[postgres@server3 ~]$ psql
psql (15.2)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
 
postgres=#

Or:

citus=# SELECT run_command_on_workers('show ssl');
  run_command_on_workers
--------------------------------
(server2.domain.com,5433,t,on)
(server3.domain.com,5433,t,on)
(2 rows)

Nodes can now be added without issues:

citus=# SELECT * from citus_add_node('server2.domain.com', 5433);
 citus_add_node
----------------
              5
(1 row)
 
citus=# SELECT * from citus_add_node('server3.domain.com', 5433);
 citus_add_node
----------------
              6
(1 row)
 
citus=# SELECT * FROM citus_get_active_worker_nodes();
     node_name      | node_port
--------------------+-----------
 server3.domain.com |      5433
 server2.domain.com |      5433
(2 rows)

Citus distributed database testing

I'm using the same example table as the one from the Citus GitHub repository:

citus=# CREATE TABLE events (
citus(#   device_id bigint,
citus(#   event_id bigserial,
citus(#   event_time timestamptz default now(),
citus(#   data jsonb not null,
citus(#   PRIMARY KEY (device_id, event_id)
citus(# );
CREATE TABLE
citus=# SELECT create_distributed_table('events', 'device_id');
 create_distributed_table
--------------------------
 
(1 row)
 
citus=# select * from citus_tables;
 table_name | citus_table_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method
------------+------------------+---------------------+---------------+------------+-------------+-------------+---------------
 events     | distributed      | device_id           |             1 | 512 kB     |          32 | postgres    | heap
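The table got the default 32 shards (citus.shard_count). A different shard count has to be chosen before distributing the table, for instance (sketch):

citus=# SET citus.shard_count = 64;
citus=# SELECT create_distributed_table('events', 'device_id');

Recent Citus versions also accept a shard_count parameter directly in create_distributed_table.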

Filled it with 10 million rows:

citus=# INSERT INTO events (device_id, data) SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,10000000) s;
INSERT 0 1000000
citus=# SELECT table_name, table_size FROM citus_tables;
 table_name | table_size
------------+------------
 events     | 1213 MB
(1 row)
citus=# SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3;
 device_id | event_id |          event_time           |                 data
-----------+----------+-------------------------------+--------------------------------------
         1 |   999901 | 2023-02-20 15:39:35.181012+01 | {"measurement": 0.8813668984787719}
         1 |   999801 | 2023-02-20 15:39:35.181012+01 | {"measurement": 0.03485957466660938}
         1 |   999701 | 2023-02-20 15:39:35.181012+01 | {"measurement": 0.7250541721835948}
(3 rows)

You can now display the explain plan of the sharded queries…

Selection on a device_id so only one shard accessed:

citus=# EXPLAIN (VERBOSE ON) SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=0 width=0)
   Output: remote_scan.device_id, remote_scan.event_id, remote_scan.event_time, remote_scan.data
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Query: SELECT device_id, event_id, event_time, data FROM public.events_102009 events WHERE (device_id OPERATOR(pg_catalog.=) 1) ORDER BY event_time DESC, event_id DESC LIMIT 3
         Node: host=server3.domain.com port=5433 dbname=citus
         ->  Limit  (cost=1082.91..1082.92 rows=3 width=63)
               Output: device_id, event_id, event_time, data
               ->  Sort  (cost=1082.91..1108.28 rows=10147 width=63)
                     Output: device_id, event_id, event_time, data
                     Sort Key: events.event_time DESC, events.event_id DESC
                     ->  Bitmap Heap Scan on public.events_102009 events  (cost=242.93..951.77 rows=10147 width=63)
                           Output: device_id, event_id, event_time, data
                           Recheck Cond: (events.device_id = 1)
                           ->  Bitmap Index Scan on events_pkey_102009  (cost=0.00..240.39 rows=10147 width=0)
                                 Index Cond: (events.device_id = 1)
             Query Identifier: 1025693804818833478
 Query Identifier: 5031995992548174482
(19 rows)
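This single task is the effect of shard pruning: the coordinator hashes device_id = 1, finds the matching hash range in its metadata and only contacts the worker holding that shard. The ranges can be inspected in the Citus metadata (sketch, output not shown):

citus=# SELECT shardid, shardminvalue, shardmaxvalue FROM pg_dist_shard WHERE logicalrelid = 'events'::regclass;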

A more generic query uses all shards (32 is the default number of shards, controlled by citus.shard_count). Only one task, on one shard, is displayed by default:

citus=# EXPLAIN (VERBOSE ON) SELECT device_id, count(*) from events group by device_id;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=16)
   Output: remote_scan.device_id, remote_scan.count
   Task Count: 32
   Tasks Shown: One of 32
   ->  Task
         Query: SELECT device_id, count(*) AS count FROM public.events_102008 events WHERE true GROUP BY device_id
         Node: host=server2.domain.com port=5433 dbname=citus
         ->  HashAggregate  (cost=800.00..800.03 rows=3 width=16)
               Output: device_id, count(*)
               Group Key: events.device_id
               ->  Seq Scan on public.events_102008 events  (cost=0.00..650.00 rows=30000 width=8)
                     Output: device_id, event_id, event_time, data
             Query Identifier: -3583515104173105020
 Query Identifier: 399593402794421440
(14 rows)

To display all tasks:

citus=# SET citus.explain_all_tasks = 1;
SET
citus=# EXPLAIN SELECT device_id, count(*) from events group by device_id;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=16)
   Task Count: 32
   Tasks Shown: All
   ->  Task
         Node: host=server2.domain.com port=5433 dbname=citus
         ->  HashAggregate  (cost=800.00..800.03 rows=3 width=16)
               Group Key: device_id
               ->  Seq Scan on events_102008 events  (cost=0.00..650.00 rows=30000 width=8)
   ->  Task
         Node: host=server3.domain.com port=5433 dbname=citus
         ->  HashAggregate  (cost=1332.00..1332.05 rows=5 width=16)
               Group Key: device_id
               ->  Seq Scan on events_102009 events  (cost=0.00..1082.00 rows=50000 width=8)
.
.

To compare with a non-sharded table I have created the exact same table without sharding it:

citus=# CREATE TABLE events2 (
citus(#   device_id bigint,
citus(#   event_id bigserial,
citus(#   event_time timestamptz default now(),
citus(#   data jsonb not null,
citus(#   PRIMARY KEY (device_id, event_id)
citus(# );
CREATE TABLE
citus=# INSERT INTO events2 (device_id, data) SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,10000000) s;
INSERT 0 10000000

Result of the below query, averaged over around 10 runs. This is not a benchmark (use \timing in psql to display execution times):

Query: SELECT device_id, COUNT(*) FROM <table_name> GROUP BY device_id;

  • events2 (regular table): 5 seconds
  • events (distributed table): 2.9 seconds

So yes, even on my small test cluster made of VMs, there is a difference…

Citus distributed database cluster management

Handle node failure

If you lose one worker you can still connect to the coordinator, but any query touching the shards of the lost worker fails with the expected error below:

citus=# SELECT device_id, count(*) from events GROUP BY device_id;
ERROR:  connection to the remote node server2.domain.com:5433 failed with the following error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

If you lose the coordinator it is even worse, as you cannot perform any query at all.

Here there is clearly no miracle: High Availability must be handled by you, for example with PostgreSQL replication as suggested in the Citus documentation. We are really far from a highly available solution…
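For the record, a very rough sketch of what protecting a worker with streaming replication could look like; the standby host (standby2) and the replication role (replicator) are invented for the example, and a replication entry in pg_hba.conf is assumed. This is plain PostgreSQL, nothing Citus-specific:

[postgres@standby2 ~]$ pg_basebackup --host=server2.domain.com --port=5433 --username=replicator --pgdata=/postgres/15/data --write-recovery-conf --progress
[postgres@standby2 ~]$ pg_ctl -l logfile start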

Removing nodes

I have started with:

citus=# select citus_remove_node('server3.domain.com', 5433);
ERROR:  cannot remove or disable the node server3.domain.com:5433 because because it contains the only shard placement for shard 102009
DETAIL:  One of the table(s) that prevents the operation complete successfully is public.events
HINT:  To proceed, either drop the tables or use undistribute_table() function to convert them to local tables

Completely un-distributing tables is often not a realistic option, particularly for very big objects and even more so if you already have plenty of workers. You can instead move the shards hosted on this node one by one using citus_move_shard_placement, as sketched below, or drain the node in one go.
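A hedged example of a single shard move (shard 102009 and the node names come from the error message above; the 'block_writes' transfer mode is my own choice, 'auto' being the default):

citus=# SELECT citus_move_shard_placement(102009, 'server3.domain.com', 5433, 'server2.domain.com', 5433, 'block_writes');

Draining the whole node in a single call is the simpler option: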

citus=# select * from citus_drain_node('server3.domain.com', 5433);
ERROR:  connection to the remote node localhost:5433 failed with the following error: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5433 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

As I got an error, I tried the following instead (but ended up with the same error):

citus=# select * from citus_set_node_property('server3.domain.com', 5433, 'shouldhaveshards', false);
 citus_set_node_property
-------------------------
 
(1 row)
 
citus=# select nodename, shouldhaveshards from pg_dist_node;
      nodename      | shouldhaveshards
--------------------+------------------
 server1.domain.com | f
 server2.domain.com | t
 server3.domain.com | f
(3 rows)
 
citus=# select * from citus_rebalance_start(drain_only := true);
ERROR:  connection to the remote node localhost:5433 failed with the following error: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5433 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

Then I luckily found that Citus nodes sometimes need to connect to themselves, and that the citus.local_hostname parameter is key in those situations (it defaults to localhost and should point to the node's own reachable hostname; here I set it on server2):

postgres=# alter system set citus.local_hostname to 'server2.domain.com';
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)

Then everything worked better:

citus=# select * from citus_drain_node('server3.domain.com', 5433);
NOTICE:  Moving shard 102009 from server3.domain.com:5433 to server2.domain.com:5433 ...
ERROR:  ERROR:  logical decoding requires wal_level >= logical
CONTEXT:  while executing command on server3.domain.com:5433
while executing command on server1.domain.com:5433
citus=# show wal_level
citus-# ;
 wal_level
-----------
 replica
(1 row)

So I changed wal_level to logical on all instances…
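For example, on each instance (the change requires a restart):

postgres=# alter system set wal_level = logical;
ALTER SYSTEM

followed by a restart of the instance (pg_ctl -l logfile restart).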

Finally:

citus=# select * from citus_drain_node('server3.domain.com', 5433);
NOTICE:  Moving shard 102009 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102011 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102013 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102015 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102017 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102019 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102021 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102023 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102025 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102027 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102029 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102031 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102033 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102035 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102037 from server3.domain.com:5433 to server2.domain.com:5433 ...
NOTICE:  Moving shard 102039 from server3.domain.com:5433 to server2.domain.com:5433 ...
 citus_drain_node
------------------
 
(1 row)

The distributed table is now on one worker only:

citus=# select distinct nodename from citus_shards where table_name='events'::regclass;
      nodename
--------------------
 server2.domain.com
(1 row)
 
citus=# select citus_remove_node('server3.domain.com', 5433);
 citus_remove_node
-------------------
 
(1 row)
 
citus=# SELECT * FROM citus_get_active_worker_nodes();
     node_name      | node_port
--------------------+-----------
 server2.domain.com |      5433
(1 row)

At this point you can stop the instance running on server3.domain.com with no impact.

Adding nodes

Let's add back the server we removed in the previous chapter:

citus=# SELECT * from citus_add_node('server3.domain.com', 5433);
 citus_add_node
----------------
              5
(1 row)
 
citus=# SELECT * FROM citus_get_active_worker_nodes();
     node_name      | node_port
--------------------+-----------
 server3.domain.com |      5433
 server2.domain.com |      5433
(2 rows)

We need to rebalance the events table:

citus=# select nodename,count(*) from citus_shards where table_name='events'::regclass group by nodename;
      nodename      | count
--------------------+-------
 server2.domain.com |    32
(1 row)
 
citus=# select * from get_rebalance_table_shards_plan();
 table_name | shardid | shard_size |     sourcename     | sourceport |     targetname     | targetport
------------+---------+------------+--------------------+------------+--------------------+------------
 events     |  102008 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102009 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102010 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102011 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102012 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102013 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102014 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102015 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102016 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102017 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102018 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102019 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102020 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102021 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102022 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
 events     |  102023 |          0 | server2.domain.com |       5433 | server3.domain.com |       5433
(16 rows)
 
citus=# SELECT citus_rebalance_start();
NOTICE:  Scheduled 16 moves as job 1
DETAIL:  Rebalance scheduled as background job
HINT:  To monitor progress, run: SELECT * FROM citus_rebalance_status();
 citus_rebalance_start
-----------------------
                     1
(1 row)
 
citus=# SELECT * FROM citus_rebalance_status();
 job_id |  state  | job_type  |           description           |          started_at           | finished_at |
                                                                                          details
 
--------+---------+-----------+---------------------------------+-------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------
      1 | running | rebalance | Rebalance all colocation groups | 2023-02-23 14:30:41.909994+01 |             | {"tasks": [{"LSN": {"lag": null, "source": "0/5B88E000", "target": null}, "size": {"source": "36 MB", "target": "8856 kB
"}, "hosts": {"source": "server2.domain.com:5433", "target": "server3.domain.com:5433"}, "phase": "Copying Data", "state": "running", "command": "SELECT pg_catalog.citus_move_shard_placement(102008,2,5,'auto')", "message": "", "retr
ied": 0, "task_id": 1}], "task_state_counts": {"blocked": 15, "running": 1}}
(1 row)
 
citus=# select nodename,count(*) from citus_shards where table_name='events'::regclass group by nodename;
      nodename      | count
--------------------+-------
 server2.domain.com |    22
 server3.domain.com |    10
(2 rows)
 
citus=# SELECT * FROM citus_rebalance_status();
 job_id |  state   | job_type  |           description           |          started_at           |          finished_at          |                     details
--------+----------+-----------+---------------------------------+-------------------------------+-------------------------------+--------------------------------------------------
      1 | finished | rebalance | Rebalance all colocation groups | 2023-02-23 14:30:41.909994+01 | 2023-02-23 14:32:47.406338+01 | {"tasks": [], "task_state_counts": {"done": 16}}
(1 row)

The table is finally equally balanced on all workers:

citus=# select nodename,count(*) from citus_shards where table_name='events'::regclass group by nodename;
      nodename      | count
--------------------+-------
 server2.domain.com |    16
 server3.domain.com |    16
(2 rows)

References
