Galera Arbitrator as an odd node in Galera cluster


Galera Arbitrator is a poor man's cluster node for your Galera cluster. In the collective unconscious a cluster is a two-node cluster and, to be honest, where I work this is true in almost all the cases I have seen.

If we build an operating system cluster with Pacemaker or Veritas Cluster Server (VCS) we almost always set up two servers in the cluster. Even for our Real Application Clusters (RAC) implementation we have set up a two-node cluster. By the way, are there people doing RAC on more than two nodes?

As we have seen in a previous article, Galera Cluster is a free synchronous multi-master database cluster developed by Codership.

The truth is that Codership has always recommended a minimum of three nodes and, more generally, an odd number of nodes to handle split-brain situations. For one project, colleagues started their architecture with a two-node cluster and noticed that Galera Arbitrator can play the odd node in their implementation. So I decided to give it a try…

For my testing I used MariaDB Server 10.6.12 and my cluster is made of three virtual machines running under VirtualBox with Oracle Linux 8.7 (Oracle Linux Server release 8.7):

  • server1: First Galera Cluster node
  • server2: Second Galera Cluster node
  • server3: The Galera Arbitrator node acting as a Galera Cluster node

As clearly stated in the official documentation, the Galera Arbitrator node does not store any data but sees all replication traffic, so it cannot be put on a slow network or too far from your real Galera cluster nodes.

Galera Arbitrator installation

Since MariaDB release 10.1 there is no longer a dedicated download of a complete MariaDB Galera Server. So I downloaded the traditional community edition of MariaDB 10.6.12 to get a file called mariadb-10.6.12-linux-systemd-x86_64.tar.gz.

As we have already seen I use MariaDB Optimal Configuration Architecture (MOCA) as a standard to install MariaDB:

| Directory                   | Used for                                                                                             |
| /mariadb/data01/mariadb01   | Store MyISAM and InnoDB files; dataxx directories can also be created to spread I/O                  |
| /mariadb/dump/mariadb01     | All log files (slow log, error log, general log, …)                                                  |
| /mariadb/logs/mariadb01     | All binary logs (log-bin, relay_log)                                                                 |
| /mariadb/software/mariadb01 | MariaDB binaries (the my.cnf file is then stored in a conf subdirectory, as well as socket and pid files) |

Once I had the binaries installed I wanted to check the help of the Galera Arbitrator daemon executable:

[mariadb@server1 ~]$ /mariadb/software/mariadb01/bin/garbd --help
/mariadb/software/mariadb01/bin/garbd: error while loading shared libraries: cannot open shared object file: No such file or directory
[mariadb@server1 ~]$ ldd /mariadb/software/mariadb01/bin/garbd =>  (0x00007ffe78355000) => not found => /lib64/ (0x00007f1303092000) => /lib64/ (0x00007f1302e8a000) => not found => not found => /lib64/ (0x00007f1302b83000) => /lib64/ (0x00007f1302881000) => /lib64/ (0x00007f130266b000) => /lib64/ (0x00007f130229d000)
        /lib64/ (0x00007f1303684000)

Nowadays it is simply not possible to rely on SSL library 1.0.0, as the library has had multiple vulnerabilities and has been updated many times. The Boost library (providing libboost_program_options) is also at release 1.66 on my recently updated server:

[root@server1 ~]# ll /lib64/libboost_program_options*
-rwxr-xr-x 1 root root 534584 Oct  2 15:24 /lib64/

It is no better for the Galera Cluster library itself:

[mariadb@server1 ~]$ ll /mariadb/software/mariadb01/lib/
-rw-r----- 1 mariadb dba 39483176 Jan 27 06:54 /mariadb/software/mariadb01/lib/
[mariadb@server1 ~]$ ldd /mariadb/software/mariadb01/lib/
ldd: warning: you do not have execution permission for `/mariadb/software/mariadb01/lib/' =>  (0x00007ffd8772f000) => /lib64/ (0x00007fcb3fd7d000) => /lib64/ (0x00007fcb3fb75000) => not found => not found => /lib64/ (0x00007fcb3f86e000) => /lib64/ (0x00007fcb3f56c000) => /lib64/ (0x00007fcb3f356000) => /lib64/ (0x00007fcb3ef88000)
        /lib64/ (0x00007fcb403c1000)
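
When comparing several builds like this, it can help to reduce each ldd output to just the unresolved names. The following is a minimal sketch, not part of the original procedure; the missing_libs function name is my own, and it only assumes the usual "name => not found" format that ldd prints for an unresolved dependency:

```shell
#!/usr/bin/env bash
# Print the names of the shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it can be tried without a real binary.
missing_libs() {
    # An unresolved dependency is printed as "<name> => not found";
    # the first whitespace-separated field is the library name.
    awk '/=> not found/ { print $1 }' | sort -u
}

# Typical use (path taken from the article):
#   ldd /mariadb/software/mariadb01/bin/garbd | missing_libs
```

An empty result would suggest that every dependency of the binary resolves on the current server.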

Then I realized there is an additional Galera download on the MariaDB web site that I initially did not see.


Choosing the bintar edition I had multiple choices:


Testing all the x86_64 versions, I was not able to find a satisfactory one. The garbd in galera-26.4.14-x86_64.tar.gz has the dependencies below:

[mariadb@server1 bin]$ ldd garbd (0x00007fff9f996000) => not found => /lib64/ (0x00007f0ce467c000) => /lib64/ (0x00007f0ce4474000) => not found => not found => /lib64/ (0x00007f0ce40df000) => /lib64/ (0x00007f0ce3d5d000) => /lib64/ (0x00007f0ce3b44000) => /lib64/ (0x00007f0ce377e000)
        /lib64/ (0x00007f0ce489c000)

With Boost release 1.66 the required library is simply not there, even after installing boost-devel. In fact, not even after installing all the Boost-related packages…

With galera-26.4.14-systemd-x86_64:

[mariadb@server1 bin]$ ldd garbd (0x00007ffd765e0000) => not found => /lib64/ (0x00007f6ef24e2000) => /lib64/ (0x00007f6ef22da000) => not found => not found => /lib64/ (0x00007f6ef1f45000) => /lib64/ (0x00007f6ef1bc3000) => /lib64/ (0x00007f6ef19aa000) => /lib64/ (0x00007f6ef15e4000)
        /lib64/ (0x00007f6ef2ad8000)

With galera-26.4.14-glibc_214-x86_64:

[mariadb@server1 bin]$ ldd garbd (0x00007ffdc7c79000) => not found => /lib64/ (0x00007f8bfee4d000) => /lib64/ (0x00007f8bfec45000) => not found => not found => /lib64/ (0x00007f8bfe8b0000) => /lib64/ (0x00007f8bfe52e000) => /lib64/ (0x00007f8bfe315000) => /lib64/ (0x00007f8bfdf4f000)
        /lib64/ (0x00007f8bff443000)

Finally I took the RHEL 8 RPM version of the Galera library (galera-4-26.4.14-1.el8.x86_64.rpm) and extracted it with:

[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ rpm2cpio /tmp/galera-4-26.4.14-1.el8.x86_64.rpm | cpio -idmv
127636 blocks

The library links look better:

[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ ldd usr/bin/garbd (0x00007fffd05db000) => /lib64/ (0x00007f73194c1000) => /lib64/ (0x00007f73192a1000) => /lib64/ (0x00007f7319099000) => /lib64/ (0x00007f7318e05000) => /lib64/ (0x00007f731891c000) => /lib64/ (0x00007f7318587000) => /lib64/ (0x00007f7318205000) => /lib64/ (0x00007f7317fec000) => /lib64/ (0x00007f7317c26000)
        /lib64/ (0x00007f7319743000) => /lib64/ (0x00007f7317a0e000) => /lib64/ (0x00007f731780a000)
[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ ldd ./usr/lib64/galera-4/ (0x00007fff783bf000) => /lib64/ (0x00007f2df61f0000) => /lib64/ (0x00007f2df5fe8000) => /lib64/ (0x00007f2df5d54000) => /lib64/ (0x00007f2df586b000) => /lib64/ (0x00007f2df54d6000) => /lib64/ (0x00007f2df5154000) => /lib64/ (0x00007f2df4f3b000) => /lib64/ (0x00007f2df4b75000)
        /lib64/ (0x00007f2df6868000) => /lib64/ (0x00007f2df495d000) => /lib64/ (0x00007f2df4759000)

I did a quick and dirty copy (any comment on the method is much appreciated) into my target bintar directory:

[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ pwd
[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ cp usr/bin/garbd /mariadb/software/mariadb01/bin/
[mariadb@server1 galera-4-26.4.14-1.el8.x86_64]$ cp ./usr/lib64/galera-4/ /mariadb/software/mariadb01/lib

And I can now issue a Galera Arbitrator command:

[mariadb@server1 ~]$ garbd --help
Usage: garbd [options] [group address]
  -d [ --daemon ]       Become daemon
  -n [ --name ] arg     Node name
  -a [ --address ] arg  Group address
  -g [ --group ] arg    Group name
  --sst arg             SST request string
  --donor arg           SST donor name
  -o [ --options ] arg  GCS/GCOMM option list
  -l [ --log ] arg      Log file
  -w [ --workdir ] arg  Daemon working directory
  -c [ --cfg ] arg      Configuration file
Other options:
  -v [ --version ]      Print version & exit
  -h [ --help ]         Show help message & exit

Galera Cluster configuration

On all nodes my mariadb Linux account has the following in its profile:

export PATH=$PATH:.:/mariadb/software/mariadb01/bin:$HOME/.local/bin:$HOME/bin
alias mariadb01='/mariadb/software/mariadb01/bin/mariadb --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb'
alias start_mariadb01='cd /mariadb/software/mariadb01/; ./bin/mariadbd-safe --defaults-file=/mariadb/software/mariadb01/conf/my.cnf &'
alias stop_mariadb01='/mariadb/software/mariadb01/bin/mariadb-admin --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb shutdown'

Small script to clean your previous trial if you want to restart from scratch:

[mariadb@server2 ~]$ cd /mariadb/
[mariadb@server2 mariadb]$ for file in data01 dump logs; do rm -rf ./$file/mariadb01/*; done

Create a traditional MariaDB instance on your nodes (server1 and server2 for me) with (command should be issued by root for PAM authentication configuration):

[root@server1 ~]# cd /mariadb/software/mariadb01/
[root@server1 mariadb01]# ./scripts/mariadb-install-db --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb

The typical MariaDB my.cnf file I have used is:

# Primary variables
basedir                         = /mariadb/software/mariadb01
datadir                         = /mariadb/data01/mariadb01
max_allowed_packet              = 256M
max_connect_errors              = 1000000
pid_file                        = /mariadb/software/mariadb01/conf/
log_bin                         = /mariadb/logs/mariadb01/mariadb-bin
log_bin_trust_function_creators = on
# Logging
log_error                       = /mariadb/dump/mariadb01/mariadb01.err
log_queries_not_using_indexes   = ON
long_query_time                 = 5
slow_query_log                  = ON     # Disabled for production
slow_query_log_file             = /mariadb/dump/mariadb01/mariadb01-slow.log
tmpdir                          = /tmp
user                            = mariadb
# InnoDB Settings
default_storage_engine          = InnoDB
innodb_buffer_pool_size         = 1G    # Use up to 70-80% of RAM
innodb_file_per_table           = ON
innodb_flush_method             = O_DIRECT
innodb_log_buffer_size          = 16M
innodb_log_file_size            = 512M
innodb_stats_on_metadata        = ON
innodb_read_io_threads          = 64
innodb_write_io_threads         = 64
# Query cache
query_cache_size = 10M
query_cache_type = ON
port                            = 3316
socket                          = /mariadb/software/mariadb01/conf/mariadb01.sock

When trying to connect to your instances you might get this error (that we saw already):

[mariadb@server2 ~]$ /mariadb/software/mariadb01/bin/mariadb --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb
/mariadb/software/mariadb01/bin/mariadb: error while loading shared libraries: cannot open shared object file: No such file or directory

Solve it with:

[root@server1 ~]# dnf -y install ncurses-compat-libs.x86_64

To initiate a first replication I have created the below database and test table:

MariaDB [(none)]> create database galera character set utf8mb4 collate utf8mb4_general_ci;
Query OK, 1 row affected (0.001 sec)
MariaDB [(none)]> use galera;
Database changed
MariaDB [galera]> create table test01(id int primary key, descr varchar(50));
Query OK, 0 rows affected (0.007 sec)
MariaDB [galera]> insert into test01 values(1,'One');
Query OK, 1 row affected (0.002 sec)

Once you have created a MariaDB instance on all nodes (except, in my case, server3, which will run only the Galera Arbitrator), add the required Galera parameters below to the first instance (wsrep_on is not mentioned in the official documentation but is essential):

# Galera
wsrep_on                        = ON
binlog_format                   = ROW
default_storage_engine          = InnoDB
innodb_autoinc_lock_mode        = 2
innodb_flush_log_at_trx_commit  = 0
wsrep_provider                  = /mariadb/software/mariadb01/lib/
wsrep_provider_options          = "gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name              = "mycluster01"
wsrep_sst_method                = rsync # Default
#wsrep_cluster_address           = "gcomm://,,"
wsrep_cluster_address           = "gcomm://"
wsrep_node_address              = ""
wsrep_node_name                 = ""

The limitation on the query cache was lifted in MariaDB 10.1.2, so you can now keep it activated! The commented-out wsrep_cluster_address value is the final value that we target.

Stop all MariaDB instances, then start the first one with the wsrep-new-cluster parameter to initialize the Galera cluster:

[mariadb@server1 ~]$ alias start_mariadb01
alias start_mariadb01='cd /mariadb/software/mariadb01/; ./bin/mariadbd-safe --defaults-file=/mariadb/software/mariadb01/conf/my.cnf &'
[mariadb@server1 ~]$ cd /mariadb/software/mariadb01/
[mariadb@server1 mariadb01]$ ./bin/mariadbd-safe --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --wsrep-new-cluster &
[1] 63554
[mariadb@server1 mariadb01]$ 230209 15:56:21 mysqld_safe Logging to '/mariadb/dump/mariadb01/mariadb01.err'.
230209 15:56:21 mysqld_safe Starting mariadbd daemon with databases from /mariadb/data01/mariadb01

Confirm the Galera cluster has been initialized with:

MariaDB [(none)]> show status like 'wsrep_cluster_size';
| Variable_name      | Value |
| wsrep_cluster_size | 1     |
1 row in set (0.002 sec)
MariaDB [(none)]> show status like 'wsrep_local_state_comment';
| Variable_name             | Value  |
| wsrep_local_state_comment | Synced |
1 row in set (0.002 sec)
MariaDB [(none)]> show status like 'wsrep_local_index';
| Variable_name     | Value |
| wsrep_local_index | 0     |
1 row in set (0.002 sec)

For the second node to join the cluster you must not specify wsrep-new-cluster, so I just added the Galera-related parameters to its my.cnf (updated with the second node's name and IP address) and changed wsrep_cluster_address to match the extended cluster:

wsrep_cluster_address           = "gcomm://,,"

On the first node you can set the same value globally. Do not forget to update my.cnf as well, so that the new value is used on future restarts:

MariaDB [(none)]> set global wsrep_cluster_address="gcomm://,";
Query OK, 0 rows affected (0.000 sec)

We can confirm the Galera cluster has evolved in size:

MariaDB [galera]> show status like 'wsrep_cluster_size';
| Variable_name      | Value |
| wsrep_cluster_size | 2     |
1 row in set (0.001 sec)
MariaDB [galera]> show status like 'wsrep_local_state_comment';
| Variable_name             | Value  |
| wsrep_local_state_comment | Synced |
1 row in set (0.002 sec)

Last but not least my test01 table is there too:

MariaDB [galera]> select * from test01;
| id | descr |
|  1 | One   |
1 row in set (0.000 sec)

Galera Arbitrator configuration

Probably the easiest part of this blog post. As suggested, I created an arbitrator.config file:

[mariadb@server3 ~]$ cat arbitrator.config
# arbitrator.config
group   = mycluster01
address = gcomm://,,

Then start the Galera Arbitrator with:

[mariadb@server3 ~]$ garbd --cfg ~/arbitrator.config
2023-02-09 21:39:09.333  INFO: CRC-32C: using 64-bit x86 acceleration.
2023-02-09 21:39:09.333  INFO: Read config:
        daemon:  0
        name:    garb
        address: gcomm://,,
        group:   mycluster01
        sst:     trivial
        options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_single_primary=yes
        cfg:     /home/mariadb/arbitrator.config
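
If you rebuild the lab often, the configuration file can be generated instead of typed. This is only a sketch of one way to do it: the function names and the node*.example host names are placeholders of mine, not the ones used in this article, and the file contains just the two keys shown above.

```shell
#!/usr/bin/env bash
# Join host names into a gcomm:// address, e.g. "gcomm://a,b,c".
gcomm_address() {
    local IFS=,                      # "$*" joins arguments with the first IFS character
    printf 'gcomm://%s\n' "$*"
}

# Write a minimal garbd configuration file: <file> <group> <host>...
write_garbd_cfg() {
    local file=$1 group=$2
    shift 2
    {
        printf '# arbitrator.config\n'
        printf 'group   = %s\n' "$group"
        printf 'address = %s\n' "$(gcomm_address "$@")"
    } > "$file"
}

# Example with placeholder host names:
#   write_garbd_cfg ~/arbitrator.config mycluster01 node1.example node2.example node3.example
#   garbd --cfg ~/arbitrator.config
```

The group value has to match wsrep_cluster_name on the database nodes, which is why it is passed explicitly rather than hard-coded.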

You should also update the two MariaDB instances with the following (and make it persistent by modifying the my.cnf file):

MariaDB [(none)]> set global wsrep_cluster_address="gcomm://,,";
Query OK, 0 rows affected (0.000 sec)

And voilà !!:

MariaDB [galera]> show status like 'wsrep_cluster_size';
| Variable_name      | Value |
| wsrep_cluster_size | 3     |
1 row in set (0.003 sec)
MariaDB [galera]> show status like 'wsrep_local_state_comment';
| Variable_name             | Value  |
| wsrep_local_state_comment | Synced |
1 row in set (0.001 sec)
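
The same two checks can be scripted, for example for a monitoring job. The sketch below is an illustration under assumptions: status_value and cluster_ok are names I made up, and the input is expected to be the tab-separated "name value" lines that the client prints in batch mode (-N -B).

```shell
#!/usr/bin/env bash
# Extract the value of one status variable from tab-separated
# "name<TAB>value" lines read on stdin.
status_value() {
    awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}

# Succeed only if the status text reports the expected cluster size
# and the local node is Synced.
cluster_ok() {
    local expected=$1 status=$2
    [ "$(printf '%s\n' "$status" | status_value wsrep_cluster_size)" = "$expected" ] &&
    [ "$(printf '%s\n' "$status" | status_value wsrep_local_state_comment)" = "Synced" ]
}

# Typical use, with the client options from the profile alias:
#   status=$(/mariadb/software/mariadb01/bin/mariadb \
#       --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb \
#       -N -B -e "show status like 'wsrep%'")
#   cluster_ok 3 "$status" && echo "cluster healthy"
```

Remember that the expected size is 3 here because the arbitrator counts as a member even though it stores no data.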

Galera Cluster testing

I did not go this far the last time I experimented with Galera Cluster, but how do you simulate a split brain between my two MariaDB instances? In other words, how do you make them unable to communicate with each other?

I started with something like the following, run on server2, which I tried to make unreachable from server1:

[root@server2 ~]# iptables -A INPUT -s -j DROP; iptables -A OUTPUT -d -j DROP
[root@server2 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  server1              anywhere
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  anywhere             server1

When playing with iptables, ensure you have console access to the server (direct login in the VirtualBox window) as it is relatively easy to mess up and cut your own ssh access…

But I was still able to insert a row in the instance running on server1 and it was correctly replicated to the instance running on server2. I scratched my head for a long time before realizing that the Galera Arbitrator running on server3 was relaying the replication traffic…

This means the Galera cluster traffic was still able to transit from server1 to server2 through server3, and Galera Cluster had no complaint about it (robust, you said?)! I then also cut the flow between server2 and server3 with:

[root@server2 ~]# iptables -A INPUT -s -j DROP; iptables -A OUTPUT -d -j DROP

Server2 got evicted as confirmed by Galera Arbitrator log:

2023-02-13 12:36:40.364  INFO: (83714f67-8014, 'tcp://') connection to peer 9655244d-ad7d with addr tcp:// timed out, no messages seen in PT3S, socket stats: rtt: 1734 rttvar: 2388 rto: 1616000 lost: 1 last_data_recv: 3061 cwnd: 1 last_queued_since: 54002275 last_delivered_since: 3060774291 send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0
2023-02-13 12:36:40.365  INFO: (83714f67-8014, 'tcp://') turning message relay requesting on, nonlive peers: tcp://
2023-02-13 12:36:43.416  INFO: deleting entry tcp://
2023-02-13 12:36:43.419  INFO: Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 8,
        members    = 2/2 (joined/total),
        act_id     = 36,
        last_appl. = 27,
        protocols  = 2/10/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = ef64f5c5-a8b4-11ed-bec7-56724cf6f3c0

This can also be seen with the show status like 'wsrep%' command on server1:

| wsrep_incoming_addresses      | ,                         |
| wsrep_cluster_size            | 2                                            |

You can clean iptables with:

[root@server2 ~]# iptables --flush
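
Flushing removes every rule, which is fine on a lab machine but not on a server that already has a firewall policy. As a hedged alternative, the sketch below prints matching append and delete commands for a single peer so that only the test rules are removed. The peer_rules helper is hypothetical and 192.0.2.10 is a documentation placeholder address, not one of my servers.

```shell
#!/usr/bin/env bash
# Print the iptables commands that isolate this host from one peer.
# $1 is "A" to append the rules or "D" to delete exactly those rules again.
peer_rules() {
    local action=$1 peer=$2
    printf 'iptables -%s INPUT -s %s -j DROP\n' "$action" "$peer"
    printf 'iptables -%s OUTPUT -d %s -j DROP\n' "$action" "$peer"
}

# Review the output first, then pipe it to a root shell, for example:
#   peer_rules A 192.0.2.10 | sh     # cut the link
#   peer_rules D 192.0.2.10 | sh     # restore it without touching other rules
```

Printing the commands rather than running them directly keeps the helper harmless if it is called with a wrong address.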

The node is automatically reinserted into Galera Cluster:

2023-02-13 14:38:37.519  INFO: Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 11,
        members    = 2/3 (joined/total),
        act_id     = 39,
        last_appl. = 27,
        protocols  = 2/10/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = ef64f5c5-a8b4-11ed-bec7-56724cf6f3c0
2023-02-13 14:38:37.519  INFO: Flow-control interval: [9999999, 9999999]
2023-02-13 14:38:37.526  WARN: Protocol violation. JOIN message sender 1.0 ( is not in state transfer (SYNCED). Message ignored.
2023-02-13 14:38:37.532  WARN: Rejecting JOIN message from 2.0 ( new State Transfer required.
2023-02-13 14:38:37.534  WARN: SYNC message from non-JOINED 2.0 (, PRIMARY). Ignored.
2023-02-13 14:38:39.241  INFO: (83714f67-8014, 'tcp://') turning message relay requesting off

On all Galera nodes you can now see:

| wsrep_incoming_addresses      | ,,     |
| wsrep_cluster_size            | 3                                            |

Behind the scenes, while it was evicted, the instance on server2 was not shut down (a framework or monitoring tool could do it). You can still connect to it and perform queries, but when you try to insert a row you get:

MariaDB [(none)]> insert into galera.test01 values(11,'Eleven');
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

While server2 was out of the Galera Cluster I inserted a row into my test table:

MariaDB [(none)]> insert into galera.test01 values(10,'Ten');
Query OK, 1 row affected (0.002 sec)

This row got replicated when server2 rejoined my Galera Cluster…

Without Galera Arbitrator, if you cut the link between your two nodes then the cluster stops immediately and you cannot insert rows anymore…
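
The arithmetic behind this behaviour is a simple majority rule: a partition stays primary only if it still sees strictly more than half of the previous cluster members. Here is a minimal sketch of that rule, assuming every member (arbitrator included) has the default weight of 1; the has_quorum function is my own illustration, not a Galera tool.

```shell
#!/usr/bin/env bash
# A partition keeps quorum when it reaches strictly more than half of
# the members of the previous cluster (all members assumed to weigh 1).
has_quorum() {
    local reachable=$1 total=$2
    if [ $((reachable * 2)) -gt "$total" ]; then echo yes; else echo no; fi
}

has_quorum 1 2   # two nodes, link cut: each side sees only itself -> no
has_quorum 2 3   # server1 plus the arbitrator after server2 is isolated -> yes
has_quorum 1 3   # server2 alone -> no
```

With two nodes neither side can reach a majority, which is why both stop accepting writes, while the arbitrator gives one side the second vote it needs.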

