Getting started with Tungsten Replicator and MySQL replication

Preamble

To quote the official project page, Tungsten Replicator is:

A high performance, open source, data replication engine for MySQL

But not only as, even if there are contradictory documentation on the subject on Internet it allows replication from:

  • MySQL
  • Oracle
  • PostgreSQL

To:

  • MongoDB
  • MySQL
  • Oracle
  • PostgreSQL
  • Vertica

The tool is written in Java and Ruby and sponsored by Continuent (they obviously provide commercial support if you need).

I use MariaDB 5.5.35 64 bits running on Oracle Linux Server release 6.5 and Tungsten Replicator 2.2.0 (build 292). If successful I’m planning to test it wit Oracle as, to be honest, the real first thing that came to my mind was a free replacement of expensive Oracle GoldenGate.

In below my master server is called server1.domain.com and has 192.168.56.101 non-routable IP address and my slave is called server2.domain.com and has 192.168.56.102 non-routable IP address.

Installation

As usual I’m using my MOCA directory layout:

Directory Used for
/mysql/data01/mysql01 Strore MyISAM and InnoDB files, dataxx directories can also be created to spread I/O
/mysql/dump/mysql01 All log files (slow log, error log, general log, …)
/mysql/logs/mysql01 All binary logs (log-bin, relay_log)
/mysql/software/mysql01 MySQL binaries (the my.cnf file is then stored in a conf subdirectory, as well as socket and pid files)

And use the below standard my.cnf file for all instance (server-id different on each node), I’m not coming back on MySQL installation that should be piece of cake if you read this article:

[mysqld]
user=mysql
basedir=/mysql/software/mysql01
datadir=/mysql/data01/mysql01
port=3316
pid_file=/mysql/software/mysql01/conf/mysql01.pid
log_error=/mysql/dump/mysql01/mysql01.err
default-storage-engine=innodb
tmpdir=/tmp
binlog_format=row
server-id=1
log_bin=/mysql/logs/mysql01/mysql-bin.log
socket=/mysql/software/mysql01/conf/mysql01.sock
sql_mode='TRADITIONAL,PIPES_AS_CONCAT'
slow_query_log_file=/mysql/dump/mysql01/mysql01-slow.log
slow_query_log=1
long_query_time=5
max_allowed_packet=100M
default-storage-engine=innodb

innodb_buffer_pool_size=50M
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=12M
innodb_log_file_size=50M
innodb_log_files_in_group = 4
innodb_thread_concurrency=0
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

[mysql]
default-character-set=utf8
socket=/mysql/software/mysql01/conf/mysql01.sock

[client]
default-character-set=utf8
socket=/mysql/software/mysql01/conf/mysql01.sock

First the prerequisites, that are quite easy to resolve if you are using a standard Linux distribution (yum repository available):

  • which
  • rsync (which rsync)
  • Ruby 1.8.5 or later (ruby –version )
  • Ruby openssl libraries ( echo “p ‘OK'” | ruby -ropenssl )
  • Java 1.6 or later (java -version)
  • GNU tar (tar –version)

I have chosen to use tungsten as OS account that I put in dba group (same as mysql OS account) to allow access to binary logs. I have unzip the Tungsten Replicator binary distribution in /tmp/tungsten-replicator-2.1.2-44 directory and plan to install the tool in /tungsten-replicator directory.

As requested I can connect with ssh from any node to any node (including self connection) with tungsten account without specifying a password and without the warning asking you to add node in known host list (small ssh configuration needed).

I started to use the provided scripts in cookbook directory (install_master_slave for example) but either it was not working (release 2.2.0) or it was not allowing the flexibility I needed for my small test case. My test case is not that complex but I wanted to replicate my master MySQL database (replicationdb with InnoDB storage engine) simulating users performing replication on it at same time.

With Tungsten Replicator 2.2.0 build 292 I got below error when using install_master_slave script:

ERROR >> server2_domain_com >> At least one replication service has experienced an error
And on server2.domain.com
INFO   | jvm 1    | 2014/02/17 12:58:32 | WARNING: Could not execute query org.drizzle.jdbc.internal.common.query.DrizzleQuery@304e5770: Variable 'pseudo_thread_id' can't be set to the value of '-1'
INFO   | jvm 1    | 2014/02/17 12:58:32 | 2014-02-17 12:58:32,686 [cookbook - q-to-dbms-0] ERROR applier.JdbcApplier Statement failed: set @@session.pseudo_thread_id=-1
INFO   | jvm 1    | 2014/02/17 12:58:32 | 2014-02-17 12:58:32,688 [cookbook - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask Event application failed: seqno=0 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2014/02/17 12:58:32 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
.
.
.

Which can be simulated with something like:

MariaDB [(none)]> show variables like 'pseudo_thread_id';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| pseudo_thread_id | 1     |
+------------------+-------+
1 row in set (0.00 sec)
MariaDB [(none)]> set session pseudo_thread_id=-1;
ERROR 1231 (42000): Variable 'pseudo_thread_id' can't be set to the value of '-1'

Release 2.1.2 build 44 is working well if you set USE_TPM=1 to avoid install_master_slave script to fail for again strange error (Even if I do have mysql in PATH of my tungsten installation account):

ERROR >> server2.domain.com >> Failed: which mysql, RC: 256, Result: which: no mysql in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin)

But again the install_master_slave script does not allow you to start replication from a fixed position…

Even if when installing Tungsten Replicator I have not used cookbook script there are some that are worth using (validate_cluster, show_cluster, …) and to use them you must modify two important files (COMMON_NODES.sh and USER_VALUES.sh).

My COMMON_NODES.sh file:

[root@server1 cookbook]# cat COMMON_NODES.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03
 
export NODE1=server1.domain.com
export NODE2=server2.domain.com

My USER_VALUES.sh file:

#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03
 
# User defined values for the cluster to be installed.
 
cookbook_dir=$(dirname $0 )
 
# Where to install Tungsten Replicator
export TUNGSTEN_BASE=/tungsten-replicator
 
# Directory containing the database binary logs
export BINLOG_DIRECTORY=/mysql/logs/mysql01
 
# Path to the script that can start, stop, and restart a MySQL server
export MYSQL_BOOT_SCRIPT=/tmp/mysql01
 
# Path to the options file
export MY_CNF=/mysql/software/mysql01/conf/my.cnf
 
# Database credentials
export DATABASE_USER=yjaquier
export DATABASE_PASSWORD=secure_password
export DATABASE_PORT=3316

I encountered the below error which was causing the validate_cluster script to run forever:

.The authenticity of host 'server2.domain.com (192.168.56.102)' can't be established.
RSA key fingerprint is 94:e0:7d:2a:14:df:34:ad:c9:60:2b:a5:24:6b:0c:7a.
Are you sure you want to continue connecting (yes/no)?

Because I used the lazy approach to fill my known_hosts file and used the short server name, i.e. server2 instead of server2.domain.com. To avoid this error connect to all server including self-connection from all servers to all servers with FQDN server names.

Another error with mysql not in path, added mysql binary file in tungsten account path:

sh: mysql: command not found
sh: mysql: command not found

Saw also that validation script is looking for Percona XtraBackup so installed it in MySQL binary directory:

which: no innobackupex-1.5.1 in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tungsten/bin)
which: no innobackupex-1.5.1 in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tungsten/bin)

Remark:
If you want a more verbose output you can set VERBOSE environment variable to 2.

Final errors I go was:

ERROR >> server1_domain_com >> The database user is missing some privileges or the grant option. Run 'mysql -uyjaquier -psecure_password -hserver1.domain.com -e"GRANT ALL ON *.* to yjaquier@server1 WITH GRANT OPTION"' (MySQLPermissionsCheck)

Account I’m using has all privileges but not with grant option…So corrected with:

MariaDB [(none)]> GRANT ALL ON *.* to yjaquier@'%' with grant option;
Query OK, 0 rows affected (0.00 sec)

So I have running transactions on my master and I want to load my slave from a master backup like I did when testing GTID feature of MySQL 5.6:

MariaDB [(none)]> CREATE DATABASE replicationdb CHARACTER SET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
 
MariaDB [(none)]> CREATE TABLE replicationdb.test1(id int, descr varchar(50));
Query OK, 0 rows affected (0.06 sec)
 
MariaDB [replicationdb]> alter table replicationdb.test1 add primary key (id);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [(none)]> insert into replicationdb.test1 values(1,'One');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [(none)]> insert into replicationdb.test1 values(2,'Two');
Query OK, 1 row affected (0.00 sec)

Then I perform a backup (using –master-data=2 to have binary log name and position in backup):

[mysql@server1 tmp]$ /mysql/software/mysql01/bin/mysqldump --defaults-file=/mysql/software/mysql01/conf/my.cnf --user=root --password=`cat ~mysql/.root_password` \
                     --single-transaction --databases replicationdb --master-data=2 > backup.sql
[mysql@server1 tmp]$ ll backup.sql
-rw-r--r-- 1 mysql dba 2175 Feb 18 14:08 backup.sql
[mysql@server1 tmp]$ grep -i "change master" backup.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=2763;

To simulate activity I insert two more rows in my test table, that I’m expecting to be replicated to slave server once Tungsten replication has been started:

MariaDB [(none)]> insert into replicationdb.test1 values(3,'Three');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [(none)]> insert into replicationdb.test1 values(4,'Four');
Query OK, 1 row affected (0.00 sec)

The backup is loaded on slave server obviously…

Then install Tungsten Replicator with:

[tungsten@server1 tungsten-replicator-2.2.0-292]$ pwd
/tmp/tungsten-replicator-2.2.0-292
[tungsten@server1 tungsten-replicator-2.2.0-292]$ ./tools/tpm install flow1 \
        --reset \
        --master=server1.domain.com \
        --log=timestamp \
        --replication-user=yjaquier \
        --replication-password=secure_password \
        --replication-port=3316 \
        --home-directory=/tungsten-replicator \
        --slaves=server2.domain.com \
        --datasource-mysql-conf=/mysql/software/mysql01/conf/my.cnf \
        --datasource-log-directory=/mysql/logs/mysql01 \
        --datasource-boot-script=/tmp/mysql01 \
        --repl-disable-relay-logs=true \
        --rmi-port=10000 \
        --thl-port=2112 \
        --topology=master-slave

Start service on server1.domain.com (done automatically on server2 starting with Tunsgten Replicator 2.2.0) from binary log and position found in backup file:

[tungsten@server1 tungsten-replicator-2.2.0-292]$ ./tools/tpm start flow1 --from-event 000002:2763
.......
Getting replication status on server2.domain.com
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 6
appliedLatency  : 1.355
role            : slave
serviceName     : flow1
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
Getting replication status on server1.domain.com
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 6
appliedLatency  : 1.499
role            : master
serviceName     : flow1
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
 
NOTE  >> Command successfully completed

You can verify all is working fine with:

[tungsten@server1 tungsten]$ /tungsten-replicator/tungsten/tungsten-replicator/bin/trepctl -host server1.domain.com services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 7
appliedLatency  : 0.18
role            : master
serviceName     : flow1
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
[tungsten@server1 tungsten]$ /tungsten-replicator/tungsten/tungsten-replicator/bin/trepctl -host server2.domain.com services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 7
appliedLatency  : 0.0
role            : slave
serviceName     : flow1
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

Rows inserted after the backup on server1.domain.com (id equals to 3 and 4) are well transferred to server2.domain.com and new inserted rows are correctly replicated…

Testing

You can check your cluster status and topology with show_cluster script:

[tungsten@server1 tungsten]$ ./cookbook/show_cluster
This command requires an installed cluster
[tungsten@server1 tungsten]$ pwd
/tungsten-replicator/tungsten
[tungsten@server1 tungsten]$ echo master_slave > CURRENT_TOPOLOGY
[tungsten@server1 tungsten]$ ./cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node server1.domain.com
flow1  [master] seqno:         10  - latency:   1.057 - ONLINE
# node server2.domain.com
flow1  [slave]  seqno:         10  - latency:   0.625 - ONLINE

Continuent is proposing a script to set all environment variables for you:

[tungsten@server2 tungsten-replicator]$ . /tungsten-replicator/share/env.sh

To simulate an apply error I first insert a row on slave (this demonstrate why your slave should be in read only mode):

MariaDB [(none)]> insert into replicationdb.test1 values(6,'Six');
Query OK, 1 row affected (0.00 sec)

Row that I then insert on master plus an additional one that will not be replicated because replication is going to fail as I have a primary key on id column:

MariaDB [(none)]> insert into replicationdb.test1 values(6,'Six');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [(none)]> insert into replicationdb.test1 values(7,'Seven');
Query OK, 1 row affected (0.02 sec)

As expected:

[tungsten@server1 tungsten]$ ./cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node server1.domain.com
flow1  [master] seqno:          8  - latency:   0.891 - ONLINE
# node server2.domain.com
flow1  [slave]  seqno:         -1  - latency:  -1.000 - OFFLINE:ERROR
 
[tungsten@server1 tungsten]$ trepctl -host server2 services -full
Processing services command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
channels               : -1
clusterName            : flow1
currentEventId         : NONE
currentTimeMillis      : 1392816326784
dataServerHost         : server2.domain.com
extensions             :
host                   : server2.domain.com
latestEpochNumber      : -1
masterConnectUri       : thl://server1.domain.com:2112/
masterListenUri        : thl://server2.domain.com:2112/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Event application failed: seqno=8 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000003:0000000000000951;176
pendingErrorSeqno      : 8
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                         insert into replicationdb.test1 values(6,'Six')
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : flow1
serviceType            : unknown
simpleServiceName      : flow1
siteName               : default
sourceId               : server2.domain.com
started                : true
state                  : OFFLINE:ERROR
timeInStateSeconds     : 64.565
transitioningTo        :
uptimeSeconds          : 7887.022
useSSLConnection       : false
version                : Tungsten Replicator 2.2.0 build 292
Finished services command...

We see above that replication is stuck on slave on sequence 8 (pendingErrorSeqno : 8) that is inserting the already existing id equals to 6. Easy situation to solve:

[tungsten@server1 tungsten]$ trepctl -host server2 -service flow1 online -skip-seqno 8
[tungsten@server1 tungsten]$ trepctl -host server2 services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 9
appliedLatency  : 224.992
role            : slave
serviceName     : flow1
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

And row with id equals to 7 is correctly replicated… Of course it will not always be so easy but the background and commands are there to handle them…

Cleaning

You can do it manually (stop Tungsten, clean directories and drop Tungsten schema) or use the provided script:

[tungsten@server1 tungsten]$ pwd
/tungsten-replicator/tungsten
[tungsten@server1 tungsten]$ ./cookbook/clear_cluster
--------------------------------------------------------------------------------------
Clearing up cluster with installed topology 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
!!! WARNING !!!
--------------------------------------------------------------------------------------
'clear-cluster' is a potentially damaging operation.
This command will do all the following:
* Stop the replication software in all servers. [$STOP_REPLICATORS]
* REMOVE ALL THE CONTENTS from /tungsten-replicator/.[$REMOVE_TUNGSTEN_BASE]
* REMOVE the tungsten_<service_name> schemas in all nodes (server1.domain.com server2.domain.com) [$REMOVE_SERVICE_SCHEMA]
* REMOVE the schemas created for testing (test, evaluator) in all nodes (server1.domain.com server2.domain.com)  [$REMOVE_TEST_SCHEMAS]
* Create the test server anew;                [$CLEAN_NODE_DATABASE_SERVER]
* Unset the read_only variable;               [$CLEAN_NODE_DATABASE_SERVER]
* Set the binlog format to MIXED;             [$CLEAN_NODE_DATABASE_SERVER]
* Reset the master (removes all binary logs); [$CLEAN_NODE_DATABASE_SERVER]
If this is what you want, either set the variable I_WANT_TO_UNINSTALL
or answer 'y' to the question below
You may also set the variables in brackets to fine tune the execution.
Alternatively, have a look at ./cookbook/clear_cluster and customize it to your needs.
--------------------------------------------------------------------------------------
Do you wish to uninstall this cluster? [y/n] y
cat: /tungsten-replicator/tungsten/resultsserver1.domain.com: No such file or directory
./cookbook/clear_cluster: line 100: /tungsten-replicator/tungsten/clear_node_sql_server1.domain.com.sh: No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/resultsserver1.domain.com': No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/clear_node_server1.domain.com.sh': No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/clear_node_sql_server1.domain.com.sh': No such file or directory
cat: /tungsten-replicator/tungsten/resultsserver2.domain.com: No such file or directory
./cookbook/clear_cluster: line 100: /tungsten-replicator/tungsten/clear_node_sql_server2.domain.com.sh: No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/resultsserver2.domain.com': No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/clear_node_server2.domain.com.sh': No such file or directory
rm: cannot remove `/tungsten-replicator/tungsten/clear_node_sql_server2.domain.com.sh': No such file or directory

Remark:
Even if specified the tungsten_flow1 schema has not been dropped and I had to do it manually.

References

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>