Table of contents
Preamble
To quote the official project page, Tungsten Replicator is:
A high performance, open source, data replication engine for MySQL
But not only as, even if there are contradictory documentation on the subject on Internet it allows replication from:
- MySQL
- Oracle
- PostgreSQL
To:
- MongoDB
- MySQL
- Oracle
- PostgreSQL
- Vertica
The tool is written in Java and Ruby and sponsored by Continuent (they obviously provide commercial support if you need).
I use MariaDB 5.5.35 64 bits running on Oracle Linux Server release 6.5 and Tungsten Replicator 2.2.0 (build 292). If successful I’m planning to test it wit Oracle as, to be honest, the real first thing that came to my mind was a free replacement of expensive Oracle GoldenGate.
In below my master server is called server1.domain.com and has 192.168.56.101 non-routable IP address and my slave is called server2.domain.com and has 192.168.56.102 non-routable IP address.
Installation
As usual I’m using my MOCA directory layout:
Directory | Used for |
---|---|
/mysql/data01/mysql01 | Strore MyISAM and InnoDB files, dataxx directories can also be created to spread I/O |
/mysql/dump/mysql01 | All log files (slow log, error log, general log, …) |
/mysql/logs/mysql01 | All binary logs (log-bin, relay_log) |
/mysql/software/mysql01 | MySQL binaries (the my.cnf file is then stored in a conf subdirectory, as well as socket and pid files) |
And use the below standard my.cnf file for all instance (server-id different on each node), I’m not coming back on MySQL installation that should be piece of cake if you read this article:
[mysqld] user=mysql basedir=/mysql/software/mysql01 datadir=/mysql/data01/mysql01 port=3316 pid_file=/mysql/software/mysql01/conf/mysql01.pid log_error=/mysql/dump/mysql01/mysql01.err default-storage-engine=innodb tmpdir=/tmp binlog_format=row server-id=1 log_bin=/mysql/logs/mysql01/mysql-bin.log socket=/mysql/software/mysql01/conf/mysql01.sock sql_mode='TRADITIONAL,PIPES_AS_CONCAT' slow_query_log_file=/mysql/dump/mysql01/mysql01-slow.log slow_query_log=1 long_query_time=5 max_allowed_packet=100M default-storage-engine=innodb innodb_buffer_pool_size=50M innodb_flush_log_at_trx_commit=2 innodb_log_buffer_size=12M innodb_log_file_size=50M innodb_log_files_in_group = 4 innodb_thread_concurrency=0 innodb_rollback_on_timeout=1 innodb_lock_wait_timeout=600 innodb_autoinc_lock_mode=2 innodb_flush_log_at_trx_commit=2 innodb_flush_method=O_DIRECT [mysql] default-character-set=utf8 socket=/mysql/software/mysql01/conf/mysql01.sock [client] default-character-set=utf8 socket=/mysql/software/mysql01/conf/mysql01.sock
First the prerequisites, that are quite easy to resolve if you are using a standard Linux distribution (yum repository available):
- which
- rsync (which rsync)
- Ruby 1.8.5 or later (ruby –version )
- Ruby openssl libraries ( echo “p ‘OK'” | ruby -ropenssl )
- Java 1.6 or later (java -version)
- GNU tar (tar –version)
I have chosen to use tungsten as OS account that I put in dba group (same as mysql OS account) to allow access to binary logs. I have unzip the Tungsten Replicator binary distribution in /tmp/tungsten-replicator-2.1.2-44 directory and plan to install the tool in /tungsten-replicator directory.
As requested I can connect with ssh from any node to any node (including self connection) with tungsten account without specifying a password and without the warning asking you to add node in known host list (small ssh configuration needed).
I started to use the provided scripts in cookbook directory (install_master_slave for example) but either it was not working (release 2.2.0) or it was not allowing the flexibility I needed for my small test case. My test case is not that complex but I wanted to replicate my master MySQL database (replicationdb with InnoDB storage engine) simulating users performing replication on it at same time.
With Tungsten Replicator 2.2.0 build 292 I got below error when using install_master_slave script:
ERROR >> server2_domain_com >> At least one replication service has experienced an error And on server2.domain.com INFO | jvm 1 | 2014/02/17 12:58:32 | WARNING: Could not execute query org.drizzle.jdbc.internal.common.query.DrizzleQuery@304e5770: Variable 'pseudo_thread_id' can't be set to the value of '-1' INFO | jvm 1 | 2014/02/17 12:58:32 | 2014-02-17 12:58:32,686 [cookbook - q-to-dbms-0] ERROR applier.JdbcApplier Statement failed: set @@session.pseudo_thread_id=-1 INFO | jvm 1 | 2014/02/17 12:58:32 | 2014-02-17 12:58:32,688 [cookbook - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask Event application failed: seqno=0 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master INFO | jvm 1 | 2014/02/17 12:58:32 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master . . . |
Which can be simulated with something like:
MariaDB [(none)]> show variables like 'pseudo_thread_id'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | pseudo_thread_id | 1 | +------------------+-------+ 1 row in set (0.00 sec) MariaDB [(none)]> set session pseudo_thread_id=-1; ERROR 1231 (42000): Variable 'pseudo_thread_id' can't be set to the value of '-1' |
Release 2.1.2 build 44 is working well if you set USE_TPM=1 to avoid install_master_slave script to fail for again strange error (Even if I do have mysql in PATH of my tungsten installation account):
ERROR >> server2.domain.com >> Failed: which mysql, RC: 256, Result: which: no mysql in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin) |
But again the install_master_slave script does not allow you to start replication from a fixed position…
Even if when installing Tungsten Replicator I have not used cookbook script there are some that are worth using (validate_cluster, show_cluster, …) and to use them you must modify two important files (COMMON_NODES.sh and USER_VALUES.sh).
My COMMON_NODES.sh file:
[root@server1 cookbook]# cat COMMON_NODES.sh #!/bin/bash # (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License # Version 1.0.5 - 2013-04-03 export NODE1=server1.domain.com export NODE2=server2.domain.com |
My USER_VALUES.sh file:
#!/bin/bash # (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License # Version 1.0.5 - 2013-04-03 # User defined values for the cluster to be installed. cookbook_dir=$(dirname $0 ) # Where to install Tungsten Replicator export TUNGSTEN_BASE=/tungsten-replicator # Directory containing the database binary logs export BINLOG_DIRECTORY=/mysql/logs/mysql01 # Path to the script that can start, stop, and restart a MySQL server export MYSQL_BOOT_SCRIPT=/tmp/mysql01 # Path to the options file export MY_CNF=/mysql/software/mysql01/conf/my.cnf # Database credentials export DATABASE_USER=yjaquier export DATABASE_PASSWORD=secure_password export DATABASE_PORT=3316 |
I encountered the below error which was causing the validate_cluster script to run forever:
.The authenticity of host 'server2.domain.com (192.168.56.102)' can't be established. RSA key fingerprint is 94:e0:7d:2a:14:df:34:ad:c9:60:2b:a5:24:6b:0c:7a. Are you sure you want to continue connecting (yes/no)?
Because I used the lazy approach to fill my known_hosts file and used the short server name, i.e. server2 instead of server2.domain.com. To avoid this error connect to all server including self-connection from all servers to all servers with FQDN server names.
Another error with mysql not in path, added mysql binary file in tungsten account path:
sh: mysql: command not found sh: mysql: command not found
Saw also that validation script is looking for Percona XtraBackup so installed it in MySQL binary directory:
which: no innobackupex-1.5.1 in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tungsten/bin) which: no innobackupex-1.5.1 in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tungsten/bin)
Remark:
If you want a more verbose output you can set VERBOSE environment variable to 2.
Final errors I go was:
ERROR >> server1_domain_com >> The database user is missing some privileges or the grant option. Run 'mysql -uyjaquier -psecure_password -hserver1.domain.com -e"GRANT ALL ON *.* to yjaquier@server1 WITH GRANT OPTION"' (MySQLPermissionsCheck)
Account I’m using has all privileges but not with grant option…So corrected with:
MariaDB [(none)]> GRANT ALL ON *.* to yjaquier@'%' with grant option; Query OK, 0 rows affected (0.00 sec) |
So I have running transactions on my master and I want to load my slave from a master backup like I did when testing GTID feature of MySQL 5.6:
MariaDB [(none)]> CREATE DATABASE replicationdb CHARACTER SET utf8 COLLATE utf8_general_ci; Query OK, 1 row affected (0.00 sec) MariaDB [(none)]> CREATE TABLE replicationdb.test1(id int, descr varchar(50)); Query OK, 0 rows affected (0.06 sec) MariaDB [replicationdb]> alter table replicationdb.test1 add primary key (id); Query OK, 0 rows affected (0.07 sec) Records: 0 Duplicates: 0 Warnings: 0 MariaDB [(none)]> insert into replicationdb.test1 values(1,'One'); Query OK, 1 row affected (0.01 sec) MariaDB [(none)]> insert into replicationdb.test1 values(2,'Two'); Query OK, 1 row affected (0.00 sec) |
Then I perform a backup (using –master-data=2 to have binary log name and position in backup):
[mysql@server1 tmp]$ /mysql/software/mysql01/bin/mysqldump --defaults-file=/mysql/software/mysql01/conf/my.cnf --user=root --password=`cat ~mysql/.root_password` \ --single-transaction --databases replicationdb --master-data=2 > backup.sql [mysql@server1 tmp]$ ll backup.sql -rw-r--r-- 1 mysql dba 2175 Feb 18 14:08 backup.sql [mysql@server1 tmp]$ grep -i "change master" backup.sql -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=2763; |
To simulate activity I insert two more rows in my test table, that I’m expecting to be replicated to slave server once Tungsten replication has been started:
MariaDB [(none)]> insert into replicationdb.test1 values(3,'Three'); Query OK, 1 row affected (0.00 sec) MariaDB [(none)]> insert into replicationdb.test1 values(4,'Four'); Query OK, 1 row affected (0.00 sec) |
The backup is loaded on slave server obviously…
Then install Tungsten Replicator with:
[tungsten@server1 tungsten-replicator-2.2.0-292]$ pwd /tmp/tungsten-replicator-2.2.0-292 [tungsten@server1 tungsten-replicator-2.2.0-292]$ ./tools/tpm install flow1 \ --reset \ --master=server1.domain.com \ --log=timestamp \ --replication-user=yjaquier \ --replication-password=secure_password \ --replication-port=3316 \ --home-directory=/tungsten-replicator \ --slaves=server2.domain.com \ --datasource-mysql-conf=/mysql/software/mysql01/conf/my.cnf \ --datasource-log-directory=/mysql/logs/mysql01 \ --datasource-boot-script=/tmp/mysql01 \ --repl-disable-relay-logs=true \ --rmi-port=10000 \ --thl-port=2112 \ --topology=master-slave |
Start service on server1.domain.com (done automatically on server2 starting with Tunsgten Replicator 2.2.0) from binary log and position found in backup file:
[tungsten@server1 tungsten-replicator-2.2.0-292]$ ./tools/tpm start flow1 --from-event 000002:2763 ....... Getting replication status on server2.domain.com Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 6 appliedLatency : 1.355 role : slave serviceName : flow1 serviceType : local started : true state : ONLINE Finished services command... Getting replication status on server1.domain.com Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 6 appliedLatency : 1.499 role : master serviceName : flow1 serviceType : local started : true state : ONLINE Finished services command... NOTE >> Command successfully completed |
You can verify all is working fine with:
[tungsten@server1 tungsten]$ /tungsten-replicator/tungsten/tungsten-replicator/bin/trepctl -host server1.domain.com services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 7 appliedLatency : 0.18 role : master serviceName : flow1 serviceType : local started : true state : ONLINE Finished services command... [tungsten@server1 tungsten]$ /tungsten-replicator/tungsten/tungsten-replicator/bin/trepctl -host server2.domain.com services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 7 appliedLatency : 0.0 role : slave serviceName : flow1 serviceType : local started : true state : ONLINE Finished services command... |
Rows inserted after the backup on server1.domain.com (id equals to 3 and 4) are well transferred to server2.domain.com and new inserted rows are correctly replicated…
Testing
You can check your cluster status and topology with show_cluster script:
[tungsten@server1 tungsten]$ ./cookbook/show_cluster This command requires an installed cluster [tungsten@server1 tungsten]$ pwd /tungsten-replicator/tungsten [tungsten@server1 tungsten]$ echo master_slave > CURRENT_TOPOLOGY [tungsten@server1 tungsten]$ ./cookbook/show_cluster -------------------------------------------------------------------------------------- Topology: 'MASTER_SLAVE' -------------------------------------------------------------------------------------- # node server1.domain.com flow1 [master] seqno: 10 - latency: 1.057 - ONLINE # node server2.domain.com flow1 [slave] seqno: 10 - latency: 0.625 - ONLINE |
Continuent is proposing a script to set all environment variables for you:
[tungsten@server2 tungsten-replicator]$ . /tungsten-replicator/share/env.sh |
To simulate an apply error I first insert a row on slave (this demonstrate why your slave should be in read only mode):
MariaDB [(none)]> insert into replicationdb.test1 values(6,'Six'); Query OK, 1 row affected (0.00 sec) |
Row that I then insert on master plus an additional one that will not be replicated because replication is going to fail as I have a primary key on id column:
MariaDB [(none)]> insert into replicationdb.test1 values(6,'Six'); Query OK, 1 row affected (0.00 sec) MariaDB [(none)]> insert into replicationdb.test1 values(7,'Seven'); Query OK, 1 row affected (0.02 sec) |
As expected:
[tungsten@server1 tungsten]$ ./cookbook/show_cluster -------------------------------------------------------------------------------------- Topology: 'MASTER_SLAVE' -------------------------------------------------------------------------------------- # node server1.domain.com flow1 [master] seqno: 8 - latency: 0.891 - ONLINE # node server2.domain.com flow1 [slave] seqno: -1 - latency: -1.000 - OFFLINE:ERROR [tungsten@server1 tungsten]$ trepctl -host server2 services -full Processing services command... NAME VALUE ---- ----- appliedLastEventId : NONE appliedLastSeqno : -1 appliedLatency : -1.0 channels : -1 clusterName : flow1 currentEventId : NONE currentTimeMillis : 1392816326784 dataServerHost : server2.domain.com extensions : host : server2.domain.com latestEpochNumber : -1 masterConnectUri : thl://server1.domain.com:2112/ masterListenUri : thl://server2.domain.com:2112/ maximumStoredSeqNo : -1 minimumStoredSeqNo : -1 offlineRequests : NONE pendingError : Event application failed: seqno=8 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master pendingErrorCode : NONE pendingErrorEventId : mysql-bin.000003:0000000000000951;176 pendingErrorSeqno : 8 pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master insert into replicationdb.test1 values(6,'Six') pipelineSource : UNKNOWN relativeLatency : -1.0 resourcePrecedence : 99 rmiPort : 10000 role : slave seqnoType : java.lang.Long serviceName : flow1 serviceType : unknown simpleServiceName : flow1 siteName : default sourceId : server2.domain.com started : true state : OFFLINE:ERROR timeInStateSeconds : 64.565 transitioningTo : uptimeSeconds : 7887.022 useSSLConnection : false version : Tungsten Replicator 2.2.0 build 292 Finished services command... |
We see above that replication is stuck on slave on sequence 8 (pendingErrorSeqno : 8) that is inserting the already existing id equals to 6. Easy situation to solve:
[tungsten@server1 tungsten]$ trepctl -host server2 -service flow1 online -skip-seqno 8 [tungsten@server1 tungsten]$ trepctl -host server2 services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 9 appliedLatency : 224.992 role : slave serviceName : flow1 serviceType : local started : true state : ONLINE Finished services command... |
And row with id equals to 7 is correctly replicated… Of course it will not always be so easy but the background and commands are there to handle them…
Cleaning
You can do it manually (stop Tungsten, clean directories and drop Tungsten schema) or use the provided script:
[tungsten@server1 tungsten]$ pwd /tungsten-replicator/tungsten [tungsten@server1 tungsten]$ ./cookbook/clear_cluster -------------------------------------------------------------------------------------- Clearing up cluster with installed topology 'MASTER_SLAVE' -------------------------------------------------------------------------------------- !!! WARNING !!! -------------------------------------------------------------------------------------- 'clear-cluster' is a potentially damaging operation. This command will do all the following: * Stop the replication software in all servers. [$STOP_REPLICATORS] * REMOVE ALL THE CONTENTS from /tungsten-replicator/.[$REMOVE_TUNGSTEN_BASE] * REMOVE the tungsten_<service_name> schemas in all nodes (server1.domain.com server2.domain.com) [$REMOVE_SERVICE_SCHEMA] * REMOVE the schemas created for testing (test, evaluator) in all nodes (server1.domain.com server2.domain.com) [$REMOVE_TEST_SCHEMAS] * Create the test server anew; [$CLEAN_NODE_DATABASE_SERVER] * Unset the read_only variable; [$CLEAN_NODE_DATABASE_SERVER] * Set the binlog format to MIXED; [$CLEAN_NODE_DATABASE_SERVER] * Reset the master (removes all binary logs); [$CLEAN_NODE_DATABASE_SERVER] If this is what you want, either set the variable I_WANT_TO_UNINSTALL or answer 'y' to the question below You may also set the variables in brackets to fine tune the execution. Alternatively, have a look at ./cookbook/clear_cluster and customize it to your needs. -------------------------------------------------------------------------------------- Do you wish to uninstall this cluster? [y/n] y cat: /tungsten-replicator/tungsten/resultsserver1.domain.com: No such file or directory ./cookbook/clear_cluster: line 100: /tungsten-replicator/tungsten/clear_node_sql_server1.domain.com.sh: No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/resultsserver1.domain.com': No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/clear_node_server1.domain.com.sh': No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/clear_node_sql_server1.domain.com.sh': No such file or directory cat: /tungsten-replicator/tungsten/resultsserver2.domain.com: No such file or directory ./cookbook/clear_cluster: line 100: /tungsten-replicator/tungsten/clear_node_sql_server2.domain.com.sh: No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/resultsserver2.domain.com': No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/clear_node_server2.domain.com.sh': No such file or directory rm: cannot remove `/tungsten-replicator/tungsten/clear_node_sql_server2.domain.com.sh': No such file or directory |
Remark:
Even if specified the tungsten_flow1 schema has not been dropped and I had to do it manually.
References
- Tungsten Replicator Manual
- tungsten-replicator
- Installing and Administering Tungsten Replicator – Part 1 – basics
- Installing and administering Tungsten Replicator – Part 2 : advanced
- Using Tungsten Replicator to solve replication problems