MySQL Utilities versus Master High Availability Manager and tools for MySQL (MHA)

Preamble

This post is a follow-up of MySQL Replication implementation I have described previously. The MySQL Replication tools used are:

  • MySQL Utilities 1.3.1 from Oracle corporation
  • MHA manager 0.55, MHA node 0.54 from Yoshinori Matsunobu

Blog post has been done using MySQL 5.6.11 (binary release i.e. Linux – Generic (glibc 2.5) (x86, 64-bit), Compressed TAR Archive), Oracle Enterprise Linux 6.4 64 bits. In below server1.domain.com is my master server and server2.domain.com the slave one. They are both virtual machine using a non routable IP.

MySQL Utilities

Installation

First start by downloading MySQL Utilities, I have taken the source code then Generic Linux (Architecture Independent), Compressed TAR Archive edition. Then installation is as simple as:

[root@server1 tmp]# gtar xvf mysql-utilities-1.3.1.tar.gz
[root@server1 tmp]# cd /tmp/mysql-utilities-1.3.1
[root@server1 mysql-utilities-1.3.1]# python setup.py --help-commands
[root@server1 mysql-utilities-1.3.1]# python setup.py build
[root@server1 mysql-utilities-1.3.1]# python setup.py install

You obviously need Python on your server (python-2.6.6-36.el6.x86_64 in mycase) and the Python MySQL library (mysql-connector-python-1.0.8-1.el6.noarch) as I got the below error:

ImportError: No module named connector

Usage

Checks replication prerequisites:

[mysql@server1 ~]$ mysqlrplcheck --master=yjaquier:secure_password@server1.domain.com:3326 --slave=yjaquier:secure_password@server2.domain.com:3326
# master on server1.domain.com: ... connected.
# slave on server2.domain.com: ... connected.
Test Description                                                     Status
---------------------------------------------------------------------------
Checking for binary logging on master                                [pass]
Are there binlog exceptions?                                         [pass]
Replication user exists?                                             [pass]
Checking server_id values                                            [pass]
Checking server_uuid values                                          [pass]
Is slave connected to master?                                        [pass]
Check master information file                                        [pass]
Checking InnoDB compatibility                                        [pass]
Checking storage engines compatibility                               [pass]
Checking lower_case_table_names settings                             [pass]
Checking slave delay (seconds behind master)                         [pass]
# ...done.

Display replication topology:

[mysql@server1 ~]$  mysqlrplshow --master=yjaquier:secure_password@server1.domain.com:3326 --discover-slaves-login=yjaquier:secure_password
# master on server1.domain.com: ... connected.
# Finding slaves for master: server1.domain.com:3326
 
# Replication Topology Graph
server1.domain.com:3326 (MASTER)
   |
   +--- server2.domain.com:3326 - (SLAVE)

Display replication health of the topology:

[mysql@server1 ~]$ mysqlrpladmin --master=yjaquier:secure_password@server1.domain.com:3326 --slaves=yjaquier:secure_password@server2.domain.com:3326 health
# Checking privileges.
#
# Replication Topology Health:
+---------------------+-------+---------+--------+------------+---------+
| host                | port  | role    | state  | gtid_mode  | health  |
+---------------------+-------+---------+--------+------------+---------+
| server1.domain.com  | 3326  | MASTER  | UP     | ON         | OK      |
| server2.domain.com  | 3326  | SLAVE   | UP     | ON         | OK      |
+---------------------+-------+---------+--------+------------+---------+
# ...done.

Interactive health monitoring and failover:

[mysql@server1 ~]$ mysqlfailover --master=yjaquier:secure_password@server1.domain.com:3326 --discover-slaves-login=yjaquier:secure_password --rediscover
MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jun  7 17:26:37 2013
 
Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  396
 
GTID Executed Set
770d3753-c6e4-11e2-8e78-080027d93e15:1
 
Replication Health Status
+---------------------+-------+---------+--------+------------+---------+
| host                | port  | role    | state  | gtid_mode  | health  |
+---------------------+-------+---------+--------+------------+---------+
| server1.domain.com  | 3326  | MASTER  | UP     | ON         | OK      |
| server2.domain.com  | 3326  | SLAVE   | UP     | ON         | OK      |
+---------------------+-------+---------+--------+------------+---------+
 
Q-quit R-refresh H-health G-GTID Lists U-UUIDs

Remark:
In this release (1.3.1) mysqlfailover tool is really CPU intensive so better not to forget it running in a terminal…

As a crash test if you stop master instance (clean shutdown or not), you get:

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jun  7 17:34:49 2013
 
Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  396
 
GTID Executed Set
770d3753-c6e4-11e2-8e78-080027d93e15:1
 
Replication Health Status
+---------------------+-------+---------+--------+------------+---------+
| host                | port  | role    | state  | gtid_mode  | health  |
+---------------------+-------+---------+--------+------------+---------+
| server2.domain.com  | 3326  | MASTER  | UP     | ON         | OK      |
+---------------------+-------+---------+--------+------------+---------+

Then add it back with (you can also use CHANGE MASTER SQL command):

[mysql@server1 ~]$ mysqlreplicate --master=yjaquier:secure_password@server2.domain.com:3326 --slave=yjaquier:secure_password@server1.domain.com:3326 --rpl-user=repadmin:secure_password
# master on server2.domain.com: ... connected.
# slave on server1.domain.com: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

You now have:

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jun  7 17:37:14 2013
 
Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  396
 
GTID Executed Set
770d3753-c6e4-11e2-8e78-080027d93e15:1
 
Replication Health Status
+---------------------+-------+---------+--------+------------+-----------------------------+
| host                | port  | role    | state  | gtid_mode  | health                      |
+---------------------+-------+---------+--------+------------+-----------------------------+
| server2.domain.com  | 3326  | MASTER  | UP     | ON         | OK                          |
| server1.domain.com  | 3326  | SLAVE   | UP     | ON         | SQL thread is not running.  |
+---------------------+-------+---------+--------+------------+-----------------------------+

So, strangely mysqlreplicate does not automatically start the slave SQL thread. So tried another MySQL Utilities tool:

[mysql@server1 ~]$ mysqlrpladmin --master=yjaquier:secure_password@server2.domain.com:3326 --slaves=yjaquier:secure_password@server1.domain.com:3326 start
# Checking privileges.
# Performing START on all slaves.
#   Executing start on slave server1.domain.com:3326 Ok
# ...done.

But not working 🙁 so basic START SLAVE SQL command… I’m wondering if mysqlreplicate use mysqlrpladmin in the background, that could explain why both tools fail to start slave SQL thread…

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jun  7 17:49:37 2013
 
Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  396
 
GTID Executed Set
770d3753-c6e4-11e2-8e78-080027d93e15:1
 
Replication Health Status
+---------------------+-------+---------+--------+------------+---------+
| host                | port  | role    | state  | gtid_mode  | health  |
+---------------------+-------+---------+--------+------------+---------+
| server2.domain.com  | 3326  | MASTER  | UP     | ON         | OK      |
| server1.domain.com  | 3326  | SLAVE   | UP     | ON         | OK      |
+---------------------+-------+---------+--------+------------+---------+

Or:

[mysql@server1 ~]$ mysqlrplshow --master=yjaquier:secure_password@server2.domain.com:3326 --discover-slaves-login=yjaquier:secure_password
# master on server2.domain.com: ... connected.
# Finding slaves for master: server2.domain.com:3326
 
# Replication Topology Graph
server2.domain.com:3326 (MASTER)
   |
   +--- server1.domain.com:3326 - (SLAVE)

To perform a switchover to be back to initial configuration use:

[mysql@server1 ~]$ mysqlrpladmin --master=yjaquier:secure_password@server2.domain.com:3326 --slaves=yjaquier:secure_password@server1.domain.com:3326 \
> --new-master=yjaquier:secure_password@server1.domain.com:3326 --demote-master switchover
# Checking privileges.
# Performing switchover from master at server2.domain.com:3326 to slave at server1.domain.com:3326.
# Checking candidate slave prerequisites.
# Checking slaves configuration to master.
# Waiting for slaves to catch up to old master.
# Stopping slaves.
# Performing STOP on all slaves.
# Demoting old master to be a slave to the new master.
# Switching slaves to new master.
# Starting all slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Switchover complete.
#
# Replication Topology Health:
+---------------------+-------+---------+--------+------------+-----------------------------+
| host                | port  | role    | state  | gtid_mode  | health                      |
+---------------------+-------+---------+--------+------------+-----------------------------+
| server1.domain.com  | 3326  | MASTER  | UP     | ON         | OK                          |
| server2.domain.com  | 3326  | SLAVE   | UP     | ON         | SQL thread is not running.  |
+---------------------+-------+---------+--------+------------+-----------------------------+
# ...done.

START SLAVE SQL command has to be issued manually on server2.domain.com even if command output clearly mention that start slave has been successfully done…

Please note it is very easy to do mistake if messing-up master and slave server names. For example imagine that after a failover or a switchover server2.domain.com is the new master and you have completely forgotten or not followed it carefully. Then if you issue mysqlrpladmin command with wrong server name you may imagine MySQL Replication completely broken, while in fact you have just exchange server names::

[mysql@server1 ~]$ mysqlrpladmin --master=yjaquier:secure_password@server1.domain.com:3326 --slaves=yjaquier:secure_password@server2.domain.com:3326 health
# Checking privileges.
#
# Replication Topology Health:
+---------------------+-------+---------+--------+------------+------------------------------------+
| host                | port  | role    | state  | gtid_mode  | health                             |
+---------------------+-------+---------+--------+------------+------------------------------------+
| server1.domain.com  | 3326  | MASTER  | UP     | ON         | OK                                 |
| server2.domain.com  | 3326  | SLAVE   | WARN   |            | Slave is not connected to master.  |
+---------------------+-------+---------+--------+------------+------------------------------------+
# ...done.

MySQL Utilities errors you may encounter if not sing GTID feature. mysqlfailover works only with GTID mode on:

Topology must support global transaction ids and have GTID_MODE=ON.
2013-06-19 17:08:19 PM CRITICAL Topology must support global transaction ids and have GTID_MODE=ON.
ERROR: Topology must support global transaction ids and have GTID_MODE=ON.

mysqlrpladmin works without –rel-user option specified only if master_info_repository and relay_log_info_repository are set to table value:

ERROR: You must specify either the --rpl-user or set all slaves to use --master-info-repository=TABLE.

To solve it I had to add –rpl-user=repadmin:secure_password –force to switchover command, –force due to:

# When the master_info_repository variable is set to FILE, the --rpl-user option may be used only if the user specified matches what is shown in the SLAVE STATUS output unless the --force option is used.
ERROR: The specified replication user is using a different password that the one specified.
Use the --force option to force the use of the user specified with  --rpl-user and new password.

Master High Availability Manager and tools for MySQL (MHA)

Installation

MHA is made of two packages one called MHA Node and MHA Manager. MHA Node must be installed on all MySQL servers part of your architecture (masters and slaves) while MHA Manager should be installed on a separated server (quorum server), if you don’t want to do this for cost reason then:

If you have only one (master, slaves) pair, you may not like allocating dedicated hardware for MHA Manager because it adds relatively high costs. In such cases, running MHA Manager on one of slaves makes sense. Note that current version of MHA Manager connects to MySQL slave server via SSH even though the MySQL server is located on the same host as MHA Manager, so you need to enable SSH public key authentication from the same host.

If installing from source package installation (MHA Node and MHA Manager) is as simple as (I apparently had all needed libraries installed i.e. DBI & DBD::mysql):

perl Makefile.PL
make
sudo make install

MHA Manager installation failed for missing:

- Config::Tiny          ...missing.
- Log::Dispatch         ...missing.
- Parallel::ForkManager ...missing.

Automatic download did not worked, so had to manually install (yum):

perl-Config-Tiny.noarch
perl-Log-Dispatch.noarch (with lots of dependencies, thanks yum)
perl-Parallel-ForkManager.noarch

Then you have to configure a password less access between your server, root account preferred one. Start by generating a public (I chose DSA) on all servers:

[root@server2 .ssh]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
a1:98:4d:f5:57:91:28:a0:75:20:4c:fb:05:fd:2b:d5 root@server2.domain.com
The key's randomart image is:
+--[ DSA 1024]----+
|     oo *+.  .oo |
|      .* +o ...  |
|      + . oo..   |
|     = o o .o E  |
|    o o S  . .   |
|          . .    |
|           .     |
|                 |
|                 |
+-----------------+
[root@server2 .ssh]# pwd
/root/.ssh
[root@server2 .ssh]# ll -a
total 28
drwx------.  2 root root 4096 Jun 17 12:50 .
dr-xr-x---. 22 root root 4096 Jun 17 10:50 ..
-rw-------   1 root root  668 Jun 17 12:50 id_dsa
-rw-r--r--   1 root root  613 Jun 17 12:50 id_dsa.pub
-rw-r--r--.  1 root root 2321 Jun 17 12:45 known_hosts

Make sure to uncomment or add below line in /etc/ssh/sshd_config and restart sshd daemon on all servers:

AuthorizedKeysFile      .ssh/authorized_keys

Club all public key files in a common authorized_keys file that you will distribute to all servers (in /root/.ssh directory).

Usage

First thing to do is create an application local configuration file (I have only one application to monitor so global configuration not needed):

[server default]
# mysql user and password
user=yjaquier
password=secure_password
port=3326
# working directory on the manager
manager_workdir=/tmp
# manager log file
manager_log=/tmp/app1.log
# working directory on MySQL servers
remote_workdir=/tmp

[server1]
hostname=server1.domain.com
master_binlog_dir=/mysql/logs/5.6.11

[server2]
hostname=server2.domain.com
master_binlog_dir=/mysql/logs/5.6.11

Then check ssh access for MHA between your servers:

[root@server2 ~]# masterha_check_ssh --conf=/etc/app1.cnf
Mon Jun 17 12:56:19 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 17 12:56:19 2013 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Jun 17 12:56:19 2013 - [info] Reading server configurations from /etc/app1.cnf..
Mon Jun 17 12:56:19 2013 - [info] Starting SSH connection tests..
Mon Jun 17 12:56:19 2013 - [debug]
Mon Jun 17 12:56:19 2013 - [debug]  Connecting via SSH from root@server1.domain.com(192.168.56.101:22) to root@server2.domain.com(192.168.56.102:22)..
Mon Jun 17 12:56:19 2013 - [debug]   ok.
Mon Jun 17 12:56:20 2013 - [debug]
Mon Jun 17 12:56:19 2013 - [debug]  Connecting via SSH from root@server2.domain.com(192.168.56.102:22) to root@server1.domain.com(192.168.56.101:22)..
Mon Jun 17 12:56:19 2013 - [debug]   ok.
Mon Jun 17 12:56:20 2013 - [info] All SSH connection tests passed successfully.

When issuing below command to check replication status:

[root@server2 ~]# masterha_check_ssh --conf=/etc/app1.cnf

I first had below error:

Mon Jun 17 18:02:06 2013 - [error][/usr/local/share/perl5/MHA/Server.pm, ln241]  Getting relay log directory or current relay logfile from replication table failed on server1.domain.com(192.168.56.101:3326)!
Mon Jun 17 18:02:06 2013 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations.  at /usr/local/share/perl5/MHA/ServerManager.pm line 269
Mon Jun 17 18:02:06 2013 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Jun 17 18:02:06 2013 - [info] Got exit code 1 (Not master dead).

Had to change (MySQL Replication information table mode not supported yet):

master_info_repository = file
relay_log_info_repository = file

I then had another error (I’m using root Linux account):

Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!

I did not see any option to specify MySQL directory (I’m not using default filesystems) but looks like people are waiting for it. Changing root profile to add MySQL binary directory in path did not help… So created a symbolic link in /usr/bin directory on all servers:

[root@server2 bin]# pwd
/usr/bin
[root@server2 bin]# ln -s /mysql/software/5.6.11/bin/mysqlbinlog mysqlbinlog
[root@server2 bin]# ll mysqlbinlog
lrwxrwxrwx 1 root root 38 Jun 19 10:38 mysqlbinlog -> /mysql/software/5.6.11/bin/mysqlbinlog

Then failed for:

Testing mysql connection and privileges..sh: mysql: command not found

So also created a symbolic link for mysql binary.

To run MHA Manager use:

[root@server2 ~]# masterha_manager --conf=/etc/app1.cnf
Wed Jun 19 10:50:06 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 19 10:50:06 2013 - [info] Reading application default configurations from /etc/app1.cnf..
Wed Jun 19 10:50:06 2013 - [info] Reading server configurations from /etc/app1.cnf..

Or:

[root@server2 ~]# nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /tmp/app1_manager.log 2>&1 &

Then you can check your MySQL Replication status with:

[root@server2 ~]# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:28816) is running(0:PING_OK), master:server1.domain.com

When trying to perform a master switch I can see one first limitation of running MHA manager on slave server:

[root@server2 ~]# masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=server2.domain.com
.
Wed Jun 19 14:27:36 2013 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
.

So decided to test a failover, kill mysql processes on master server (server1.domain.com in my case). After a while you may interactively see (that would not occur on a dedicated MHA manager host):

[1]+  Done                    nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /tmp/app1_manager.log 2>&1  (wd: /usr/bin)
(wd now: /tmp)

And see an app1.failover.complete empty file in working directory you configured (/tmp for me) and found in application log (/tmp/app1.log for me), output obviously truncated:

.
.
----- Failover Report -----

app1: MySQL Master failover server1.domain.com to server2.domain.com succeeded

Master server1.domain.com is down!

Check MHA Manager logs at server2.domain.com:/tmp/app1.log for details.

Started automated(non-interactive) failover.
The latest slave server2.domain.com(192.168.56.102:3326) has all relay logs for recovery.
Selected server2.domain.com as a new master.
server2.domain.com: OK: Applying all logs succeeded.
Generating relay diff files from the latest slave succeeded.
server2.domain.com: Resetting slave info succeeded.
Master failover to server2.domain.com(192.168.56.102:3326) completed successfully.

How to move MySQL Replication from GTID to standard mode

On slave:

mysql> stop slave;
Query OK, 0 rows affected (0.06 sec)
 
mysql> reset slave all;
Query OK, 0 rows affected (0.00 sec)
 
mysql> show slave status\G
Empty set (0.00 sec)

On master to start with a clean situation:

mysql> reset master;
Query OK, 0 rows affected (0.04 sec)
 
mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 |       120 |
+------------------+-----------+
1 row in set (0.00 sec)

Then add standard MySQL replication with:

mysql> change master to master_host='server1.domain.com', master_port=3326, master_user='repadmin', master_password='secure_password', \
       master_log_file='mysql-bin.000001', master_log_pos=120;
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.

I changed gtid_mode value to off (stop/start of both MySQL instances as a static parameter) and tried to recreate replication, but still failing for same error (!!):

mysql> show variables like '%gtid%';
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| enforce_gtid_consistency | ON        |
| gtid_executed            |           |
| gtid_mode                | OFF       |
| gtid_next                | AUTOMATIC |
| gtid_owned               |           |
| gtid_purged              |           |
+--------------------------+-----------+
6 rows in set (0.01 sec)
 
mysql> change master to master_host='server1.domain.com', master_port=3326, master_user='repadmin', master_password='secure_password', \
       master_log_file='mysql-bin.000001', master_log_pos=120;
ERROR 1777 (HY000): CHANGE MASTER TO MASTER_AUTO_POSITION = 1 can only be executed when @@GLOBAL.GTID_MODE = ON.
 
mysql> change master to master_host='server1.domain.com', master_port=3326, master_user='repadmin', master_password='secure_password', \
       master_log_file='mysql-bin.000001', master_log_pos=120, master_auto_position=0;
ERROR 1777 (HY000): CHANGE MASTER TO MASTER_AUTO_POSITION = 1 can only be executed when @@GLOBAL.GTID_MODE = ON.

Had to reissue on slave (!!):

mysql> reset slave all;
Query OK, 0 rows affected, 5 warnings (0.08 sec)

And finally:

mysql> change master to master_host='server1.domain.com', master_port=3326, master_user='repadmin', master_password='secure_password', \
       master_log_file='mysql-bin.000001', master_log_pos=120;
Query OK, 0 rows affected, 2 warnings (0.07 sec)
 
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

Conclusion

While I expected to compare the two products I reached the conclusion that they are not usable on same perimeter:

  • MySQL Utilities still beta (as of writing of this post) and clearly few things are missing. Its main target is MySQL 5.6 as all is around new GTID feature, many of the tools are working with standard replication implementation but I have found few errors.
  • MHA has not yet been released for GTID but coming soon (surely lots of people waiting for it). Few annoying things still in loop (path to MySQL binaries for example). Support from SkySQL for MHA if platinum contract is clearly a plus. On standard MySQL Replication implementation MHA is much in advanced and has proven to work in a real life big example (DeNA company claims 700+ MySQL servers with more than 150 {master, slaves} pairs, mainly MySQL 5.0 and 5.1).

References

2 thoughts on “MySQL Utilities versus Master High Availability Manager and tools for MySQL (MHA)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>