Preamble
When tuning your MariaDB (or MySQL) InnoDB memory parameters you have some obvious memory-related parameters that are highly important, like:
- innodb_buffer_pool_size
- innodb_log_buffer_size
Some others are more linked to the underlying I/O subsystem you have, like:
- innodb_flush_method
- innodb_io_capacity_max
- innodb_io_capacity
- innodb_lru_flush_size
The ones I have seen discussed in plenty of articles are innodb_io_capacity_max and innodb_io_capacity. Most of the time their values are picked from nowhere, so let's see how to set them using either fio or the sysbench/iostat pair. Even if I initially rated fio as more convenient, I ended up preferring the sysbench/iostat pair, which is much closer to a real database workload.
Disclaimer:
I have done my tests on different systems, so the results cannot be compared…
Fio
To ease my runs I created a fio configuration file with the parameters below:
runtime=1800        # 30 mins
size=1g
filename=/mariadb/data01/mariadb01/fio
name=fiojob
threads=1
numjobs=16
ioengine=libaio     # Linux native asynchronous I/O
blocksize=16k       # Matching the InnoDB page size (16k by default)
lockfile=readwrite  # Read-write locking on the file. Many readers may access the file at the same time, but writes get exclusive access
readwrite=randwrite # Random writes
# true = 1, false = 0
direct=1
group_reporting     # Display statistics for groups of jobs as a whole instead of for each individual job
Remark:
Buffered I/O
For this test direct is set to false (0) so we go through the filesystem cache, which explains the high IOPS result:
[mariadb@server1 ~]$ fio fio.cnf fiojob: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=1 ... fio-3.19 Starting 16 processes Jobs: 15 (f=15) fiojob: (groupid=0, jobs=16): err= 0: pid=17259: Mon Jan 15 15:59:28 2024 write: IOPS=81.7k, BW=1277MiB/s (1339MB/s)(1600MiB/1253msec); 0 zone resets slat (usec): min=6, max=158697, avg=118.57, stdev=3129.06 clat (nsec): min=1442, max=75674k, avg=4222.74, stdev=313215.24 lat (usec): min=7, max=158703, avg=124.02, stdev=3151.53 clat percentiles (nsec): | 1.00th=[ 1464], 5.00th=[ 1464], 10.00th=[ 1480], | 20.00th=[ 1496], 30.00th=[ 1576], 40.00th=[ 1624], | 50.00th=[ 1672], 60.00th=[ 1736], 70.00th=[ 1800], | 80.00th=[ 1832], 90.00th=[ 1976], 95.00th=[ 2096], | 99.00th=[ 12224], 99.50th=[ 13632], 99.90th=[ 59648], | 99.95th=[ 74240], 99.99th=[4292608] bw ( MiB/s): min= 1028, max= 1337, per=92.65%, avg=1183.02, stdev=22.77, samples=23 iops : min=65811, max=85601, avg=75706.00, stdev=1457.26, samples=23 lat (usec) : 2=91.52%, 4=7.01%, 10=0.31%, 20=0.82%, 50=0.18% lat (usec) : 100=0.14%, 250=0.01%, 500=0.01%, 750=0.01% lat (msec) : 2=0.01%, 10=0.01%, 20=0.01%, 100=0.01% cpu : usr=2.95%, sys=9.94%, ctx=891, majf=0, minf=222 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): WRITE: bw=1277MiB/s (1339MB/s), 1277MiB/s-1277MiB/s (1339MB/s-1339MB/s), io=1600MiB (1678MB), run=1253-1253msec Disk stats (read/write): dm-4: ios=0/594, merge=0/0, ticks=0/2023, in_queue=2023, util=21.18%, aggrios=0/617, aggrmerge=0/1, aggrticks=0/1394, aggrin_queue=1394, aggrutil=18.28% sda: ios=0/617, merge=0/1, ticks=0/1394, in_queue=1394, util=18.28% |
Non-buffered I/O
For this test direct is set to true (1) to bypass the filesystem cache; the IOPS results are closer to what a normal system sees when handling plenty of I/O that cannot all be served from cache:
fiojob: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=1 ... fio-3.19 Starting 16 processes Jobs: 16 (f=16): [w(16)][100.0%][w=187MiB/s][w=11.0k IOPS][eta 00m:00s] fiojob: (groupid=0, jobs=16): err= 0: pid=18867: Mon Jan 15 16:55:36 2024 write: IOPS=12.2k, BW=190MiB/s (199MB/s)(1600MiB/8410msec); 0 zone resets slat (usec): min=10, max=117947, avg=125.24, stdev=459.57 clat (usec): min=2, max=94096, avg=1147.85, stdev=961.22 lat (usec): min=103, max=117953, avg=1274.20, stdev=1054.06 clat percentiles (usec): | 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 269], 20.00th=[ 619], | 30.00th=[ 783], 40.00th=[ 922], 50.00th=[ 1057], 60.00th=[ 1221], | 70.00th=[ 1385], 80.00th=[ 1598], 90.00th=[ 1942], 95.00th=[ 2343], | 99.00th=[ 3785], 99.50th=[ 4424], 99.90th=[ 6783], 99.95th=[11994], | 99.99th=[23987] bw ( KiB/s): min=168385, max=210815, per=100.00%, avg=195774.69, stdev=707.44, samples=256 iops : min=10521, max=13174, avg=12232.12, stdev=44.25, samples=256 lat (usec) : 4=7.89%, 10=0.23%, 20=0.12%, 50=0.02%, 100=0.36% lat (usec) : 250=1.16%, 500=5.18%, 750=12.92%, 1000=17.67% lat (msec) : 2=45.54%, 4=8.09%, 10=0.76%, 20=0.04%, 50=0.01% lat (msec) : 100=0.01% cpu : usr=0.48%, sys=6.02%, ctx=98834, majf=0, minf=239 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): WRITE: bw=190MiB/s (199MB/s), 190MiB/s-190MiB/s (199MB/s-199MB/s), io=1600MiB (1678MB), run=8410-8410msec Disk stats (read/write): dm-4: ios=0/101915, merge=0/0, ticks=0/31993, in_queue=31993, util=97.99%, aggrios=0/102468, aggrmerge=0/98, aggrticks=0/31179, aggrin_queue=31179, aggrutil=98.04% sda: ios=0/102468, merge=0/98, ticks=0/31179, in_queue=31179, util=98.04% |
Iostat
One of the first drawbacks to overcome with iostat is the lack of a load-generation option like we have with fio. iostat is a pure monitoring tool and does not include any benchmarking part. To supplement iostat I have decided to use the tool suggested by MariaDB, called sysbench. It is easy to install as you can find it in the EPEL repository for Red Hat systems. On the other hand, with sysbench you really issue a real database workload, so it might be more realistic than just testing the underlying filesystem…
Sysbench
Sysbench is part of the EPEL repository, so its installation is as easy as configuring this repository on your host and using dnf to install it.
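A minimal installation sketch, assuming a RHEL 8 compatible host where EPEL can be enabled through the epel-release package (on a genuine RHEL subscription you may have to enable EPEL differently):

[root@server1 ~]# dnf install -y epel-release
[root@server1 ~]# dnf install -y sysbench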
Prepare the required MariaDB database and account:
MariaDB [(none)]> create database sbtest;
Query OK, 1 row affected (0.001 sec)

MariaDB [(none)]> create user sbuser@localhost identified by 'secret_password';
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> grant all on sbtest.* to sbuser@localhost;
Query OK, 0 rows affected (0.001 sec)
Prepare the benchmark tables (below is a simple example with 10 tables of 100,000 rows each, but size it based on the available space you have):
[mariadb@server1 ~]$ sysbench --db-driver=mysql --oltp-tables-count=10 --oltp-table-size=100000 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua \ --mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock --mysql-user=sbuser --mysql-password=secret_password prepare sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench) sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3) Creating table 'sbtest1'... Inserting 100000 records into 'sbtest1' Creating secondary indexes on 'sbtest1'... Creating table 'sbtest2'... Inserting 100000 records into 'sbtest2' Creating secondary indexes on 'sbtest2'... Creating table 'sbtest3'... Inserting 100000 records into 'sbtest3' Creating secondary indexes on 'sbtest3'... Creating table 'sbtest4'... Inserting 100000 records into 'sbtest4' Creating secondary indexes on 'sbtest4'... Creating table 'sbtest5'... Inserting 100000 records into 'sbtest5' Creating secondary indexes on 'sbtest5'... Creating table 'sbtest6'... Inserting 100000 records into 'sbtest6' Creating secondary indexes on 'sbtest6'... Creating table 'sbtest7'... Inserting 100000 records into 'sbtest7' Creating secondary indexes on 'sbtest7'... Creating table 'sbtest8'... Inserting 100000 records into 'sbtest8' Creating secondary indexes on 'sbtest8'... Creating table 'sbtest9'... Inserting 100000 records into 'sbtest9' Creating secondary indexes on 'sbtest9'... Creating table 'sbtest10'... Inserting 100000 records into 'sbtest10' Creating secondary indexes on 'sbtest10'... |
Run the benchmark with something like the following (tune --threads, --events, --oltp-tables-count and --oltp-table-size based on what you have prepared and on the hardware you have):
[mariadb@server1 ~]$ sysbench --db-driver=mysql --threads=24 --events=1000000 --oltp-tables-count=30 --oltp-table-size=1000000 --oltp-test-mode=complex \ --oltp-dist-type=uniform /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock \ --mysql-user=sbuser --mysql-password=secret_password --time=120 --report-interval=10 run sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench) sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3) Running the test with following options: Number of threads: 24 Report intermediate results every 10 second(s) Initializing random number generator from current time Initializing worker threads... Threads started! [ 10s ] thds: 24 tps: 805.25 qps: 16138.86 (r/w/o: 11303.47/3222.50/1612.90) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 24 tps: 776.52 qps: 15526.91 (r/w/o: 10867.69/3106.18/1553.04) lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 24 tps: 754.98 qps: 15097.51 (r/w/o: 10568.66/3018.90/1509.95) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 24 tps: 946.54 qps: 18924.02 (r/w/o: 13245.37/3785.56/1893.08) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 24 tps: 524.50 qps: 10492.24 (r/w/o: 7344.33/2098.91/1049.00) lat (ms,95%): 200.47 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 24 tps: 1029.77 qps: 20593.58 (r/w/o: 14415.14/4119.00/2059.45) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00 [ 70s ] thds: 24 tps: 942.22 qps: 18847.55 (r/w/o: 13192.95/3770.17/1884.44) lat (ms,95%): 52.89 err/s: 0.00 reconn/s: 0.00 [ 80s ] thds: 24 tps: 973.72 qps: 19478.28 (r/w/o: 13637.37/3893.38/1947.54) lat (ms,95%): 40.37 err/s: 0.00 reconn/s: 0.00 [ 90s ] thds: 24 tps: 1023.38 qps: 20472.83 (r/w/o: 14330.64/4095.43/2046.76) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00 [ 100s ] thds: 24 tps: 998.09 qps: 19955.27 (r/w/o: 13967.71/3991.37/1996.19) lat (ms,95%): 54.83 err/s: 0.00 reconn/s: 0.00 [ 110s ] thds: 24 tps: 1038.99 qps: 20783.15 (r/w/o: 14548.63/4156.55/2077.98) lat (ms,95%): 44.17 err/s: 0.00 reconn/s: 0.00 [ 120s ] thds: 24 tps: 1005.96 qps: 20112.81 (r/w/o: 14078.35/4022.54/2011.92) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00 SQL statistics: queries performed: read: 1515178 write: 432908 other: 216454 total: 2164540 transactions: 108227 (901.73 per sec.) queries: 2164540 (18034.60 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 120.0173s total number of events: 108227 Latency (ms): min: 3.73 avg: 26.61 max: 1012.31 95th percentile: 58.92 sum: 2879679.63 Threads fairness: events (avg/stddev): 4509.4583/33.00 execution time (avg/stddev): 119.9867/0.00 |
While sysbench is running, issue your iostat command in another session. The list of parameters I have decided to use is (a 5 second interval times 24 samples = 120 seconds, matching the sysbench run time):
[root@server1 ~]# iostat -h -z -H -x -g ALL -d 5 24 Linux 4.18.0-372.19.1.el8_6.x86_64 (server1) 01/22/2024 _x86_64_ (24 CPU) r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 1.56 6.95 17.8k 271.4k 0.00 0.47 0.0% 6.4% 0.34 0.18 0.00 11.4k 39.0k 0.30 0.0% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5168.00 2384.80 80.7M 75.5M 0.00 0.20 0.0% 0.0% 0.99 0.50 6.32 16.0k 32.4k 0.26 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4692.00 2713.60 73.3M 76.8M 0.00 0.00 0.0% 0.0% 1.01 0.54 6.19 16.0k 29.0k 0.26 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5033.00 2554.60 78.6M 77.8M 0.00 0.20 0.0% 0.0% 1.06 0.55 6.76 16.0k 31.2k 0.26 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 2904.40 2124.80 45.4M 56.3M 0.00 0.00 0.0% 0.0% 1.26 0.58 4.88 16.0k 27.1k 0.34 1.1% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 3544.00 3366.20 55.4M 74.4M 0.00 0.60 0.0% 0.0% 0.87 0.46 4.62 16.0k 22.6k 0.28 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4744.20 3356.40 74.1M 90.1M 0.00 1.40 0.0% 0.0% 0.65 0.49 4.73 16.0k 27.5k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4418.20 3280.60 69.0M 86.8M 0.00 6.60 0.0% 0.2% 0.68 0.47 4.55 16.0k 27.1k 0.25 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4960.80 3046.20 77.5M 88.2M 0.00 0.20 0.0% 0.0% 0.75 0.52 5.31 16.0k 29.6k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 903.60 4839.40 14.1M 128.6M 0.00 0.20 0.0% 0.0% 0.69 0.40 2.58 16.0k 27.2k 0.34 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4848.20 2589.80 75.7M 82.4M 0.00 0.00 0.0% 0.0% 0.70 0.52 4.74 16.0k 32.6k 0.26 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5472.60 1601.00 85.5M 56.4M 0.00 0.20 0.0% 0.0% 0.57 0.53 3.99 16.0k 36.1k 0.28 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4883.60 2486.40 76.3M 76.2M 0.00 0.20 0.0% 0.0% 0.64 0.61 4.65 16.0k 31.4k 0.27 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4747.20 3188.20 74.2M 96.4M 0.00 0.20 0.0% 0.0% 0.62 0.50 4.57 16.0k 31.0k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5460.80 2888.80 85.3M 91.5M 0.00 0.00 0.0% 0.0% 0.59 0.52 4.70 16.0k 32.4k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4927.60 2875.40 77.0M 90.7M 0.00 0.80 0.0% 0.0% 0.57 0.51 4.29 16.0k 32.3k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5048.40 2824.80 78.9M 87.3M 0.00 0.00 0.0% 0.0% 0.67 0.54 4.92 16.0k 31.6k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5395.40 
2724.00 84.3M 84.8M 0.00 0.20 0.0% 0.0% 0.62 0.52 4.78 16.0k 31.9k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5058.60 2901.60 79.0M 91.2M 0.00 1.20 0.0% 0.0% 0.68 0.62 5.24 16.0k 32.2k 0.24 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5170.20 3202.20 80.8M 98.1M 0.00 0.60 0.0% 0.0% 0.67 0.58 5.32 16.0k 31.4k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5248.20 2882.40 82.0M 90.6M 0.00 0.00 0.0% 0.0% 0.66 0.59 5.19 16.0k 32.2k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5487.40 3018.60 85.7M 95.3M 0.00 0.40 0.0% 0.0% 0.61 0.58 5.08 16.0k 32.3k 0.23 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5406.80 3175.40 84.5M 99.1M 0.00 0.00 0.0% 0.0% 0.63 0.56 5.17 16.0k 31.9k 0.23 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5390.80 2605.20 84.2M 82.4M 0.00 0.20 0.0% 0.0% 0.61 0.54 4.73 16.0k 32.4k 0.25 1.3% ALL |
The IOPS figure is the sum of reads and writes per second (r/s + w/s). So I get a maximum of 8582 IOPS and an average of 7632 IOPS (you can copy/paste your figures into Excel for faster computation). Needless to say, using fio is much easier than the sysbench + iostat pair…
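If you prefer to skip the Excel step, a minimal awk sketch (assuming the same iostat flags as above, so that the aggregated lines end with ALL and r/s and w/s are the first two columns) can compute the average and maximum on the fly:

[root@server1 ~]# iostat -h -z -H -x -g ALL -d 5 24 | awk '
  $NF == "ALL" {                    # keep only the aggregated data lines
    n++; if (n == 1) next           # skip the first sample (stats since boot)
    iops = $1 + $2                  # IOPS = r/s + w/s
    sum += iops; if (iops > max) max = iops; cnt++
  }
  END { printf "samples=%d  avg=%.0f IOPS  max=%.0f IOPS\n", cnt, sum/cnt, max }'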
Remark:
With the -p option you can filter on the device you want to monitor…
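For example, assuming the data volume sits on sda as in the fio disk statistics above:

[root@server1 ~]# iostat -h -x -p sda -d 5 24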
innodb_io_capacity and innodb_io_capacity_max
Once you have determined your IOPS capacity, the golden rule is to set innodb_io_capacity_max to this value and innodb_io_capacity to around 75% of it.
What I would personally do then is iterate: relaunch a bunch of sysbench/iostat runs to fine-tune the values of innodb_io_capacity_max/innodb_io_capacity, trying to reach a stable situation where those two parameters are as high as possible while sysbench/iostat still reports the best possible results.
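As a pure illustration with the figures measured above (roughly 8500 IOPS at peak; use your own numbers), both variables are dynamic so they can be tested without a restart:

set global innodb_io_capacity_max = 8500;  -- roughly the measured peak IOPS
set global innodb_io_capacity = 6400;      -- around 75% of innodb_io_capacity_max

Once validated, the same values would go into the instance configuration file so they survive a restart.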
Huge Pages
Set the number of Huge Pages high enough to accommodate the size of innodb_buffer_pool_size:
[root@server1 ~]# sysctl vm.nr_hugepages=600
vm.nr_hugepages = 600
[root@server1 ~]# cat /proc/meminfo | grep ^HugePages
HugePages_Total:     600
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
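The value 600 is simply the buffer pool size divided by the huge page size (2 MiB on most x86_64 hosts) plus some headroom, and the sysctl call above does not survive a reboot. A minimal sketch, assuming for illustration a buffer pool of roughly 1 GiB (the sysctl.d file name is just an example):

[root@server1 ~]# grep Hugepagesize /proc/meminfo   # typically 2048 kB
# ~1 GiB innodb_buffer_pool_size / 2 MiB huge pages = 512 pages, plus headroom -> 600
[root@server1 ~]# echo "vm.nr_hugepages = 600" >> /etc/sysctl.d/99-mariadb-hugepages.conf
[root@server1 ~]# sysctl -p /etc/sysctl.d/99-mariadb-hugepages.conf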
Change the large_pages variable to ON and check it:
MariaDB [(none)]> show variables like 'large_pages';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| large_pages   | ON    |
+---------------+-------+
1 row in set (0.001 sec)
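large_pages is not a dynamic variable, so the ON value has to come from the server configuration file read at startup; a minimal sketch (the section name and buffer pool size are assumptions, adapt them to your own instance):

[mariadbd]
large_pages             = ON
innodb_buffer_pool_size = 1G   # must fit into the reserved huge pages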
Then bounce your MariaDB instance and you should see your Huge Pages occupied:
[root@server1 ~]# cat /proc/meminfo | grep ^HugePages
HugePages_Total:     600
HugePages_Free:      426
HugePages_Rsvd:      421
HugePages_Surp:        0
Or with xsos, which was recently shared with me by a system teammate (I love it):
Remark:
The huge pages are claimed progressively as the instance warms up, which is why you cannot do the math right after starting your MariaDB instance: in the output above 600 - 426 = 174 pages are actually backed while 421 more are reserved but not yet touched, and the two together roughly cover the buffer pool…
But does it improve the situation? I have issued the exact same sysbench and iostat commands on the exact same hardware as just above…
[mariadb@server1 ~]$ sysbench --db-driver=mysql --threads=24 --events=1000000 --oltp-tables-count=30 --oltp-table-size=1000000 --oltp-test-mode=complex --oltp-dist-type=uniform \ /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock --mysql-user=sbuser \ --mysql-password=secret_password --time=120 --report-interval=10 run sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench) sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3) Running the test with following options: Number of threads: 24 Report intermediate results every 10 second(s) Initializing random number generator from current time Initializing worker threads... Threads started! [ 10s ] thds: 24 tps: 857.60 qps: 17175.69 (r/w/o: 12026.89/3431.20/1717.60) lat (ms,95%): 57.87 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 24 tps: 1012.15 qps: 20237.52 (r/w/o: 14165.44/4047.78/2024.29) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 24 tps: 1029.96 qps: 20605.04 (r/w/o: 14424.50/4120.63/2059.91) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 24 tps: 1028.45 qps: 20565.81 (r/w/o: 14395.44/4113.58/2056.79) lat (ms,95%): 34.95 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 24 tps: 1022.37 qps: 20450.69 (r/w/o: 14316.94/4088.90/2044.85) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 24 tps: 1018.42 qps: 20365.08 (r/w/o: 14254.34/4073.90/2036.85) lat (ms,95%): 35.59 err/s: 0.00 reconn/s: 0.00 [ 70s ] thds: 24 tps: 1028.90 qps: 20582.74 (r/w/o: 14409.33/4115.61/2057.80) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00 [ 80s ] thds: 24 tps: 1002.63 qps: 20048.89 (r/w/o: 14032.79/4010.94/2005.17) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00 [ 90s ] thds: 24 tps: 1003.26 qps: 20071.35 (r/w/o: 14051.47/4013.25/2006.62) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00 [ 100s ] thds: 24 tps: 989.97 qps: 19792.34 (r/w/o: 13852.64/3959.77/1979.93) lat (ms,95%): 37.56 err/s: 0.00 reconn/s: 0.00 [ 110s ] thds: 24 tps: 978.41 qps: 19570.02 (r/w/o: 13699.15/3914.04/1956.82) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00 [ 120s ] thds: 24 tps: 992.98 qps: 19853.62 (r/w/o: 13896.04/3971.72/1985.86) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00 SQL statistics: queries performed: read: 1675492 write: 478712 other: 239356 total: 2393560 transactions: 119678 (997.13 per sec.) queries: 2393560 (19942.65 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 120.0179s total number of events: 119678 Latency (ms): min: 3.94 avg: 24.06 max: 629.69 95th percentile: 38.25 sum: 2879699.14 Threads fairness: events (avg/stddev): 4986.5833/23.74 execution time (avg/stddev): 119.9875/0.01 |
[root@server1 ~]# iostat -h -z -H -x -g ALL -d 5 24 Linux 4.18.0-372.19.1.el8_6.x86_64 (server1) 01/22/2024 _x86_64_ (24 CPU) r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 1.88 7.14 22.8k 277.0k 0.00 0.47 0.0% 6.2% 0.39 0.18 0.00 12.2k 38.8k 0.30 0.0% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5696.60 575.00 89.0M 21.1M 0.00 0.60 0.0% 0.1% 0.61 0.41 3.71 16.0k 37.6k 0.31 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4801.00 2641.00 75.0M 83.2M 0.00 0.00 0.0% 0.0% 0.60 0.57 4.41 16.0k 32.2k 0.26 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5432.00 2421.80 84.9M 78.0M 0.00 0.20 0.0% 0.0% 0.55 0.58 4.40 16.0k 33.0k 0.25 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5477.80 2339.20 85.6M 74.0M 0.00 0.00 0.0% 0.0% 0.55 0.56 4.32 16.0k 32.4k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5451.40 2744.40 85.1M 85.4M 0.00 0.20 0.0% 0.0% 0.58 0.53 4.62 16.0k 31.9k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5238.40 2529.80 81.8M 80.7M 0.00 7.60 0.0% 0.3% 0.55 0.56 4.29 16.0k 32.7k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5289.80 2606.20 82.6M 82.9M 0.00 1.60 0.0% 0.1% 0.54 0.56 4.30 16.0k 32.6k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5396.40 2702.80 84.3M 84.7M 0.00 0.00 0.0% 0.0% 0.53 0.55 4.34 16.0k 32.1k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5311.60 2727.00 83.0M 86.4M 0.00 0.20 0.0% 0.0% 0.52 0.54 4.26 16.0k 32.4k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5258.20 2809.40 82.1M 88.2M 0.00 0.00 0.0% 0.0% 0.55 0.58 4.53 16.0k 32.1k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5388.00 2721.80 84.2M 85.4M 0.00 0.20 0.0% 0.0% 0.58 0.55 4.63 16.0k 32.1k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5211.60 2667.20 81.4M 83.9M 0.00 0.40 0.0% 0.0% 0.61 0.55 4.65 16.0k 32.2k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5366.80 2580.40 83.8M 82.0M 0.00 0.20 0.0% 0.0% 0.56 0.54 4.37 16.0k 32.6k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4896.40 2536.60 76.5M 79.7M 0.00 0.00 0.0% 0.0% 0.63 0.68 4.78 16.0k 32.2k 0.26 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5244.20 2739.60 81.9M 86.7M 0.00 0.20 0.0% 0.0% 0.57 0.54 4.48 16.0k 32.4k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5249.40 2695.00 82.0M 84.0M 0.00 0.20 0.0% 0.0% 0.57 0.53 4.42 16.0k 31.9k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5239.20 
2998.40 81.8M 94.2M 0.00 0.40 0.0% 0.0% 0.55 0.53 4.45 16.0k 32.2k 0.24 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5228.80 2787.40 81.7M 88.1M 0.00 0.40 0.0% 0.0% 0.54 0.52 4.31 16.0k 32.4k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4985.60 2457.60 77.9M 77.9M 0.00 1.20 0.0% 0.0% 0.55 0.53 4.06 16.0k 32.5k 0.27 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4997.00 2595.00 78.1M 80.2M 0.00 0.80 0.0% 0.0% 0.57 0.54 4.24 16.0k 31.7k 0.26 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 4962.80 2557.20 77.5M 79.1M 0.00 0.20 0.0% 0.0% 0.56 0.55 4.21 16.0k 31.7k 0.26 1.2% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5368.00 2529.60 83.8M 80.7M 0.00 0.40 0.0% 0.0% 0.50 0.54 4.04 16.0k 32.7k 0.25 1.3% ALL r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util Device 5294.80 2508.80 82.7M 79.8M 0.00 0.20 0.0% 0.0% 0.48 0.50 3.81 16.0k 32.6k 0.25 1.3% ALL |
I now get a maximum of 8237 IOPS and an average of 7793 IOPS… Even if this is in no way a rigorous benchmark, I have done multiple runs, and even if there is no huge difference in the order of magnitude of IOPS, I have been able to observe more consistency in the sysbench results. The maximum latency is lower and the 95th percentile is more consistent. I still recall that on Oracle activating Huge Pages was really a key feature; apparently on MariaDB it is also something pretty interesting…
References
- Testing for Filesystem-Block Sizes Below 1 BS with Fio
- Open Source Databases on Big Machines: Disk Speed and innodb_io_capacity
- Working with MySQL on SSD
- MariaDB Huge Pages