innodb_io_capacity and innodb_io_capacity_max tuning

Preamble

When tuning your MariaDB (or MySQL) InnoDB memory parameters you have some obvious memory-related parameters that are highly important, like:

  • innodb_buffer_pool_size
  • innodb_log_buffer_size

Some others are more linked to the underlying I/O subsystem you have, like:

  • innodb_flush_method
  • innodb_io_capacity_max
  • innodb_io_capacity
  • innodb_lru_flush_size

The ones I have seen discussed in plenty of articles are innodb_io_capacity_max and innodb_io_capacity. Most of the time their values are picked from nowhere, so let's see how to set them using either fio or the sysbench/iostat couple. Even if I initially rated fio as more convenient, I finally ended up preferring the sysbench/iostat couple, which is much closer to a real database workload.

Disclaimer:
I have done my tests on different systems, so the results cannot be compared with each other…

Fio

To ease my runs I created a fio configuration file with the parameters below:

runtime=1800 # 30mins
size=1g
filename=/mariadb/data01/mariadb01/fio
name=fiojob
threads=1
numjobs=16
ioengine=libaio       # Linux native asynchronous I/O
blocksize=16k         # Matching InnoDB page size
lockfile=readwrite    # Read-write locking on the file. Many readers may access the file at the same time, but writes get exclusive access
readwrite=randwrite   # Random writes
# true = 1, false = 0
direct = 1
group_reporting       # Display statistics for groups of jobs as a whole instead of for each individual job

Remark:
The same configuration is run twice, changing only the direct parameter (0 = buffered I/O through the filesystem cache, 1 = direct I/O bypassing it).

Buffered I/O

For this test direct is set to false (0), so the filesystem cache is used, which explains the high IOPS result:

[mariadb@server1 ~]$ fio fio.cnf
fiojob: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=1
...
fio-3.19
Starting 16 processes
Jobs: 15 (f=15)
fiojob: (groupid=0, jobs=16): err= 0: pid=17259: Mon Jan 15 15:59:28 2024
  write: IOPS=81.7k, BW=1277MiB/s (1339MB/s)(1600MiB/1253msec); 0 zone resets
    slat (usec): min=6, max=158697, avg=118.57, stdev=3129.06
    clat (nsec): min=1442, max=75674k, avg=4222.74, stdev=313215.24
     lat (usec): min=7, max=158703, avg=124.02, stdev=3151.53
    clat percentiles (nsec):
     |  1.00th=[   1464],  5.00th=[   1464], 10.00th=[   1480],
     | 20.00th=[   1496], 30.00th=[   1576], 40.00th=[   1624],
     | 50.00th=[   1672], 60.00th=[   1736], 70.00th=[   1800],
     | 80.00th=[   1832], 90.00th=[   1976], 95.00th=[   2096],
     | 99.00th=[  12224], 99.50th=[  13632], 99.90th=[  59648],
     | 99.95th=[  74240], 99.99th=[4292608]
   bw (  MiB/s): min= 1028, max= 1337, per=92.65%, avg=1183.02, stdev=22.77, samples=23
   iops        : min=65811, max=85601, avg=75706.00, stdev=1457.26, samples=23
  lat (usec)   : 2=91.52%, 4=7.01%, 10=0.31%, 20=0.82%, 50=0.18%
  lat (usec)   : 100=0.14%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%, 20=0.01%, 100=0.01%
  cpu          : usr=2.95%, sys=9.94%, ctx=891, majf=0, minf=222
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
 
Run status group 0 (all jobs):
  WRITE: bw=1277MiB/s (1339MB/s), 1277MiB/s-1277MiB/s (1339MB/s-1339MB/s), io=1600MiB (1678MB), run=1253-1253msec
 
Disk stats (read/write):
    dm-4: ios=0/594, merge=0/0, ticks=0/2023, in_queue=2023, util=21.18%, aggrios=0/617, aggrmerge=0/1, aggrticks=0/1394, aggrin_queue=1394, aggrutil=18.28%
  sda: ios=0/617, merge=0/1, ticks=0/1394, in_queue=1394, util=18.28%

Non-buffered I/O

For this test direct is set to true (1) to bypass the filesystem cache. The IOPS results are much closer to a normal system handling plenty of I/O and unable to serve all of it from cache:

fiojob: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=1
...
fio-3.19
Starting 16 processes
Jobs: 16 (f=16): [w(16)][100.0%][w=187MiB/s][w=11.0k IOPS][eta 00m:00s]
fiojob: (groupid=0, jobs=16): err= 0: pid=18867: Mon Jan 15 16:55:36 2024
  write: IOPS=12.2k, BW=190MiB/s (199MB/s)(1600MiB/8410msec); 0 zone resets
    slat (usec): min=10, max=117947, avg=125.24, stdev=459.57
    clat (usec): min=2, max=94096, avg=1147.85, stdev=961.22
     lat (usec): min=103, max=117953, avg=1274.20, stdev=1054.06
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[  269], 20.00th=[  619],
     | 30.00th=[  783], 40.00th=[  922], 50.00th=[ 1057], 60.00th=[ 1221],
     | 70.00th=[ 1385], 80.00th=[ 1598], 90.00th=[ 1942], 95.00th=[ 2343],
     | 99.00th=[ 3785], 99.50th=[ 4424], 99.90th=[ 6783], 99.95th=[11994],
     | 99.99th=[23987]
   bw (  KiB/s): min=168385, max=210815, per=100.00%, avg=195774.69, stdev=707.44, samples=256
   iops        : min=10521, max=13174, avg=12232.12, stdev=44.25, samples=256
  lat (usec)   : 4=7.89%, 10=0.23%, 20=0.12%, 50=0.02%, 100=0.36%
  lat (usec)   : 250=1.16%, 500=5.18%, 750=12.92%, 1000=17.67%
  lat (msec)   : 2=45.54%, 4=8.09%, 10=0.76%, 20=0.04%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=0.48%, sys=6.02%, ctx=98834, majf=0, minf=239
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
 
Run status group 0 (all jobs):
  WRITE: bw=190MiB/s (199MB/s), 190MiB/s-190MiB/s (199MB/s-199MB/s), io=1600MiB (1678MB), run=8410-8410msec
 
Disk stats (read/write):
    dm-4: ios=0/101915, merge=0/0, ticks=0/31993, in_queue=31993, util=97.99%, aggrios=0/102468, aggrmerge=0/98, aggrticks=0/31179, aggrin_queue=31179, aggrutil=98.04%
  sda: ios=0/102468, merge=0/98, ticks=0/31179, in_queue=31179, util=98.04%

Iostat

One of the first drawbacks to overcome with iostat is the lack of a load-generation option like we have with fio. Iostat is a pure monitoring tool and does not include any benchmarking part. To supplement iostat I have decided to use the tool suggested by MariaDB, called sysbench. It is easy to install as you find it in the EPEL repository for Red Hat systems. In contrast with fio, with sysbench you issue a real database workload, so it might be more realistic than just testing the underlying filesystem…

Sysbench

Sysbench is part of the EPEL repository, so its installation is as easy as configuring this repository on your host and using dnf to install it.
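A minimal sketch, assuming a RHEL 8 family host (the exact epel-release package or URL depends on your distribution and major version):

[root@server1 ~]# dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm   # on Rocky/Alma simply: dnf install epel-release
[root@server1 ~]# dnf install sysbench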

Prepare the required MariaDB database and account:

MariaDB [(none)]> create database sbtest;
Query OK, 1 row affected (0.001 sec)
 
MariaDB [(none)]> create user sbuser@localhost identified by 'secret_password';
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [(none)]> grant all on sbtest.* to sbuser@localhost;
Query OK, 0 rows affected (0.001 sec)

Prepare the benchmark tables with the following (below is a simple example with 10 tables of 100,000 rows each, but size it according to the space you have available):

[mariadb@server1 ~]$ sysbench --db-driver=mysql --oltp-tables-count=10 --oltp-table-size=100000 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua \
--mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock --mysql-user=sbuser --mysql-password=secret_password prepare
sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench)
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Creating table 'sbtest1'...
Inserting 100000 records into 'sbtest1'
Creating secondary indexes on 'sbtest1'...
Creating table 'sbtest2'...
Inserting 100000 records into 'sbtest2'
Creating secondary indexes on 'sbtest2'...
Creating table 'sbtest3'...
Inserting 100000 records into 'sbtest3'
Creating secondary indexes on 'sbtest3'...
Creating table 'sbtest4'...
Inserting 100000 records into 'sbtest4'
Creating secondary indexes on 'sbtest4'...
Creating table 'sbtest5'...
Inserting 100000 records into 'sbtest5'
Creating secondary indexes on 'sbtest5'...
Creating table 'sbtest6'...
Inserting 100000 records into 'sbtest6'
Creating secondary indexes on 'sbtest6'...
Creating table 'sbtest7'...
Inserting 100000 records into 'sbtest7'
Creating secondary indexes on 'sbtest7'...
Creating table 'sbtest8'...
Inserting 100000 records into 'sbtest8'
Creating secondary indexes on 'sbtest8'...
Creating table 'sbtest9'...
Inserting 100000 records into 'sbtest9'
Creating secondary indexes on 'sbtest9'...
Creating table 'sbtest10'...
Inserting 100000 records into 'sbtest10'
Creating secondary indexes on 'sbtest10'...

Run the benchmark with something like the following (tune the threads, events, --oltp-tables-count and --oltp-table-size based on what you have prepared and on the hardware you have):

[mariadb@server1 ~]$ sysbench --db-driver=mysql --threads=24 --events=1000000 --oltp-tables-count=30 --oltp-table-size=1000000 --oltp-test-mode=complex \
--oltp-dist-type=uniform /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock \
--mysql-user=sbuser --mysql-password=secret_password --time=120 --report-interval=10 run
sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench)
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Running the test with following options:
Number of threads: 24
Report intermediate results every 10 second(s)
Initializing random number generator from current time
 
 
Initializing worker threads...
 
Threads started!
 
[ 10s ] thds: 24 tps: 805.25 qps: 16138.86 (r/w/o: 11303.47/3222.50/1612.90) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 24 tps: 776.52 qps: 15526.91 (r/w/o: 10867.69/3106.18/1553.04) lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 24 tps: 754.98 qps: 15097.51 (r/w/o: 10568.66/3018.90/1509.95) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 24 tps: 946.54 qps: 18924.02 (r/w/o: 13245.37/3785.56/1893.08) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 24 tps: 524.50 qps: 10492.24 (r/w/o: 7344.33/2098.91/1049.00) lat (ms,95%): 200.47 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 24 tps: 1029.77 qps: 20593.58 (r/w/o: 14415.14/4119.00/2059.45) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 24 tps: 942.22 qps: 18847.55 (r/w/o: 13192.95/3770.17/1884.44) lat (ms,95%): 52.89 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: 24 tps: 973.72 qps: 19478.28 (r/w/o: 13637.37/3893.38/1947.54) lat (ms,95%): 40.37 err/s: 0.00 reconn/s: 0.00
[ 90s ] thds: 24 tps: 1023.38 qps: 20472.83 (r/w/o: 14330.64/4095.43/2046.76) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: 24 tps: 998.09 qps: 19955.27 (r/w/o: 13967.71/3991.37/1996.19) lat (ms,95%): 54.83 err/s: 0.00 reconn/s: 0.00
[ 110s ] thds: 24 tps: 1038.99 qps: 20783.15 (r/w/o: 14548.63/4156.55/2077.98) lat (ms,95%): 44.17 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 24 tps: 1005.96 qps: 20112.81 (r/w/o: 14078.35/4022.54/2011.92) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            1515178
        write:                           432908
        other:                           216454
        total:                           2164540
    transactions:                        108227 (901.73 per sec.)
    queries:                             2164540 (18034.60 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          120.0173s
    total number of events:              108227
 
Latency (ms):
         min:                                    3.73
         avg:                                   26.61
         max:                                 1012.31
         95th percentile:                       58.92
         sum:                              2879679.63
 
Threads fairness:
    events (avg/stddev):           4509.4583/33.00
    execution time (avg/stddev):   119.9867/0.00

While sysbench is running, issue your iostat command in another session. The list of parameters I have decided to use is (5 s interval × 24 reports = 120 seconds):

[root@server1 ~]# iostat -h -z -H -x -g ALL -d 5 24
Linux 4.18.0-372.19.1.el8_6.x86_64 (server1)  01/22/2024      _x86_64_        (24 CPU)
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
    1.56    6.95     17.8k    271.4k     0.00     0.47   0.0%   6.4%    0.34    0.18   0.00    11.4k    39.0k   0.30   0.0%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5168.00 2384.80     80.7M     75.5M     0.00     0.20   0.0%   0.0%    0.99    0.50   6.32    16.0k    32.4k   0.26   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4692.00 2713.60     73.3M     76.8M     0.00     0.00   0.0%   0.0%    1.01    0.54   6.19    16.0k    29.0k   0.26   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5033.00 2554.60     78.6M     77.8M     0.00     0.20   0.0%   0.0%    1.06    0.55   6.76    16.0k    31.2k   0.26   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 2904.40 2124.80     45.4M     56.3M     0.00     0.00   0.0%   0.0%    1.26    0.58   4.88    16.0k    27.1k   0.34   1.1%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 3544.00 3366.20     55.4M     74.4M     0.00     0.60   0.0%   0.0%    0.87    0.46   4.62    16.0k    22.6k   0.28   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4744.20 3356.40     74.1M     90.1M     0.00     1.40   0.0%   0.0%    0.65    0.49   4.73    16.0k    27.5k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4418.20 3280.60     69.0M     86.8M     0.00     6.60   0.0%   0.2%    0.68    0.47   4.55    16.0k    27.1k   0.25   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4960.80 3046.20     77.5M     88.2M     0.00     0.20   0.0%   0.0%    0.75    0.52   5.31    16.0k    29.6k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
  903.60 4839.40     14.1M    128.6M     0.00     0.20   0.0%   0.0%    0.69    0.40   2.58    16.0k    27.2k   0.34   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4848.20 2589.80     75.7M     82.4M     0.00     0.00   0.0%   0.0%    0.70    0.52   4.74    16.0k    32.6k   0.26   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5472.60 1601.00     85.5M     56.4M     0.00     0.20   0.0%   0.0%    0.57    0.53   3.99    16.0k    36.1k   0.28   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4883.60 2486.40     76.3M     76.2M     0.00     0.20   0.0%   0.0%    0.64    0.61   4.65    16.0k    31.4k   0.27   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4747.20 3188.20     74.2M     96.4M     0.00     0.20   0.0%   0.0%    0.62    0.50   4.57    16.0k    31.0k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5460.80 2888.80     85.3M     91.5M     0.00     0.00   0.0%   0.0%    0.59    0.52   4.70    16.0k    32.4k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4927.60 2875.40     77.0M     90.7M     0.00     0.80   0.0%   0.0%    0.57    0.51   4.29    16.0k    32.3k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5048.40 2824.80     78.9M     87.3M     0.00     0.00   0.0%   0.0%    0.67    0.54   4.92    16.0k    31.6k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5395.40 2724.00     84.3M     84.8M     0.00     0.20   0.0%   0.0%    0.62    0.52   4.78    16.0k    31.9k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5058.60 2901.60     79.0M     91.2M     0.00     1.20   0.0%   0.0%    0.68    0.62   5.24    16.0k    32.2k   0.24   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5170.20 3202.20     80.8M     98.1M     0.00     0.60   0.0%   0.0%    0.67    0.58   5.32    16.0k    31.4k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5248.20 2882.40     82.0M     90.6M     0.00     0.00   0.0%   0.0%    0.66    0.59   5.19    16.0k    32.2k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5487.40 3018.60     85.7M     95.3M     0.00     0.40   0.0%   0.0%    0.61    0.58   5.08    16.0k    32.3k   0.23   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5406.80 3175.40     84.5M     99.1M     0.00     0.00   0.0%   0.0%    0.63    0.56   5.17    16.0k    31.9k   0.23   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5390.80 2605.20     84.2M     82.4M     0.00     0.20   0.0%   0.0%    0.61    0.54   4.73    16.0k    32.4k   0.25   1.3%  ALL

The IOPS figure is the sum of reads and writes per second (so r/s + w/s). I get a maximum of 8582 IOPS and an average of 7632 IOPS (you can copy/paste your figures into Excel for faster computation). Needless to say, using fio is much easier than the sysbench + iostat couple…
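If you prefer to avoid the spreadsheet, here is a minimal sketch that assumes the iostat output above was captured to a file called iostat.log (a name I made up): it skips the first, since-boot sample, sums r/s + w/s on every ALL line and prints the maximum and average IOPS.

[root@server1 ~]# awk '/ ALL$/ { n++; if (n == 1) next; iops = $1 + $2; if (iops > max) max = iops; sum += iops; cnt++ }
     END { printf "max=%.0f avg=%.0f IOPS\n", max, sum / cnt }' iostat.log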

Remark:
With the -p option you can filter on the device you want to monitor…
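For example, to watch only the sda device seen in the fio disk stats above (adapt the device name to your system):

[root@server1 ~]# iostat -x -p sda -d 5 24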

innodb_io_capacity and innodb_io_capacity_max

Once you have determined your IOPS capacity, the golden rule is to set innodb_io_capacity_max to this value and innodb_io_capacity to around 75% of it.
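As an illustration with the figures measured above (a sketch only; the rounding and the [mysqld] section name are my own choices):

[mysqld]
innodb_io_capacity_max = 8000   # just under the 8582 peak IOPS measured with sysbench/iostat
innodb_io_capacity     = 6000   # roughly 75% of innodb_io_capacity_max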

What I would personally do then is an iterative process: relaunch a series of sysbench/iostat runs to fine-tune the values of innodb_io_capacity_max and innodb_io_capacity, trying to reach a stable situation where both parameters are as high as possible while still getting the best possible result from sysbench/iostat.
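Both variables are dynamic, so each iteration can be tested without restarting the instance (and made permanent in the option file once you are satisfied), for example with the illustrative values above:

MariaDB [(none)]> set global innodb_io_capacity_max = 8000;
MariaDB [(none)]> set global innodb_io_capacity = 6000;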

Huge Pages

Set the number of Huge Pages high enough to accommodate innodb_buffer_pool_size:

[root@server1 ~]# sysctl vm.nr_hugepages=600
vm.nr_hugepages = 600
[root@server1 ~]# cat /proc/meminfo | grep ^HugePages
HugePages_Total:     600
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
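As a rough sizing sketch (assuming the default 2 MiB huge page size on x86_64 and, purely for illustration, a 1 GiB buffer pool): 1024 MiB / 2 MiB = 512 pages, so 600 pages leaves a bit of headroom. To have the value survive a reboot you can also persist it in a sysctl drop-in file (the file name below is just an example):

[root@server1 ~]# grep Hugepagesize /proc/meminfo
[root@server1 ~]# echo "vm.nr_hugepages = 600" > /etc/sysctl.d/90-mariadb-hugepages.conf
[root@server1 ~]# sysctl --system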

Set the large_pages variable to ON (it is not dynamic, so it has to be set at startup) and check it:

MariaDB [(none)]> show variables like 'large_pages';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| large_pages   | ON    |
+---------------+-------+
1 row in set (0.001 sec)
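A minimal sketch of the corresponding option file entry (section name and file location depend on your setup):

[mysqld]
large_pages = 1    # boolean; simply "large-pages" with no value also enables it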

Then bounce your MariaDB instance and you should see your Huge Pages occupied:

[root@server1 ~]# cat /proc/meminfo | grep ^HugePages
HugePages_Total:     600
HugePages_Free:      426
HugePages_Rsvd:      421
HugePages_Surp:        0

Or with xsos, which was recently shared with me by a system teammate (I love it):

[screenshot: xsos memory overview]

Remark:
The Huge Pages get occupied as the instance warms up, which is why the numbers do not add up right after the MariaDB instance has just been started…

But does it improve the situation? I have issued the exact same sysbench and iostat commands on the exact same hardware as above…

[mariadb@server1 ~]$ sysbench --db-driver=mysql --threads=24 --events=1000000 --oltp-tables-count=30 --oltp-table-size=1000000 --oltp-test-mode=complex --oltp-dist-type=uniform \
/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=localhost --mysql-socket=/mariadb/software/mariadb01/conf/mariadb01.sock --mysql-user=sbuser \
--mysql-password=secret_password --time=120 --report-interval=10 run
sysbench: /usr/pgsql-15/lib/libpq.so.5: no version information available (required by sysbench)
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Running the test with following options:
Number of threads: 24
Report intermediate results every 10 second(s)
Initializing random number generator from current time
 
 
Initializing worker threads...
 
Threads started!
 
[ 10s ] thds: 24 tps: 857.60 qps: 17175.69 (r/w/o: 12026.89/3431.20/1717.60) lat (ms,95%): 57.87 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 24 tps: 1012.15 qps: 20237.52 (r/w/o: 14165.44/4047.78/2024.29) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 24 tps: 1029.96 qps: 20605.04 (r/w/o: 14424.50/4120.63/2059.91) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 24 tps: 1028.45 qps: 20565.81 (r/w/o: 14395.44/4113.58/2056.79) lat (ms,95%): 34.95 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 24 tps: 1022.37 qps: 20450.69 (r/w/o: 14316.94/4088.90/2044.85) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 24 tps: 1018.42 qps: 20365.08 (r/w/o: 14254.34/4073.90/2036.85) lat (ms,95%): 35.59 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 24 tps: 1028.90 qps: 20582.74 (r/w/o: 14409.33/4115.61/2057.80) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: 24 tps: 1002.63 qps: 20048.89 (r/w/o: 14032.79/4010.94/2005.17) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00
[ 90s ] thds: 24 tps: 1003.26 qps: 20071.35 (r/w/o: 14051.47/4013.25/2006.62) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: 24 tps: 989.97 qps: 19792.34 (r/w/o: 13852.64/3959.77/1979.93) lat (ms,95%): 37.56 err/s: 0.00 reconn/s: 0.00
[ 110s ] thds: 24 tps: 978.41 qps: 19570.02 (r/w/o: 13699.15/3914.04/1956.82) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 24 tps: 992.98 qps: 19853.62 (r/w/o: 13896.04/3971.72/1985.86) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            1675492
        write:                           478712
        other:                           239356
        total:                           2393560
    transactions:                        119678 (997.13 per sec.)
    queries:                             2393560 (19942.65 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          120.0179s
    total number of events:              119678
 
Latency (ms):
         min:                                    3.94
         avg:                                   24.06
         max:                                  629.69
         95th percentile:                       38.25
         sum:                              2879699.14
 
Threads fairness:
    events (avg/stddev):           4986.5833/23.74
    execution time (avg/stddev):   119.9875/0.01
[root@server1 ~]# iostat -h -z -H -x -g ALL -d 5 24
Linux 4.18.0-372.19.1.el8_6.x86_64 (server1)  01/22/2024      _x86_64_        (24 CPU)
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
    1.88    7.14     22.8k    277.0k     0.00     0.47   0.0%   6.2%    0.39    0.18   0.00    12.2k    38.8k   0.30   0.0%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5696.60  575.00     89.0M     21.1M     0.00     0.60   0.0%   0.1%    0.61    0.41   3.71    16.0k    37.6k   0.31   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4801.00 2641.00     75.0M     83.2M     0.00     0.00   0.0%   0.0%    0.60    0.57   4.41    16.0k    32.2k   0.26   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5432.00 2421.80     84.9M     78.0M     0.00     0.20   0.0%   0.0%    0.55    0.58   4.40    16.0k    33.0k   0.25   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5477.80 2339.20     85.6M     74.0M     0.00     0.00   0.0%   0.0%    0.55    0.56   4.32    16.0k    32.4k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5451.40 2744.40     85.1M     85.4M     0.00     0.20   0.0%   0.0%    0.58    0.53   4.62    16.0k    31.9k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5238.40 2529.80     81.8M     80.7M     0.00     7.60   0.0%   0.3%    0.55    0.56   4.29    16.0k    32.7k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5289.80 2606.20     82.6M     82.9M     0.00     1.60   0.0%   0.1%    0.54    0.56   4.30    16.0k    32.6k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5396.40 2702.80     84.3M     84.7M     0.00     0.00   0.0%   0.0%    0.53    0.55   4.34    16.0k    32.1k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5311.60 2727.00     83.0M     86.4M     0.00     0.20   0.0%   0.0%    0.52    0.54   4.26    16.0k    32.4k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5258.20 2809.40     82.1M     88.2M     0.00     0.00   0.0%   0.0%    0.55    0.58   4.53    16.0k    32.1k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5388.00 2721.80     84.2M     85.4M     0.00     0.20   0.0%   0.0%    0.58    0.55   4.63    16.0k    32.1k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5211.60 2667.20     81.4M     83.9M     0.00     0.40   0.0%   0.0%    0.61    0.55   4.65    16.0k    32.2k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5366.80 2580.40     83.8M     82.0M     0.00     0.20   0.0%   0.0%    0.56    0.54   4.37    16.0k    32.6k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4896.40 2536.60     76.5M     79.7M     0.00     0.00   0.0%   0.0%    0.63    0.68   4.78    16.0k    32.2k   0.26   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5244.20 2739.60     81.9M     86.7M     0.00     0.20   0.0%   0.0%    0.57    0.54   4.48    16.0k    32.4k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5249.40 2695.00     82.0M     84.0M     0.00     0.20   0.0%   0.0%    0.57    0.53   4.42    16.0k    31.9k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5239.20 2998.40     81.8M     94.2M     0.00     0.40   0.0%   0.0%    0.55    0.53   4.45    16.0k    32.2k   0.24   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5228.80 2787.40     81.7M     88.1M     0.00     0.40   0.0%   0.0%    0.54    0.52   4.31    16.0k    32.4k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4985.60 2457.60     77.9M     77.9M     0.00     1.20   0.0%   0.0%    0.55    0.53   4.06    16.0k    32.5k   0.27   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4997.00 2595.00     78.1M     80.2M     0.00     0.80   0.0%   0.0%    0.57    0.54   4.24    16.0k    31.7k   0.26   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 4962.80 2557.20     77.5M     79.1M     0.00     0.20   0.0%   0.0%    0.56    0.55   4.21    16.0k    31.7k   0.26   1.2%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5368.00 2529.60     83.8M     80.7M     0.00     0.40   0.0%   0.0%    0.50    0.54   4.04    16.0k    32.7k   0.25   1.3%  ALL
 
     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
 5294.80 2508.80     82.7M     79.8M     0.00     0.20   0.0%   0.0%    0.48    0.50   3.81    16.0k    32.6k   0.25   1.3%  ALL

I now get a maximum of 8237 IOPS and an average of 7793 IOPS… Even if this is in no way a real benchmark, I have done multiple runs and, even if there is no huge difference in the order of magnitude of the IOPS, I have observed more consistency in the sysbench results: the maximum latency is lower and the 95th percentile is more stable. I still recall that on Oracle activating Huge Pages was really a key feature; apparently on MariaDB it is also something pretty interesting…
