AIX Oracle tuning

Preamble

By default, AIX is tuned for a mixed workload: the AIX Virtual Memory Manager (VMM) will grow its file cache up to 80% of physical RAM (the AIX 6.1 default parameters are much more in line with an Oracle database than those of AIX 5.3). This default configuration works well for applications such as NFS or web servers, but not for applications that do their own cache management. As you can guess, this includes most database flavors (Oracle, MySQL with InnoDB, DB2, PostgreSQL).

As written in every white paper, the best solution is to use raw logical volumes, raw devices or, more simply, Oracle ASM, but these require more skilled people to manage. If you are not willing to lose the flexibility of filesystems, consider using direct I/O (JFS) or concurrent I/O (JFS2), with the drawbacks listed below, and/or tune the AIX file cache to be less aggressive.

Environment variables

On top of the classic AIX environment variables:

export ORACLE_BASE=/ora320
export ORACLE_HOME=$ORACLE_BASE/software
export ORACLE_SID=db1
export TNS_ADMIN=$ORACLE_HOME/network/admin
export LIBPATH=$ORACLE_HOME/lib32:$ORACLE_HOME/lib
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/OPatch
alias sql='sqlplus "/ as sysdba"'

Do not forget to add this important environment variable; it is clearly described in the official Oracle documentation but very easy to miss:

export AIXTHREAD_SCOPE=S

From Oracle official documentation:

  • Threads in AIX can run with process-wide contention scope (M:N or P mode, default value) or with systemwide contention scope (1:1 or S mode). The AIXTHREAD_SCOPE environment variable controls which contention scope is used.
  • When using process-wide contention scope, Oracle threads are mapped to a pool of kernel threads. When Oracle is waiting on an event and its thread is swapped out, it may return on a different kernel thread with a different thread ID. Oracle uses the thread ID to post waiting processes, so it is important for the thread ID to remain the same.
  • When using systemwide contention scope, Oracle threads are mapped to kernel threads statically, one to one.

Oracle applications are primarily multiprocess and single-threaded, which is why Oracle recommends using systemwide contention scope (S mode). It would have seemed more logical that a pool of kernel threads serving all incoming user threads is more efficient (pooling resources in a way), but apparently this is true only for multi-threaded applications such as Java. But what about the Oracle parallel processes? Are they forked from the query coordinator? I suppose not, looking at the Oracle recommendation…

Remark:
On AIX 5L version 5.2 or later, systemwide contention scope (S mode) also has the added benefit of reducing the memory allocated to each Oracle process.

Maximum number of processes

There is only one kernel parameter to set on AIX: the maximum number of processes per Unix account. If your number of parallel servers is large and you have many connections through the listener, the default value of maxuproc is probably too low.

If it is too low you may also hit the error ORA-04030: out of process memory when trying to allocate 127000 bytes (QERHJ hash-joi,kllcqas:kllsltba) (see Oracle Support note Client Connection Fails with ORA-04030 (QERHJ hash-joi,kllcqas:kllsltba) [ID 470604.1]).

To display the maxuproc value:

server1{root}# lsattr -E -l sys0 -a maxuproc
maxuproc 128 Maximum number of PROCESSES allowed per user True

To change it:

server1{root}# chdev -l sys0 -a maxuproc=8192
sys0 changed
server1{root}# lsattr -E -l sys0 -a maxuproc
maxuproc 8192 Maximum number of PROCESSES allowed per user True

VMM tuning

VMM background

As on every Unix flavor, AIX has its own Virtual Memory Manager (VMM) that combines physical memory and disk. In AIX, virtual memory segments (which can be bigger than real physical memory) are partitioned into pages (4KB by default). Allocated pages can reside either in physical memory (RAM) or in paging space (virtual memory stored on disk). To display the available swap and its usage:

server1{root}# swap -l
device              maj,min        total       free
/dev/hd6              10,  2     24576MB     24544MB
server1{root}# lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
hd6             hdisk1            rootvg       24576MB     1   yes   yes    lv
server1{root}# lsps -s
Total Paging Space   Percent Used
      24576MB               1%

AIX classifies memory pages into two categories:

  • Working storage pages (process data, shared memory, …) also called computational pages.
  • Permanent storage pages (files cached in memory: client pages and OS file system cache) also called non-computational pages.

After running for a while AIX will need to free memory, either file cache or working storage. To achieve this the VMM uses a page replacement algorithm implemented by the lrud kernel process (visible with ps -k). Two basic tunable parameters control this algorithm: minperm% and maxperm%. They act a bit like pctfree and pctused in an Oracle block and control how the VMM handles non-computational pages.

maxperm% is the upper limit of non-computational pages in memory (non-strict, i.e. the VMM can decide to go above this limit; making it strict can be done by setting strict_maxperm = 1 but is not advisable). minperm% sets the lower limit on non-computational pages in memory.

numperm% is the percentage of non-computational pages in memory (can be obtained with vmstat -v, see memory monitoring chapter).

When numperm% is above maxperm% VMM steals non-computational pages and when numperm% is below minperm% VMM steals computational and non-computational pages.

When numperm% is between minperm% and maxperm%, the lru_file_repage tunable parameter determines how the VMM steals pages. When lru_file_repage = 1 the VMM steals either non-computational pages only, or both non-computational and computational pages, based on repaging activity (the VMM maintains a list of pages that are often paged in and paged out in order to steal pages intelligently). When lru_file_repage = 0 the VMM steals only non-computational pages.

Paging activity is made of page-ins, i.e. loading a page into memory (one that may have been paged out earlier) or creating new space for process data, and page-outs, i.e. moving a page out of physical memory to a file system or to paging space (swap space).

You will also find references to maxclient%, which IBM sometimes calls the limit for non-computational client pages or the JFS2/NFS file cache. This parameter is supposed to define a subset of maxperm% and is a strict limit (strict_maxclient = 1 by default on AIX 5.3). There is also a numclient% value reported by vmstat -v. In practice you set maxclient% to the same value as maxperm%, and if you have 100% JFS2 filesystems, numclient% will be equal to numperm%.

So for databases, which use working storage, and to maximize performance, you want AIX to release first the type of memory the application does not need, i.e. the file cache. In the past, to do so you would put a strict limit on JFS2 file cache memory using maxclient%; it is not unusual to see database servers with the file cache limited to 20% of total memory. However, a strict limit is not always desirable. The newer lru_file_repage tunable parameter keeps working storage in memory while still allowing the file cache to grow.

Remark:
IBM states that a customer reported this approach can cause excessively high paging when there are multiple, unbalanced memory pools. In that case IBM Support suggested disabling memory affinity (vmo -r -o memory_affinity=0). Double check with IBM Support and reference this tip before applying it.

Use the following to display tunable parameters:

server1{root}# vmo -a

or

server1{root}# vmo -L

maxclient%

Recommended value is 90 (default is 80). This parameter controls the maximum percentage of memory that can be used for the JFS2/NFS file cache.

maxperm%

Recommended value is 90 (default is 80). This parameter controls the maximum percentage of memory that can be used for the JFS file cache.

minperm%

Recommended value (default is 20):

  • 5 with 32G of memory or less.
  • 10 with more than 32G and less than 64G of memory.
  • 20 with 64G of memory or more.

lru_file_repage

The AIX lru_file_repage kernel tuning parameter was introduced in AIX V5.2 ML5 and AIX V5.3 ML2. The default setting for lru_file_repage is 1; the recommended value is 0 (which is already the default in AIX 6.1). Before lru_file_repage was introduced, it was often necessary to lower maxperm/maxclient (JFS/JFS2) to get good performance, particularly when running an Oracle RDBMS workload. It can be set using the vmo command or via smit tuning (then memory).

page_steal_method

Selects the AIX LRU page scanning method. If page_steal_method = 0, working storage and file pages are mixed in one list and lrud scans it sequentially to find pages of the right type; if page_steal_method = 1, there are two lists per page pool, one for working storage and another for file pages.

Recommended value is 1, i.e. use the list-based scanning method (default is 0 for AIX 5.3 and lower, i.e. the legacy page frame table method). Here again the default is 0 on AIX 5.3 and 1 on AIX 6.1; I rate the AIX 5.3 default as a mistake and do not understand why the list-based LRU feature is disabled there.
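
As an illustration, here is how you could apply the recommended values above with vmo (a sketch only: the minperm% value assumes a server with 32G of memory or less, and page_steal_method is a reboot tunable so it is set with -r and needs a bosboot/reboot to take effect):

server1{root}# vmo -p -o minperm%=5 -o maxperm%=90 -o maxclient%=90
server1{root}# vmo -p -o lru_file_repage=0
server1{root}# vmo -r -o page_steal_method=1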

Memory monitoring

Common memory tuning mistakes show up as high paging (high pgspin and pgspout in topas), high system CPU time, the lrud kernel thread consuming CPU, slow overall system throughput, slow backups and slow process startup.

vmstat provides information on non-computational pages:

server1{root}# vmstat -v
             12582912 memory pages # Physical memory size in 4KB blocks
             10066417 lruable pages
                10417 free pages
                    2 memory pools
              3226749 pinned pages
                 80.0 maxpin percentage
                 20.0 minperm percentage # minperm%
                 80.0 maxperm percentage # maxperm%
                 55.2 numperm percentage # numperm%
              5563158 file pages # Number of non-computational pages
                  0.0 compressed percentage
                    0 compressed pages
                 55.2 numclient percentage # numclient%
                 80.0 maxclient percentage # maxclient%
              5563158 client pages # Number of client non-computational pages
                    0 remote pageouts scheduled
                 2223 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                12480 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults

55.2% of physical RAM is used by non-computational pages i.e. 5563158 * 4KB = 21731MB.

svmon breaks real memory down into computational (work) and non-computational (pers and clnt) pages:

server1{root}# svmon -G
               size       inuse        free         pin     virtual
memory     12582912    12543471       39441     4168840     6845783
pg space    6291456       14074
 
               work        pers        clnt       other
pin         2787775           0           0      451273
in use      6845783           0     4767896
 
PageSize   PoolSize      inuse       pgsp        pin    virtual
s   4 KB          -   10561007      14074    2186376    5793111
L  16 MB        484        257          0        484        257

Real memory is broken down into the following three columns:

  • work —working storage
  • pers —persistent storage (Persistent storage pages are non-client pages—that is, JFS pages.)
  • clnt —client storage

For each page size you can focus on:

  • inuse —number of 4K pages mapped into memory
  • pin —number of 4K pages mapped into memory and pinned (pin is a subset of inuse)

So in our case 0 persistent storage pages + 4767896 client storage pages = 18625MB of non-computational pages.

server1{root}# ps gv | grep -e RSS -e ora | head -n 10
     PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
  122940      - A     0:15   90  3560 43660 32768 88687 40100  0.0  0.0 ora_psp
  188644      - A     3:22   21  3904 45856 32768 88687 41952  0.0  0.0 ora_mmn
  249968      - A     0:02    0  3632 42932 32768 88687 39300  0.0  0.0 ora_q00
  262246      - A     0:15   71  3556 42856 32768 88687 39300  0.0  0.0 ora_psp
  274602      - A     0:00   13  4336 46184 32768 88687 41848  0.0  0.0 ora_rec
  286782      - A     1:24  963  7160 49008 32768 88687 41848  0.0  0.0 ora_mmo
  299116      - A     1:36 10509  8652 50500 32768 88687 41848  0.0  0.0 oraclei
  352302      - A     0:30 1344  5500 45600 32768 88687 40100  0.0  0.0 ora_smo
  372774      - A     0:00   12  4336 135132 32768 88687 130796  0.0  0.0 ora_rec

Process 122940 is consuming 43660KB of resident memory (RSS column).
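
To get a rough total of the resident memory used by all Oracle processes you can sum the RSS column (a sketch only, and an overestimate because shared SGA pages are counted once per process):

server1{root}# ps gv | grep -e ora_ -e oracle | grep -v grep | awk '{sum+=$7} END {printf "%d MB\n", sum/1024}'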

server1{root}# vmstat -l
 
System configuration: lcpu=6 mem=49152MB
 
kthr    memory              page              faults        cpu     large-page
----- ----------- ------------------------ ------------ ----------- -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa   alp   flp
 1  1 6822254  7052   0   0   0  99  291   0 150 5753 638  4  2 95  0   257   227

LOCK_SGA

It would be insane to size your Oracle SGA carefully and then have it paged out to disk (with the obvious I/O needed to bring it back into memory). AIX allows you to pin the Oracle SGA in physical memory, i.e. the shared memory segments can be pinned in RAM. When doing this you must understand that your system may crash or hang if it runs out of memory, which is why some IBM white papers recommend setting lru_file_repage = 0 (see above) instead of pinning the SGA in physical memory.

In theory the SGA should stay in physical RAM anyway, and the following tuning exercise should be done only after the VMM tuning above, as it can have more negative impact than positive.

At Oracle level you just need to set lock_sga=true; at AIX level check the value of the following kernel parameters:

server1{root}# vmo -L maxpin% -L v_pinshm
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
maxpin%                   80     80     80     1      99     % memory          D
     pinnable_frames
     memory_frames
--------------------------------------------------------------------------------
v_pinshm                  0      0      0      0      1      boolean           D
--------------------------------------------------------------------------------

v_pinshm must be equal to 1. If you configure pinned memory you must take into account that the OS itself also pins memory for its own use. Suggested value for maxpin% = SUM(all Oracle SGAs) + 10% of physical memory. maxpin% is a percentage of physical memory, so if your computation is greater than 80% of physical memory increase maxpin%; otherwise leave it at its default value.

server1{root}# vmo -p -o v_pinshm=1
server1{root}# vmo -p -o maxpin%=80
server1{root}# vmo -a | grep -e maxpin% -e v_pinshm
               maxpin% = 80
              v_pinshm = 1
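
On the Oracle side, a minimal sketch (assuming an spfile is in use; lock_sga is a static parameter so the instance must be restarted):

server1{ora320}$ sqlplus "/ as sysdba"
SQL> alter system set lock_sga=true scope=spfile;
SQL> shutdown immediate
SQL> startup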

Remark:
If you don't set these values you get: ORA-27126: unable to lock shared memory segment in core

Large page memory

Virtual memory is split into pages (the default page size is 4KB). With the POWER4 processor and AIX 5.1 IBM implemented a 16MB page size, and with the POWER5+ processor and AIX 5.3 they implemented 64KB and 16GB page sizes, as summarized in the following table from the IBM web site:

Page size                     Required hardware       User configuration needed   Restricted   Kernel
4KB (s, small or standard)    All                     No                          No           64 & 32
64KB (m, medium)              IBM POWER5+ or later    No                          No           64 only
16MB (L, large)               POWER4 or later         Yes                         Yes          64 & 32
16GB (H, huge)                POWER5+ or later        Yes                         Yes          64 only

To find out what page sizes your hardware/OS supports (the output depends on the hardware and AIX level):

server1{root}# pagesize -af
4K
16M
server1{root}# pagesize -af
4K
64K
16M
16G

The 4KB and 64KB page sizes (when available) are used by default and, for example, OLTP and Java workloads can see up to a 13% performance improvement when using the 64KB page size versus the 4KB one.

Why play with bigger page sizes? Larger pages reduce the cost of translating application memory addresses to OS memory addresses: this mapping information is kept in a CPU cache called the translation lookaside buffer (TLB). As the TLB has a limited number of entries, bigger memory pages mean fewer TLB entries are needed (and so fewer misses), and since more data fits in one page, pre-fetching is also improved. IBM states that Oracle performs better when the SGA is allocated in 16MB pages. Before implementing it you should measure the performance of a few jobs to make sure it does not decrease performance (this has been observed in some situations). To enable it you must configure it at OS level and give the Oracle Unix account the needed privileges:

Remark:
hpmcount command can be used to determine the TLB miss rate.

First calculate the number of 16MB pages you need with SUM(ROUND(SGA/16MB)+1) (a worked example follows the commands below), then use:

server1{root}# vmo -p -o lgpg_regions=484 -o lgpg_size=16777216
Setting lgpg_size to 16777216 in nextboot file
Setting lgpg_regions to 484 in nextboot file
Warning: although the change concerns a dynamic tunable, bosboot should be run to optimally restore these settings at reboot
Setting lgpg_size to 16777216
Setting lgpg_regions to 484
server1{root}# chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE ora320
server1{root}# lsuser -a capabilities ora320
ora320 capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
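
A worked example of the page count computation, assuming a single instance with a hypothetical 7.5GB (7680MB) SGA:

ROUND(7680 / 16) + 1 = 480 + 1 = 481
server1{root}# vmo -p -o lgpg_regions=481 -o lgpg_size=16777216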

To monitor its usage use:

server1{root}# vmstat -P all
 
System configuration: mem=49152MB
 
pgsz            memory                           page
----- -------------------------- ------------------------------------
           siz      avm      fre    re    pi    po    fr     sr    cy
   4K 10600448  5230833     1924     0     0     0    56    151     0
  16M      484      257      227     0     0     0     0      0     0
server1{root}# svmon -G
               size       inuse        free         pin     virtual
memory     12582912    12571202       11710     4067722     6284817
pg space    6291456       12896
 
               work        pers        clnt       other
pin         2686657           0           0      451273
in use      6284817           0     5356593
 
PageSize   PoolSize      inuse       pgsp        pin    virtual
s   4 KB          -   10588738      12896    2085258    5232145
L  16 MB        484        257          0        484        257

Remark:
There is a regression in Oracle 10.2.0.4 (Bug 7226548): even if everything is well configured the 16MB page size will not be used. Apply patch 7226548 on top of 10.2.0.4 to solve it.

Asynchronous I/O

Unlike HP-UX, AIX supports asynchronous I/O on filesystems (JFS, JFS2 and GPFS, the General Parallel File System). Starting with AIX 5.2 both the Legacy and the POSIX asynchronous I/O subsystems are supported; they have different implementations but the same performance. The Oracle database uses the original subsystem, i.e. the Legacy one. To check it has been activated at OS level (and is therefore used at Oracle level, because FILESYSTEMIO_OPTIONS defaults to asynch):

server1{root}# lsdev -C -c aio
aio0 Available  Asynchronous I/O (Legacy)

To configure it, if not yet done, use the following:

smit aio
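
If you prefer the command line to smit, the equivalent is roughly the following (a sketch; on AIX 6.1 the AIO subsystem is loaded dynamically and this step is not needed):

server1{root}# chdev -l aio0 -a autoconfig=available -P
server1{root}# mkdev -l aio0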

Remark:
If you are using raw devices and if the asynchronous I/O fast path is enabled (default value) then the tuning of asynchronous I/O is not needed.

Asynchronous I/O requests are handled by asynchronous I/O servers, so their number limits (or not) the number of concurrent I/O operations. The initial number of servers is controlled by the minservers parameter (default value 1) and the maximum number by the maxservers parameter (default value 10). Starting with AIX 5.2 these numbers are per processor.

If you don't set them correctly (the default values are not optimal for Oracle databases) you may see a database writer trace file in your Oracle dump directory containing something like:

Warning: lio_listio returned EAGAIN
Performance degradation may be seen.
WARNING:Oracle process running out of OS kernel I/O resources (1)

Suggested values (JFS & JFS2):

                     minservers   maxservers
AIX V5.1 and older   10           10 * number of logical disks
AIX V5.2 and newer   10           10 * number of logical disks / number of processors

Number of logical disks can be obtained with:

server1{root}# lspv | wc -l
      75

To set them, use either smit or the following (do not set them too high, otherwise you will waste memory and CPU):

server1{root}# chdev -P -l aio0 -a maxservers='m' -a minservers='n'
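
A worked example with the figures of this system (75 logical disks, 6 logical CPUs), so maxservers = 10 * 75 / 6 = 125:

server1{root}# chdev -P -l aio0 -a minservers=10 -a maxservers=125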

Remark:
For GPFS tuning values are different.

To check current setting (you can also use smit):

server1{root}# lsattr -El aio0
autoconfig available STATE to be configured at system restart True
fastpath   enable    State of fast path                       True
kprocprio  39        Server PRIORITY                          True
maxreqs    4096      Maximum number of REQUESTS               True
maxservers 20        MAXIMUM number of servers per cpu        True
minservers 10        MINIMUM number of servers                True

To monitor their usage (refer to the iostat man page for a description of the columns):

server1{root}# iostat -A -Q
 
System configuration: lcpu=6
 
aio: avgc avfc maxg maif maxr avg-cpu: % user % sys % idle % iowait
        0    0   12    0 4096             2.8   1.3   95.7      0.2
stat: A file, file system or message queue is no longer available.

The last parameter to check is maxreqs, which controls the queue length of asynchronous I/O requests. If this parameter is too small (suggested initial value is 4096) you will get the same Oracle warning in the dump directory; hence the difficulty of correctly sizing both the number of asynchronous I/O servers and the queue length…
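
If the warning persists with maxreqs already at 4096, the queue can be enlarged the same way as the server counts (the value below is only an illustration):

server1{root}# chdev -P -l aio0 -a maxreqs=8192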

Concurrent I/O

For JFS2 concurrent I/O (direct I/O for JFS) it is exactly the same story as VxFS on HP-UX (concurrent I/O is direct I/O without the inode lock contention). Everything rests on the fact that Oracle knows the context of its blocks better and is smarter about what should be kept in memory or not. So even if read-ahead, write-behind and I/O buffering are nice at filesystem level, it is better to allocate that memory to Oracle.

From experience you must be cautious and check how much memory you can really give to Oracle. Be careful with the cio option of JFS2 (dio for JFS): we hit a few drawbacks when playing with its equivalent on HP-UX, but you should still try it (not for ORACLE_HOME). For example, the scp/rcp commands used for transportable tablespaces see their performance killed when the target filesystem is mounted with direct I/O, and small QA servers with little memory available benefit from OS file caching.

When concurrent or direct I/O is used, OS file caching is bypassed (just as it is when using raw devices).

To see which mount options are used:

mount
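
To try concurrent I/O on a JFS2 data filesystem, something like the following can be used (a sketch: never do this for ORACLE_HOME, and to make it persistent add the option to the filesystem stanza in /etc/filesystems):

server1{root}# umount /ora320/data
server1{root}# mount -o cio /ora320/data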

LVM always uses a pbuf to process every I/O request. A pbuf is a “pinned memory buffer”. LVM extends the pbuf pool when you add a PV to a VG. For AIX 5.2 and older the pbuf pool is system-wide; starting with AIX 5.3 the pool is per VG and the new lvmo tool displays statistics per VG. However this new command only handles the per-VG LVM pbuf tunable parameters, i.e. the ioo command must still be used for the system-wide pbuf tunables, and vmstat -v still shows the system-wide number of I/Os blocked because of a too-small pbuf pool (as in previous AIX releases).

To check the pbuf pool values:

server1{root}# ioo -a | grep pv_min_pbuf
                   pv_min_pbuf = 512
server1{root}# lvmo -a -v vg220d
vgname = vg220d
pv_pbuf_count = 512
total_vg_pbufs = 3072
max_vg_pbuf_count = 65536
pervg_blocked_io_count = 0
pv_min_pbuf = 512
global_blocked_io_count = 1352
server1{root}# lvmo -a -v vg220c
vgname = vg220c
pv_pbuf_count = 512
total_vg_pbufs = 2048
max_vg_pbuf_count = 65536
pervg_blocked_io_count = 0
pv_min_pbuf = 512
global_blocked_io_count = 1352

To change it (take care to monitor performance before and after changing it):

server1{root}# ioo -p -o pv_min_pbuf=<number of pbufs per PV>

Monitoring:

server1{root}# vmstat -v | grep "blocked with no pbuf"
                 1352 pending disk I/Os blocked with no pbuf

Remark:
This is the same value as reported by the lvmo command, i.e. 1352.

The LTG (logical track group) size is the maximum transfer size for disk I/O. It is a per-VG parameter and is bounded by the smallest maximum transfer size of the disks in the VG. The default value should satisfy most workloads; monitor performance when planning to change it (a change example is sketched after the outputs below). The current value can be displayed with:

server1{root}# lsvg vg220d
VOLUME GROUP:       vg220d                   VG IDENTIFIER:  00c9cf0000004c000000011a9aee4f69
VG STATE:           active                   PP SIZE:        64 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      2702 (172928 megabytes)
MAX LVs:            512                      FREE PPs:       0 (0 megabytes)
LVs:                31                       USED PPs:       2702 (172928 megabytes)
OPEN LVs:           31                       QUORUM:         4 (Enabled)
TOTAL PVs:          6                        VG DESCRIPTORS: 6
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         6                        AUTO ON:        yes
MAX PPs per VG:     130048
MAX PPs per PV:     1016                     MAX PVs:        128
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

Strangely at disk level it does not work on my system:

server1{root}# lsvg -p vg220d
vg220d:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk29           active            539         0           00..00..00..00..00
hdisk30           active            539         0           00..00..00..00..00
hdisk31           active            539         0           00..00..00..00..00
hdisk32           active            539         0           00..00..00..00..00
hdisk33           active            539         0           00..00..00..00..00
hdisk69           active            7           0           00..00..00..00..00
server1{root}# lquerypv -M hdisk29
0
server1{root}# lspv hdisk29
PHYSICAL VOLUME:    hdisk29                  VOLUME GROUP:     vg220d
PV IDENTIFIER:      00c9cf009aee4b05 VG IDENTIFIER     00c9cf0000004c000000011a9aee4f69
PV STATE:           active
STALE PARTITIONS:   0                        ALLOCATABLE:      yes
PP SIZE:            64 megabyte(s)           LOGICAL VOLUMES:  8
TOTAL PPs:          539 (34496 megabytes)    VG DESCRIPTORS:   1
FREE PPs:           0 (0 megabytes)          HOT SPARE:        no
USED PPs:           539 (34496 megabytes)    MAX REQUEST:      ???????
FREE DISTRIBUTION:  00..00..00..00..00
USED DISTRIBUTION:  108..108..107..108..108
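
If you do decide to change the LTG size, on AIX 5.3 it is set when the volume group is varied on; a sketch, assuming the VG can be taken offline (the value and VG name are illustrative):

server1{root}# varyoffvg vg220d
server1{root}# varyonvg -M 512K vg220d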

Filesystems tuning

LVM volume groups should be created with the Scalable VG type:

VG type       Maximum PVs   Maximum LVs   Maximum PPs per VG     Maximum PP size
Normal VG     32            256           32,512 (1016 * 32)     1 GB
Big VG        128           512           130,048 (1016 * 128)   1 GB
Scalable VG   1024          4096          2,097,152              128 GB
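
For example, creating a scalable volume group (hypothetical VG and disk names):

server1{root}# mkvg -S -y vgora01 hdisk10 hdisk11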

Tablespace filesystems should be created with agblksize=4096, and redo log (and possibly control file) filesystems with agblksize=512 (a creation example follows the output below). To display the filesystem parameters:

server1{root}# lsfs -q /ora320/data
Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/vgd21dlvol8 --         /ora320/data           jfs2  36306944 rw         yes  no
  (lv size: 36306944, fs size: 36306944, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no)
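
A sketch of creating a redo log filesystem with a 512-byte allocation block size (names and size are illustrative; agblksize can only be set at creation time):

server1{root}# crfs -v jfs2 -g vg220d -m /ora320/redo -A yes -a size=4G -a agblksize=512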

Inline JFS2 logs must be avoided; you should have one jfs2log per filesystem and it should ideally reside on a different physical volume than the filesystem itself. To check whether you are using an inline log:

server1{root}# lsfs -q /ora320/work_data
Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/vgd21dlvol37 --         /ora320/work_data      jfs2  12320768 rw         yes  no
  (lv size: 12320768, fs size: 12320768, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no)
server1{root}# lsvg -l vg220d
vg220d:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
vgd21dlvol14        jfs2       1       1       1    open/syncd    /ora320/ctrl
vgd21dlvol7         jfs2       328     328     1    open/syncd    /ora320/arch
vgd21dlvol12        jfs2       130     130     1    open/syncd    /ora320/data_da
vgd21dlvol57        jfs2       92      92      1    open/syncd    /ora320/data_ma
vgd21dlvol9         jfs2       78      78      1    open/syncd    /ora320/data_mj
vgd21dlvol11        jfs2       130     130     1    open/syncd    /ora320/data_dj
vgd21dlvol30        jfs2       125     125     1    open/syncd    /ora320/data_rdj
vgd21dlvol23        jfs2       93      93      3    open/syncd    /ora320/index_da
vgd21dlvol8         jfs2       277     277     1    open/syncd    /ora320/data
vgd21dlvol10        jfs2       94      94      2    open/syncd    /ora320/data_wj
vgd21dlvol13        jfs2       5       5       1    open/syncd    /ora320/dump
vgd21dlvol16        jfs2       146     146     1    open/syncd    /ora320/index
vgd21dlvol21        jfs2       74      74      1    open/syncd    /ora320/index_dj
vgd21dlvol58        jfs2       61      61      1    open/syncd    /ora320/index_ma
vgd21dlvol17        jfs2       63      63      2    open/syncd    /ora320/index_mj
vgd21dlvol19        jfs2       80      80      3    open/syncd    /ora320/index_wj
vgd21dlvol25        jfs2       6       6       1    open/syncd    /ora320/log
vgd21dlvol28        jfs2       96      96      2    open/syncd    /ora320/rbs
vgd21dlvol15        jfs2       121     121     1    open/syncd    /ora320/software
vgd21dlvol26        jfs2       18      18      1    open/syncd    /ora320/sys
vgd21dlvol27        jfs2       47      47      1    open/syncd    /ora320/temp
vgd21dlvol24        jfs2       73      73      1    open/syncd    /ora320/temp_da
vgd21dlvol31        jfs2       64      64      1    open/syncd    /ora320/index_rdj
vgd21dlvol22        jfs2       81      81      2    open/syncd    /ora320/temp_dj
vgd21dlvol37        jfs2       94      94      1    open/syncd    /ora320/work_data
vgd21dlvol20        jfs2       31      31      1    open/syncd    /ora320/temp_wj
vgd21dlvol18        jfs2       21      21      1    open/syncd    /ora320/temp_mj
loglv05             jfs2log    1       1       1    open/syncd    N/A
vgd21dlvol32        jfs2       96      96      2    open/syncd    /ora320/index_r4da
vgd21dlvol33        jfs2       160     160     3    open/syncd    /ora320/data_r4da
vgd21dlvol34        jfs2       16      16      1    open/syncd    /ora320/temp_r4da

Then, to see which jfs2log each filesystem uses, check the /etc/filesystems file.
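
An illustrative stanza (the log attribute shows which jfs2log device the filesystem uses):

/ora320/data:
        dev             = /dev/vgd21dlvol8
        vfs             = jfs2
        log             = /dev/loglv05
        mount           = true
        options         = rw
        account         = false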

There are also a few JFS2 parameters that you can change (use ioo to display and set them):

server1{root}# ioo -a
         j2_atimeUpdateSymlink = 0
 j2_dynamicBufferPreallocation = 16
             j2_inodeCacheSize = 400
           j2_maxPageReadAhead = 128
             j2_maxRandomWrite = 0
       j2_maxUsableMaxTransfer = 512
          j2_metadataCacheSize = 400
           j2_minPageReadAhead = 2
      j2_nBufferPerPagerDevice = 512
j2_nPagesPerWriteBehindCluster = 32
             j2_nRandomCluster = 0
      j2_nonFatalCrashesSystem = 0
         j2_syncModifiedMapped = 1
              j2_syncPageCount = 0
              j2_syncPageLimit = 256
       j2_syncdLogSyncInterval = 1
                 j2_unmarkComp = 0
            jfs_clread_enabled = 0
             jfs_use_read_lock = 1
                    lvm_bufcnt = 9
                    maxpgahead = 8
                    maxrandwrt = 0
                 memory_frames = 12582912
                    minpgahead = 2
                      numclust = 1
                     numfsbufs = 196
                     pd_npages = 65536
            pgahd_scale_thresh = 0
                   pv_min_pbuf = 512
            sync_release_ilock = 0

The VMM offers a sequential read-ahead capability, i.e. reading blocks of a file in advance when it is accessed sequentially. You can set the minimum number of pages read when the VMM detects a sequential access with j2_minPageReadAhead and the maximum number of pages with j2_maxPageReadAhead. The respective IBM suggested values are Max(2, db_block_size/4096) and Max(256, db_block_size/4096 * db_file_multiblock_read_count).
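
A worked example, assuming db_block_size=8192 and db_file_multiblock_read_count=128 (illustrative values):

j2_minPageReadAhead = Max(2, 8192/4096) = 2
j2_maxPageReadAhead = Max(256, 8192/4096 * 128) = 256
server1{root}# ioo -p -o j2_minPageReadAhead=2 -o j2_maxPageReadAhead=256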

I/O monitoring

server1{root}# vmstat -I 1 2
 
System configuration: lcpu=6 mem=49152MB
 
  kthr     memory              page              faults        cpu
-------- ----------- ------------------------ ------------ -----------
 r  b  p   avm   fre  fi  fo  pi  po  fr   sr  in   sy  cs us sy id wa
 0  0  0 4705720 1643943   0   8   0   0   0    0   7 2987 404  0  2 98  0
 0  0  0 4705720 1643943   0  16   0   0   0    0   9  253 373  0  0 99  0
server1{root}# iostat 1 2
 
System configuration: lcpu=6 drives=75 paths=4 vdisks=0
 
tty:      tin         tout    avg-cpu: % user % sys % idle % iowait
        124.0        644.0                0.7   0.7   98.6      0.0
 
Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk1           0.0      20.0       5.0          0        20
hdisk0           0.0      20.0       5.0          0        20
hdisk28          0.0       0.0       0.0          0         0
hdisk30          0.0       0.0       0.0          0         0
hdisk31          0.0       0.0       0.0          0         0
hdisk29          0.0       0.0       0.0          0         0
.
.
server1{root}# sar -P ALL 1 2
 
AIX server1 3 5 0035208A4C00    06/09/10
 
System configuration: lcpu=6  mode=Capped
 
17:21:28 cpu    %usr    %sys    %wio   %idle
17:21:29  0        0       0       0     100
          1        0       0       0     100
          2        1       0       0      99
          3        0       1       0      99
          4        0       0       0     100
          5        0       0       0      99
          -        0       0       0      99
17:21:30  0        0       0       0     100
          1        0       0       0     100
          2        0       0       0     100
          3        0       0       0     100
          4        0       0       0      99
          5        0       0       0      99
          -        0       0       0     100
 
Average   0        0       0       0     100
          1        0       0       0     100
          2        0       0       0     100
          3        0       1       0      99
          4        0       1       0      98
          5        0       0       0     100
          -        0       0       0      99

Another command to mention is lvmstat (statistics collection must first be enabled per LV with -e, then displayed at an interval, and disabled again with -d):

server1{root}# lvmstat -l vgd21dlvol37 -e
server1{root}# lvmstat -l vgd21dlvol37 2
.
.
server1{root}# lvmstat -l vgd21dlvol37 -d

References

  • Minimum Software Versions and Patches Required to Support Oracle Products on IBM Power Systems [ID 282036.1]
  • IBM developerWorks: Wikis – Systems – AIX
  • AIX: Database performance gets slower the longer the database is running [ID 316533.1]
  • How to enable Large Page Feature on AIX-Based Systems [ID 372157.1]
  • ORA-27126 during database startup after 10.2.0.4 patchset software install on AIX5L [ID 603489.1]
  • Bug 7226548: 10.2.0.4 NOT USING LARGE PAGES
  • Guide to Multiple Page Size Support on AIX 5L Version 5.3
  • Oracle not using large pages on AIX [ID 848632.1]
  • AIX: DBWR trace warning: LIO_LISTIO RETURNED EAGAIN [ID 265491.1]
  • AIX: Determining Oracle Memory Usage On AIX [ID 123754.1]
  • Overview of AIX page replacement


12 thoughts on “AIX Oracle tuning”

  1. Thanks, very interested. As now it is Dec, 2017. I haven’t found information regarding to tune Oracle 11 with AIX 7.1. Many of the parameters link lru_file_repage, seems that aren’t available in this version. Could you please tell us if there’s updated information?

    Thanks.

    • Thanks for nice comment ! Anyways as you have seen the document start to be really old and must be taken only as a base for further investigations with latest parameters both from Oracle and AIX. Unfortunately I have never updated it because we now, as many people around, do Oracle mainly on Linux. So our most important databases are 100% running on Linux but who knows in case of issue on our few AIX boxes I might take a bit of time to investigate and will then share…

  2. David Browne says:

    We have been trying to work out why our paging space usage kept creeping up and up and after 3-6 months the server needed a reboot. The reason is that we use ‘SHUTDOWN ABORT’ when shutting down our EBS (E-Business) database which has natively compiled code. This has the effect of leaving stale native code in memory. This stale code is not used for anything, it’s just left there and builds up and up over time, eating into the paging space.
    On AIX, monitor memory usage:

    ipcs -ar | grep -e JOXSHM_EXT -e PESHM -e PESLD | awk '{cnt+=1; sum+=$10} END {print "Count=",cnt,"Sum=",sum,"Average=",sum/cnt}'

    remove them by:

    ipcs -ar | grep -e JOXSHM_EXT -e PESHM -e PESLD | awk '{name = $17; cmd = "ipcrm -r -m " name; print "Run ", cmd; system(cmd)}'

    or
    (if you are on 11.2.0.3 – apply patch 13574534)

    11.2.0.4 and greater

    set _ncomp_shared_objects_dir in your init.ora to a temporary directory. The native code will then be stored in this directory which can be deleted only after the database is shutdown. If you use this parameter, the natively compiled code will no longer be stored in memory and there will be no further build up of stale native code in memory.

    http://ksun-oracle.blogspot.co.uk/2013/11/remove-stale-native-code-files-on-aix.html

  3. Hello,
    The post mentions the hpmcount utility as a way to monitor TLB misses on IBM Power platform. The hpmcount utility can be used as a wrapper for the command to be monitored. I’ve found hpmstat to be more useful for evaluating Oracle, since it will provide system wide metrics. Requires root privileges.
    http://sql-sasquatch.blogspot.com/2013/04/aix-hpmstat-utility-on-power-7.html
    http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmstat.htm

  4. Hii Yannick

    Thanks for your explanation. yes indeed i know that the IO is chuck size is restricted to LTG boundary and its default to 256k on hp-ux when using LVM as defined in the header files.
    Well and if fail to agree with me for the max IO size in hp-ux, then can you please let me know the reason as why oracle parameter DB_FILE_MULTIBLOCK_READ_COUNT on hp-ux defaults to 128 with a 8K database and its sets to 64 for a 16k database ?

      • Hello Yannick

        Thanks for the Oracle Note. Here Oracle says its 1M for maximum platforms. What i found out is for hp-ux the max IO transfer unit is 1M for asynchrounous io but when using LVM its 256K. So i suppose Oracle by default sets the parameter db_file_multiblock_read_count as per the DB blocksize and assuming asynchronous io mode.

  5. Discussing Oracle & IO — In my environment we have has the setting of DB_FILE_MULTIBLOCK_READ_COUNT(128) * Oracle block size (8K)= 1Mb ( which is the max IO chunk, that can be handled by vxfs software). The Filesystem block size is also 8k so as not to have more IO request per oracle block.

    Generally, when LVM receives a request for an I/O, it breaks the I/O down into what is called logical track group (LTG) sizes before it passes the request down to the device driver of the underlying disks. The LTG is the maximum transfer size of an LV and is common to all the LVs in the VG. Large disk I/O requests made by applications were broken up by the LVM on LTG alignments.

    What makes me wonder is – We are using buffered IO on vxfs filesystem on HP-UX and oracle 10g. So whatever be the size of oracle blocks the OS kernel or disk drivers are supposed to build batches of optimum i/o requests before placing it to the storage, isnt it ?
    We also use the vxfs mount option mincache=direct & convosync=direct but still i think we simulate direct i/o which means we only bypass the OS buffers ( ie data transferred directly from disk to user buffer) but that doesnot account for the IO request size, what do you think ?

    • Hi Lodh,

      LVM I/O request size is equal to the LTG and seems to be 256KB on HPUX while it's a tunable (dynamic) parameter on AIX.

      According to HPUX documentation on VxFS 4.1, default maximum I/O size (tunable parameter) is 256KB which seems to be inline with Maximum disk driver I/O size:


      # uname -a
      HP-UX localhost B.11.31 U ia64 2829901614 unlimited-user license
      # grep -i MAXPHYS /usr/include/machine/sys/param.h
      #define MAXPHYS (256 * 1024) /* Maximum size of physical I/O transfer */

      Even if easy to find on Google:

      • The maximum buffered I/O size is 64KB.
      • Maximum direct I/O size is 256KB.

      Then for your question if disk driver is building optimum I/O requests (not sure to really understand what you mean.. NCQ ?) then I don’t think so as the relation between physical disk/disk array cache/OS cache/LVM disk/VxFS looks quite complex to handle… Moreover with an Oracle database workload you cannot let an I/O pending waiting for an optimal window to achieve it isn’t it ?

      Then for your direct I/O story, see above: the VxFS maximum direct I/O size is 256KB and is tunable with VxFS > 4.1.

      Yannick.

  6. Ralf Schmidt-Dannert says:

    I strongly recommend to review the latest published IBM white paper titled:
    Oracle Architecture and Tuning on AIX v2.0

    It is available here and reflects the current best tuning practices as applied by IBM Advanced Technical Skills team and includes AIX 5.3 and AIX6.1 recommendations / changes which are not reflected in the above article:
    http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883

    Metalink Note 316533.1 is very questionable as it does not take into account or even mention the vmo tuning variable “lru_file_repage” which is available I believe since AIX 5.3 TL03. See the above white paper for current best practices!

    • Philip Verdieck says:

      Thank you very much for that IBM whitepaper link. We are migrating from DB2 to Oracle on AIX, we operate an ERP system (CGI AMS), and I am just starting to try and tune this thing.
