Table of contents
Preamble
By default, AIX is tuned for a mixed workload. AIX Virtual Memory Manager (AIX VMM) will grow its VMM file cache up to 80% of physical RAM (AIX 6.1 default parameters are much in line with an Oracle database than the default one of AIX 5.3). This default configuration answer well to applications like NFS server or web server, but not very well to applications which do their own cache management. As you guess this includes most databases flavor (Oracle, MySQL with InnoDB, DB2, PostgreSQL).
As written in any white papers best solution is to use raw logical volumes, raw devices or more simply Oracle ASM but it requires more skilled people to manage it. If you do not accept to loose the flexibility of filesystems then consider using direct I/O (JFS) or concurrent I/O (JFS2) (with drawbacks listed below) and/or tune the AIX file cache to be less aggressive.
Environment variables
On top of the classic AIX environment variables:
export ORACLE_BASE=/ora320 export ORACLE_HOME=$ORACLE_BASE/software export ORACLE_SID=db1 export TNS_ADMIN=$ORACLE_HOME/network/admin export LIBPATH=$ORACLE_HOME/lib32:$ORACLE_HOME/lib export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/OPatch alias sql="sqlplus "/ as sysdba "" |
Do not forget to add this important environment variable, it is clearly described in Oracle official documentation but it’s very easy to miss it:
export AIXTHREAD_SCOPE=S |
From Oracle official documentation:
- Threads in AIX can run with process-wide contention scope (M:N or P mode, default value) or with systemwide contention scope (1:1 or S mode). The AIXTHREAD_SCOPE environment variable controls which contention scope is used.
- When using process-wide contention scope, Oracle threads are mapped to a pool of kernel threads. When Oracle is waiting on an event and its thread is swapped out, it may return on a different kernel thread with a different thread ID. Oracle uses the thread ID to post waiting processes, so it is important for the thread ID to remain the same.
- When using systemwide contention scope, Oracle threads are mapped to kernel threads statically, one to one.
Oracle applications are primarily multiprocess and single-threaded that’s why Oracle recommends that you use systemwide contention (S mode). It would have been more logic to say than having a pool of kernel threads to serve all user threads that are coming is more efficient (mutualizing resources in a way) but apparently this is true only for multi-threaded application like Java for example. But what about the Oracle parallel processes ? Are they forked from Query Coordinator ? I suppose not, looking at the Oracle suggestion…
Remark:
On AIX 5L version 5.2 or later, systemwide contention scope (S mode) has also the added value of reducing allocated memory of each Oracle process.
Maximum number of processes
There is only one kernel parameter to set on AIX: the maximum number of processes per Unix account. If your number of parallel server is big and if you have multiple connection through the listener the default value of maxuproc is probably too low.
If it is too low you may also hit the ORA-04030: out of process memory when trying to allocate 127000 bytes (QERHJ hash-joi,kllcqas:kllsltba) error (see Oracle Support Note:Client Connection Fails with ORA-04030 (QERHJ hash-joi,kllcqas:kllsltba) [ID 470604.1].
To display the maxuproc value:
server1{root}# lsattr -E -l sys0 -a maxuproc maxuproc 128 Maximum number of PROCESSES allowed per user True |
To change it:
server1{root}# chdev -l sys0 -a maxuproc=8192 sys0 changed server1{root}# lsattr -E -l sys0 -a maxuproc maxuproc 8192 Maximum number of PROCESSES allowed per user True |
VMM tuning
VMM background
As on all Unix flavor, AIX has its own Virtual Memory Manager (VMM) that is a mix of disk and physical memory. In AIX, all virtual memory segments (bigger than real physical memory) are partitioned into pages (4KB size by default). Allocated pages can be either physical memory (RAM) or paging space (virtual memory stored on disk). To display available swap and its usage use:
server1{root}# swap -l device maj,min total free /dev/hd6 10, 2 24576MB 24544MB server1{root}# lsps -a Page Space Physical Volume Volume Group Size %Used Active Auto Type hd6 hdisk1 rootvg 24576MB 1 yes yes lv server1{root}# lsps -s Total Paging Space Percent Used 24576MB 1% |
AIX classifies memory pages into two categories:
- Working storage pages (process data, shared memory, …) also called computational pages.
- Permanent storage pages (files cached in memory: client pages and OS file system cache) also called non-computational pages.
After a running period AIX will need to free memory, either file cache or working storage. To achieve this VMM is using a page replacement algorithm through lrud process (ps -k). Two 2 basic tunable parameter control this algorithm: minperm% and maxperm%, they act like pctfree and pctused in an Oracle block and control how VMM handle non-computational pages.
maxperm% is the upper limit (non-strict i.e. VMM can decide to go above this limit, making it strict can be done by setting strict_maxperm = 1 but it is not advisable) of non-computational pages in memory. minperm% set the lower limit on non-computational pages in memory.
numperm% is the percentage of non-computational pages in memory (can be obtained with vmstat -v, see memory monitoring chapter).
When numperm% is above maxperm% VMM steals non-computational pages and when numperm% is below minperm% VMM steals computational and non-computational pages.
When numperm% is between numperm% and maxperm%, lru_file_repage tunable parameter determine how VMM steals page. When lru_file_repage = 1 VMM steals either non-computational pages or non-computational and computational pages based on paging activity (VMM maintain a list of pages that are often paged in and paged out to have clever page stealing), when lru_file_repage = 0 VMM steals non-computational pages.
Paging activity is made of page in i.e. loading a new page in memory (one that could have been paged out) or creating a new space for a process data and page out i.e. removing from physical memory a page to file system or paging space (swap space).
You will also found references of maxclient% which IBM sometimes call non-computational client pages or JFS2/NFS file cache. This parameter is suppose to define a subset of maxperm% and is a strict limit (strict_maxclient = 1 by default on AIX 5.3). You have also a numclient% value with vmstat -v. Practically you set maxclient% to same value as maxperm and if you have 100% JFS2 filesystems, numclient% will be equal to numperm%.
So for databases, which use working storage, and to maximize performance, you want AIX to release the type of memory not needed by the application i.e. file cache memory first. To do so in the past, you would put a strict limit on JFS2 file cache memory using maxclient. It is not unusual to see database servers with file cache limited to 20% of total memory. However, a strict limit is not always desirable. So the new lru_file_repage tunable parameter guarantees working storage stays in memory, and allows file cache to grow.
Remark:
IBM stating that a customer reported this approach can cause excessively high paging when there are multiple, unbalanced memory pools. In this case, IBM Support suggested disabling memory affinity (vmo -r -o memory_affinity=0). Double check with IBM Support and reference this tip.
Use the following to display tunable parameters:
server1{root}# vmo -a |
or
server1{root}# vmo -L |
maxclient%
Recommended value is 90 (default is 80). This parameter control the maximum percentage of memory than can be used for JFS2/NFS file cache.
maxperm%
Recommended value is 90 (default is 80). This parameter control the maximum percentage of memory than can be used for JFS file cache.
minperm%
Recommended value is (default is 20):
- 5 if 32G of memory or less.
- 10 if greater than 32G and less than 64G of memory.
- 20 if greater than 64G of memory.
lru_file_repage
The AIX lru_file_repage kernel tuning parameter was introduced in AIX V5.2 ML5 and AIX V5.3 ML2. The default setting for lru_file_repage is 1, recommended value is 0. Before lru_file_repage was introduced, it was often necessary to lower maxperm/maxclient (JFS/JFS2) to get good performance, particularly when running an Oracle RDBMS workload. It can be set using the vmo command or via smit tuning (then memory). Default value in AIX 6.1 is 0.
page_steal_method
Activate or not the AIX LRU scanning method. If page_steal_method = 0 then working storage and file pages mixed in one list and lrud scans sequentially to find pages of the right type, if page_steal_method = 1 then there are two lists for a page pool, one for working storage and another for file pages.
Recommended value is 1 i.e. use a list-based scanning method (default is 0 for AIX 5.3 and lower i.e. use the legacy page frame table method). Again default value of AIX 5.3 is 0 and 1 is default value in AIX 6.1, I again rate it as a mistake and I do not understand why the list-based LRU feature is disabled in AIX 5.3.
Memory monitoring
Common memory tuning mistakes trigger high paging (high pgspin and pgspout in topas), high system CPU time, the lrud kernel thread using CPU, slow overall system throughput, slow backups and slow process startup.
vmstat provides information on non-computational pages:
server1{root}# vmstat -v 12582912 memory pages # Physical memory size in 4KB blocks 10066417 lruable pages 10417 free pages 2 memory pools 3226749 pinned pages 80.0 maxpin percentage 20.0 minperm percentage # minperm% 80.0 maxperm percentage # numperm% 55.2 numperm percentage # numperm% 5563158 file pages # Number of non-computational pages 0.0 compressed percentage 0 compressed pages 55.2 numclient percentage # numclient% 80.0 maxclient percentage # maxclient% 5563158 client pages # Number of client non-computational pages 0 remote pageouts scheduled 2223 pending disk I/Os blocked with no pbuf 0 paging space I/Os blocked with no psbuf 2228 filesystem I/Os blocked with no fsbuf 0 client filesystem I/Os blocked with no fsbuf 12480 external pager filesystem I/Os blocked with no fsbuf 0 Virtualized Partition Memory Page Faults 0.00 Time resolving virtualized partition memory page faults |
55.2% of physical RAM is used by non-computational pages i.e. 5563158 * 4KB = 21731MB.
svmon provides information on computation pages:
server1{root}# svmon -G size inuse free pin virtual memory 12582912 12543471 39441 4168840 6845783 pg space 6291456 14074 work pers clnt other pin 2787775 0 0 451273 in use 6845783 0 4767896 PageSize PoolSize inuse pgsp pin virtual s 4 KB - 10561007 14074 2186376 5793111 L 16 MB 484 257 0 484 257 |
Real memory are the 3 following columns:
- work —working storage
- pers —persistent storage (Persistent storage pages are non-client pages—that is, JFS pages.)
- clnt —client storage
For each page size you can focus on:
- inuse —number of 4K pages mapped into memory
- pin —number of 4K pages mapped into memory and pinned (pin is a subset of inuse)
So in our case 0 persistent storage page + 4767896 client storage pages = 18625MB of computational pages.
server1{root}# ps gv | grep -e RSS -e ora | head -n 10 PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND 122940 - A 0:15 90 3560 43660 32768 88687 40100 0.0 0.0 ora_psp 188644 - A 3:22 21 3904 45856 32768 88687 41952 0.0 0.0 ora_mmn 249968 - A 0:02 0 3632 42932 32768 88687 39300 0.0 0.0 ora_q00 262246 - A 0:15 71 3556 42856 32768 88687 39300 0.0 0.0 ora_psp 274602 - A 0:00 13 4336 46184 32768 88687 41848 0.0 0.0 ora_rec 286782 - A 1:24 963 7160 49008 32768 88687 41848 0.0 0.0 ora_mmo 299116 - A 1:36 10509 8652 50500 32768 88687 41848 0.0 0.0 oraclei 352302 - A 0:30 1344 5500 45600 32768 88687 40100 0.0 0.0 ora_smo 372774 - A 0:00 12 4336 135132 32768 88687 130796 0.0 0.0 ora_rec |
Process 122940 is consuming 43660KB of memory.
server1{root}# vmstat -l System configuration: lcpu=6 mem=49152MB kthr memory page faults cpu large-page ----- ----------- ------------------------ ------------ ----------- ----------- r b avm fre re pi po fr sr cy in sy cs us sy id wa alp flp 1 1 6822254 7052 0 0 0 99 291 0 150 5753 638 4 2 95 0 257 227 |
LOCK_SGA
It would be insane to size your Oracle SGA and having it falling down on disk (with the obvious I/O needed to bring it back in memory, pin it), AIX allows you to pin the Oracle SGA in physical memory i.e. shared memory segment can be pin in physical memory. When doing this you must understand that your system may crash or hang if running out of memory, that’s why some IBM white papers better recommend to set lru_file_repage = 0 (see above) instead of pinning SGA in physical memory.
Theoretically SGA should stay in physical RAM and following tuning exercise should be done only after having done the VMM tuning as it can have more negative impact than positive ones.
At Oracle level you just need to set lock_sga=true, at AIX level check the value of following kernel parameters:
server1{root}# vmo -L maxpin% -L v_pinshm NAME CUR DEF BOOT MIN MAX UNIT TYPE DEPENDENCIES -------------------------------------------------------------------------------- maxpin% 80 80 80 1 99 % memory D pinnable_frames memory_frames -------------------------------------------------------------------------------- v_pinshm 0 0 0 0 1 boolean D -------------------------------------------------------------------------------- |
v_pinshm must be equal to 1. If you configure pin memory you must take into consideration the OS itself that could also pin memory for its own use. Suggested value for maxpin% = SUM(all Oracle SGA) + 10% of physical memory. maxpin% is a percentage of physical memory, so if your computation is greater than 80% of physical memory then increase maxpin% unless let it to its default value.
server1{root}# vmo –p –o v_pinshm=1 server1{root}# vmo –p –o maxpin%=80 server1{root}# vmo -a | grep -e maxpin% -e v_pinshm maxpin% = 80 v_pinshm = 1 |
Remark:
If you don’t set those value you get: ORA-27126: unable to lock shared memory segment in core
Large page memory
Virtual memory is split in pages (default page size is 4KB), with POWER4 processor and AIX 5.1 IBM implemented 16MB page size and with POWER5 process and AIX 5.3 they implemented 64KB and 16GB page size, this is summarized in following table you can find on IBM web site:
Page Size | Required Hardware | User configuration needed | Restricted | Kernel |
---|---|---|---|---|
4KB (s i.e.small or standard) | ALL | No | No | 64 & 32 |
64KB (m i.e. medium) | IBM POWER5+™ or later | No | No | 64 only |
16MB (L i.e. Large) | POWER4™ or later | Yes | Yes | 64 & 32 |
16GB (H i.e. Huge) | POWER5+ or later | Yes | Yes | 64 only |
To know of what is capable your hardware/OS use:
server1{root}# pagesize -af 4K 16M |
server1{root}# pagesize -af 4K 64K 16M 16G |
4KB and 64KB page size (when available) are used by default and, for example, OLTP transaction and Java can benefit up to 13% of performance improvement when using 64KB page size versus 4KB page size.
Why playing with bigger page size ? Larger page size reduce translation between application memory address and OS memory address, this mapping information is stored in CPU cache for better performance and is called a translation lookaside buffer (TLB). As this TLB is a list with limited entries, bigger memory page result in less TLB entries (with less miss) and as more information fit in one page the pre-fetching is also improved. IBM is stating that Oracle result in better performance when SGA is allocated in 16MB page size. Before implementing it yous should measure the performance of few jobs to make sure it will not decrease performance (observed in some situations). To achieve this you must configure it at OS level and provide to Oracle unix account the needed privileges:
Remark:
hpmcount command can be used to determine the TLB miss rate.
First calculate the number of 16MB page you need with: SUM(ROUND(SGA/16MB)+1) and then use:
server1{root}# vmo -p -o lgpg_regions=484 -o lgpg_size=16777216 Setting lgpg_size to 16777216 in nextboot file Setting lgpg_regions to 484 in nextboot file Warning: although the change concerns a dynamic tunable, bosboot should be run to optimally restore these settings at reboot Setting lgpg_size to 16777216 Setting lgpg_regions to 484 |
server1{root}# chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE ora320 server1{root}# lsuser -a capabilities ora320 ora320 capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE |
To monitor its usage use:
server1{root}# vmstat -P all System configuration: mem=49152MB pgsz memory page ----- -------------------------- ------------------------------------ siz avm fre re pi po fr sr cy 4K 10600448 5230833 1924 0 0 0 56 151 0 16M 484 257 227 0 0 0 0 0 0 |
server1{root}# svmon -G size inuse free pin virtual memory 12582912 12571202 11710 4067722 6284817 pg space 6291456 12896 work pers clnt other pin 2686657 0 0 451273 in use 6284817 0 5356593 PageSize PoolSize inuse pgsp pin virtual s 4 KB - 10588738 12896 2085258 5232145 L 16 MB 484 257 0 484 257 |
Remark:
There is a regression with Oracle 10.2.0.4 (Bug 7226548) and even if everything is well configured 16MB page size will not be used. Apply path 7226548 on top of 10.2.0.4 to solve it.
Asynchronous I/O
On the contrary of HPUX, AIX can have asynchronous for filesystems (JFS, JFS2 and GPFS (General Parallel File System)). Starting with AIX 5.2 Legacy and POSIX asynchronous I/O subsystems are supported, they have different implementation but same performance. Oracle database is supporting the original I/O subsystem i.e. the Legacy one. To check it has been activated at OS level, and so used at Oracle level because by default FILESYSTEMIO_OPTIONS is equal to asynch:
server1{root}# lsdev -C -caio aio0 Available Asynchronous I/O (Legacy) |
To configure it, if not yet done, use the following:
smit aio |
Remark:
If you are using raw devices and if the asynchronous I/O fast path is enabled (default value) then the tuning of asynchronous I/O is not needed.
Asynchronous I/O requests are handle by asynchronous I/O servers therefore their number limit, or not, the number of concurrent I/O operations. Initial number of servers is controlled by minservers parameter (default value us 1) and maximum number is controlled by maxservers parameter (default value is 10). Starting with AIX 5.2 these numbers are per processor.
If you don’t set them correctly (default value are not optimal for Oracle databases) you should see a database writer trace file in your Oracle dump directory containing something like:
Warning: lio_listio returned EAGAIN Performance degradation may be seen. WARNING:Oracle process running out of OS kernel I/O resources (1) |
Suggested values (JFS & JFS2):
minservers | maxservers | |
---|---|---|
AIX V5.1 and older | 10 | 10 * number of logical disks |
AIX V5.2 and newer | 10 | 10 * number of logical disks / number of processors |
Number of logical disks can be obtained with:
server1{root}# lspv | wc -l 75 |
To set them using either smit or (do not set them too high unless you will have useless memory footprint and CPU usage):
server1{root}# chdev -P -l aio0 -a maxservers=’m’ -a minservers=’n’ |
Remark:
For GPFS tuning values are different.
To check current setting (you can also use smit):
server1{root}# lsattr -El aio0 autoconfig available STATE to be configured at system restart True fastpath enable State of fast path True kprocprio 39 Server PRIORITY True maxreqs 4096 Maximum number of REQUESTS True maxservers 20 MAXIMUM number of servers per cpu True minservers 10 MINIMUM number of servers True |
To monitor their usage use (refer to man of iostat for description of columns):
server1{root}# iostat -A -Q System configuration: lcpu=6 aio: avgc avfc maxg maif maxr avg-cpu: % user % sys % idle % iowait 0 0 12 0 4096 2.8 1.3 95.7 0.2 stat: A file, file system or message queue is no longer available. |
Last parameter to check is maxreqs that control queue length of asynchronous I/O requests. If this parameter is too small (suggested initial value is 4096) you will also have the same Oracle warning in dump directory so the difficulty to well size number of asynchronous I/O servers and the queue length…
Concurrent I/O
For JFS2 concurrent I/O (Direct I/O for JFS) it is the exact same story as VxFS on HPUX (concurrent I/O is direct I/O without the inode lock contention). All is based on the fact that Oracle knows better the context of the blocks and is more clever to know what should be kept or not. So even if read ahead, write behind and I/O buffering is nice at filesystem level, better to allocate the memory to Oracle.
By experience you must be caution and see how much memory you can use for Oracle. For cio option (dio for JFS) of JFS2 we have to be caution because we had few drawback when playing with its equivalent on HPUX, but anyway you should try it (not for ORACLE_HOME). For example the scp/rcp commands used when doing transport Tablespace see its performance killed when target filesystem is mount in direct I/O. Small QA servers with few memory available benefit from OS file caching.
When concurrent or direct I/O is used asynchronous I/O are de-activated (OS file caching is also unused when doing raw devices).
To see what option are used:
mount |
LVM always use a pbuf to process every I/O request. A pbuf is a “pinned memory buffer”. LVM extent the pbuf pool when you add a PV to a VG. For AIX release 5.2 and older the pbuf pool is system-wide. Starting with AIX 5.3 this pool is per VG and you now have the new tool lvmo to display statistics per VG. However this new command is only for VG dedicated LVM pbuf tunable parameters i.e. ioo command must be used for system-wide pbuf tunable parameters and so vmstat -v still show the system-wide number of I/Os block due to too small pbuf pool (like in previous AIX release).
To check value of pbuf pool:
server1{root}# ioo -a | grep pv_min_pbuf pv_min_pbuf = 512 |
server1{root}# lvmo -a -v vg220d vgname = vg220d pv_pbuf_count = 512 total_vg_pbufs = 3072 max_vg_pbuf_count = 65536 pervg_blocked_io_count = 0 pv_min_pbuf = 512 global_blocked_io_count = 1352 server1{root}# lvmo -a -v vg220c vgname = vg220c pv_pbuf_count = 512 total_vg_pbufs = 2048 max_vg_pbuf_count = 65536 pervg_blocked_io_count = 0 pv_min_pbuf = 512 global_blocked_io_count = 1352 |
To change it use (take care to monitor performance before changing it):
server1{root}# ioo –p –o pv_min_pbuf=number of pbufs per PV |
Monitoring:
server1{root}# vmstat -v | grep "blocked with no pbuf" 1352 pending disk I/Os blocked with no pbuf |
Remark:
Same value as with lvmo command i.e. 1352.
LTG size is the maximum transfer size for disk I/O, this is a per VG parameter and is lower bounded by the smallest disk transfer size of the VG. Default value should satisfy most workload, monitor performance when planning to change it. Current value can be displayed with:
server1{root}# lsvg vg220d VOLUME GROUP: vg220d VG IDENTIFIER: 00c9cf0000004c000000011a9aee4f69 VG STATE: active PP SIZE: 64 megabyte(s) VG PERMISSION: read/write TOTAL PPs: 2702 (172928 megabytes) MAX LVs: 512 FREE PPs: 0 (0 megabytes) LVs: 31 USED PPs: 2702 (172928 megabytes) OPEN LVs: 31 QUORUM: 4 (Enabled) TOTAL PVs: 6 VG DESCRIPTORS: 6 STALE PVs: 0 STALE PPs: 0 ACTIVE PVs: 6 AUTO ON: yes MAX PPs per VG: 130048 MAX PPs per PV: 1016 MAX PVs: 128 LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no HOT SPARE: no BB POLICY: relocatable |
Strangely at disk level it does not work on my system:
server1{root}# lsvg -p vg220d vg220d: PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION hdisk29 active 539 0 00..00..00..00..00 hdisk30 active 539 0 00..00..00..00..00 hdisk31 active 539 0 00..00..00..00..00 hdisk32 active 539 0 00..00..00..00..00 hdisk33 active 539 0 00..00..00..00..00 hdisk69 active 7 0 00..00..00..00..00 server1{root}# lquerypv -M hdisk29 0 server1{root}# lspv hdisk29 PHYSICAL VOLUME: hdisk29 VOLUME GROUP: vg220d PV IDENTIFIER: 00c9cf009aee4b05 VG IDENTIFIER 00c9cf0000004c000000011a9aee4f69 PV STATE: active STALE PARTITIONS: 0 ALLOCATABLE: yes PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 8 TOTAL PPs: 539 (34496 megabytes) VG DESCRIPTORS: 1 FREE PPs: 0 (0 megabytes) HOT SPARE: no USED PPs: 539 (34496 megabytes) MAX REQUEST: ??????? FREE DISTRIBUTION: 00..00..00..00..00 USED DISTRIBUTION: 108..108..107..108..108 |
Filesystems tuning
LVM volume group should be created in Scalable VG type:
VG type | Maximum PVs | Maximum LVs | Maximum PPs per VG | Maximum PP size |
---|---|---|---|---|
Normal VG | 32 | 256 | 32,512 (1016 * 32) | 1 GB |
Big VG | 128 | 512 | 130,048 (1016 * 128) | 1 GB |
Scalable VG | 1024 | 4096 | 2,097,152 | 128 GB |
Tablespaces FS should be created using agblksize=4096 and agblksize=512 for redo and maybe control files filesystems. To display filesystems parameters:
server1{root}# lsfs -q /ora320/data Name Nodename Mount Pt VFS Size Options Auto Accounting /dev/vgd21dlvol8 -- /ora320/data jfs2 36306944 rw yes no (lv size: 36306944, fs size: 36306944, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no) |
Inline JFS2 must be avoided, you should have one jfs2log per filesystem and it should ideally resides on a different physical volume than the filesystem. To check if you are using inline log:
server1{root}# lsfs -q /ora320/work_data Name Nodename Mount Pt VFS Size Options Auto Accounting /dev/vgd21dlvol37 -- /ora320/work_data jfs2 12320768 rw yes no (lv size: 12320768, fs size: 12320768, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no) |
server1{root}# lsvg -l vg220d vg220d: LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT vgd21dlvol14 jfs2 1 1 1 open/syncd /ora320/ctrl vgd21dlvol7 jfs2 328 328 1 open/syncd /ora320/arch vgd21dlvol12 jfs2 130 130 1 open/syncd /ora320/data_da vgd21dlvol57 jfs2 92 92 1 open/syncd /ora320/data_ma vgd21dlvol9 jfs2 78 78 1 open/syncd /ora320/data_mj vgd21dlvol11 jfs2 130 130 1 open/syncd /ora320/data_dj vgd21dlvol30 jfs2 125 125 1 open/syncd /ora320/data_rdj vgd21dlvol23 jfs2 93 93 3 open/syncd /ora320/index_da vgd21dlvol8 jfs2 277 277 1 open/syncd /ora320/data vgd21dlvol10 jfs2 94 94 2 open/syncd /ora320/data_wj vgd21dlvol13 jfs2 5 5 1 open/syncd /ora320/dump vgd21dlvol16 jfs2 146 146 1 open/syncd /ora320/index vgd21dlvol21 jfs2 74 74 1 open/syncd /ora320/index_dj vgd21dlvol58 jfs2 61 61 1 open/syncd /ora320/index_ma vgd21dlvol17 jfs2 63 63 2 open/syncd /ora320/index_mj vgd21dlvol19 jfs2 80 80 3 open/syncd /ora320/index_wj vgd21dlvol25 jfs2 6 6 1 open/syncd /ora320/log vgd21dlvol28 jfs2 96 96 2 open/syncd /ora320/rbs vgd21dlvol15 jfs2 121 121 1 open/syncd /ora320/software vgd21dlvol26 jfs2 18 18 1 open/syncd /ora320/sys vgd21dlvol27 jfs2 47 47 1 open/syncd /ora320/temp vgd21dlvol24 jfs2 73 73 1 open/syncd /ora320/temp_da vgd21dlvol31 jfs2 64 64 1 open/syncd /ora320/index_rdj vgd21dlvol22 jfs2 81 81 2 open/syncd /ora320/temp_dj vgd21dlvol37 jfs2 94 94 1 open/syncd /ora320/work_data vgd21dlvol20 jfs2 31 31 1 open/syncd /ora320/temp_wj vgd21dlvol18 jfs2 21 21 1 open/syncd /ora320/temp_mj loglv05 jfs2log 1 1 1 open/syncd N/A vgd21dlvol32 jfs2 96 96 2 open/syncd /ora320/index_r4da vgd21dlvol33 jfs2 160 160 3 open/syncd /ora320/data_r4da vgd21dlvol34 jfs2 16 16 1 open/syncd /ora320/temp_r4da |
Then to associate filesystems and jfs2log check /etc/filesystems file.
There are also few JFS2 parameters that you can change (ioo to display and set them):
server1{root}# ioo -a j2_atimeUpdateSymlink = 0 j2_dynamicBufferPreallocation = 16 j2_inodeCacheSize = 400 j2_maxPageReadAhead = 128 j2_maxRandomWrite = 0 j2_maxUsableMaxTransfer = 512 j2_metadataCacheSize = 400 j2_minPageReadAhead = 2 j2_nBufferPerPagerDevice = 512 j2_nPagesPerWriteBehindCluster = 32 j2_nRandomCluster = 0 j2_nonFatalCrashesSystem = 0 j2_syncModifiedMapped = 1 j2_syncPageCount = 0 j2_syncPageLimit = 256 j2_syncdLogSyncInterval = 1 j2_unmarkComp = 0 jfs_clread_enabled = 0 jfs_use_read_lock = 1 lvm_bufcnt = 9 maxpgahead = 8 maxrandwrt = 0 memory_frames = 12582912 minpgahead = 2 numclust = 1 numfsbufs = 196 pd_npages = 65536 pgahd_scale_thresh = 0 pv_min_pbuf = 512 sync_release_ilock = 0 |
VMM offer the sequential read ahead capability i.e. reading in-advance block of a file you access sequentially. you can set the minimum number of page read when VMM detect a sequential access with j2_minPageReadAhead and the maximum number of pages with j2_maxPageReadAhead. Respective IBM suggested value are Max(2,db_block_size/4096) and Max(256,db_block_size/4096*db_file_multiblock_read_count).
I/O monitoring
server1{root}# vmstat -I 1 2 System configuration: lcpu=6 mem=49152MB kthr memory page faults cpu -------- ----------- ------------------------ ------------ ----------- r b p avm fre fi fo pi po fr sr in sy cs us sy id wa 0 0 0 4705720 1643943 0 8 0 0 0 0 7 2987 404 0 2 98 0 0 0 0 4705720 1643943 0 16 0 0 0 0 9 253 373 0 0 99 0 |
server1{root}# iostat 1 2 System configuration: lcpu=6 drives=75 paths=4 vdisks=0 tty: tin tout avg-cpu: % user % sys % idle % iowait 124.0 644.0 0.7 0.7 98.6 0.0 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk1 0.0 20.0 5.0 0 20 hdisk0 0.0 20.0 5.0 0 20 hdisk28 0.0 0.0 0.0 0 0 hdisk30 0.0 0.0 0.0 0 0 hdisk31 0.0 0.0 0.0 0 0 hdisk29 0.0 0.0 0.0 0 0 . . |
server1{root}# sar -P ALL 1 2 AIX server1 3 5 0035208A4C00 06/09/10 System configuration: lcpu=6 mode=Capped 17:21:28 cpu %usr %sys %wio %idle 17:21:29 0 0 0 0 100 1 0 0 0 100 2 1 0 0 99 3 0 1 0 99 4 0 0 0 100 5 0 0 0 99 - 0 0 0 99 17:21:30 0 0 0 0 100 1 0 0 0 100 2 0 0 0 100 3 0 0 0 100 4 0 0 0 99 5 0 0 0 99 - 0 0 0 100 Average 0 0 0 0 100 1 0 0 0 100 2 0 0 0 100 3 0 1 0 99 4 0 1 0 98 5 0 0 0 100 - 0 0 0 99 |
Another comand to mention is lvmstat:
server1{root}# lvmstat -l vgd21dlvol37 -e server1{root}# lvmstat -l vgd21dlvol37 2 . . server1{root}# lvmstat -l vgd21dlvol37 -d |
References
- Minimum Software Versions and Patches Required to Support Oracle Products on IBM Power Systems Doc ID: 282036.1
- IBM developerWorks: Wikis – Systems – AIX
- AIX: Database performance gets slower the longer the database is running [ID 316533.1]
- How to enable Large Page Feature on AIX-Based Systems [ID 372157.1]
- ORA-27126 during database startup after 10.2.0.4 patchset software install on AIX5L [ID 603489.1]
- Bug 7226548: 10.2.0.4 NOT USING LARGE PAGES
- Guide to Multiple Page Size Support on AIX 5L Version 5.3
- Oracle not using large pages on AIX [ID 848632.1]
- AIX: DBWR trace warning: LIO_LISTIO RETURNED EAGAIN [ID 265491.1]
- AIX: Determining Oracle Memory Usage On AIX [ID 123754.1]
- Overview of AIX page replacement
Fernando says:
Thanks, very interested. As now it is Dec, 2017. I haven’t found information regarding to tune Oracle 11 with AIX 7.1. Many of the parameters link lru_file_repage, seems that aren’t available in this version. Could you please tell us if there’s updated information?
Thanks.
Yannick Jaquier says:
Thanks for nice comment ! Anyways as you have seen the document start to be really old and must be taken only as a base for further investigations with latest parameters both from Oracle and AIX. Unfortunately I have never updated it because we now, as many people around, do Oracle mainly on Linux. So our most important databases are 100% running on Linux but who knows in case of issue on our few AIX boxes I might take a bit of time to investigate and will then share…
David Browne says:
We have been trying to work out why our paging space usage kept creeping up and up and after 3-6 months the server needed a reboot. The reason is that we use ‘SHUTDOWN ABORT’ when shutting down our EBS (E-Business) database which has natively compiled code. This has the effect of leaving stale native code in memory. This stale code is not used for anything, it’s just left there and builds up and up over time, eating into the paging space.
On AIX, monitor memory usage:
ipcs -ar |grep -e JOXSHM_EXT -e PESHM -e PESLD | awk ‘{cnt+=1; sum+=$10} END {print “Count=”,cnt,”Sum=”,sum,”Average=”,sum/cnt}’
remove them by:
ipcs -ar |grep -e JOXSHM_EXT -e PESHM -e PESLD | awk ‘ {name = $17; cmd = “ipcrm -r -m ” name; print “Run “, cmd; system( cmd )}’
or
(if you are on 11.2.0.3 – apply patch 13574534)
11.2.0.4 and greater
set _ncomp_shared_objects_dir in your init.ora to a temporary directory. The native code will then be stored in this directory which can be deleted only after the database is shutdown. If you use this parameter, the natively compiled code will no longer be stored in memory and there will be no further build up of stale native code in memory.
http://ksun-oracle.blogspot.co.uk/2013/11/remove-stale-native-code-files-on-aix.html
Yannick Jaquier says:
David, thanks for additional information. Anyway I strongly think using shutdown abort is a bad idea, better you use shutdown immediate…
sql_sasquatch says:
Hello,
The post mentions the hpmcount utility as a way to monitor TLB misses on IBM Power platform. The hpmcount utility can be used as a wrapper for the command to be monitored. I’ve found hpmstat to be more useful for evaluating Oracle, since it will provide system wide metrics. Requires root privileges.
http://sql-sasquatch.blogspot.com/2013/04/aix-hpmstat-utility-on-power-7.html
http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmstat.htm
Lodh says:
Hii Yannick
Thanks for your explanation. yes indeed i know that the IO is chuck size is restricted to LTG boundary and its default to 256k on hp-ux when using LVM as defined in the header files.
Well and if fail to agree with me for the max IO size in hp-ux, then can you please let me know the reason as why oracle parameter DB_FILE_MULTIBLOCK_READ_COUNT on hp-ux defaults to 128 with a 8K database and its sets to 64 for a 16k database ?
Yannick Jaquier says:
Hi Lodh,
See Oracle support note 841444.1.
Yannick.
Lodh says:
Hello Yannick
Thanks for the Oracle Note. Here Oracle says its 1M for maximum platforms. What i found out is for hp-ux the max IO transfer unit is 1M for asynchrounous io but when using LVM its 256K. So i suppose Oracle by default sets the parameter db_file_multiblock_read_count as per the DB blocksize and assuming asynchronous io mode.
Lodh says:
Discussing Oracle & IO — In my environment we have has the setting of DB_FILE_MULTIBLOCK_READ_COUNT(128) * Oracle block size (8K)= 1Mb ( which is the max IO chunk, that can be handled by vxfs software). The Filesystem block size is also 8k so as not to have more IO request per oracle block.
Generally, when LVM receives a request for an I/O, it breaks the I/O down into what is called logical track group (LTG) sizes before it passes the request down to the device driver of the underlying disks. The LTG is the maximum transfer size of an LV and is common to all the LVs in the VG. Large disk I/O requests made by applications were broken up by the LVM on LTG alignments.
What makes me wonder is – We are using buffered IO on vxfs filesystem on HP-UX and oracle 10g. So whatever be the size of oracle blocks the OS kernel or disk drivers are supposed to build batches of optimum i/o requests before placing it to the storage, isnt it ?
We also use the vxfs mount option mincache=direct & convosync=direct but still i think we simulate direct i/o which means we only bypass the OS buffers ( ie data transferred directly from disk to user buffer) but that doesnot account for the IO request size, what do you think ?
Yannick Jaquier says:
Hi Lodh,
LVM I/O requets size is equal to LTG and seems to be 256KB on HPUX while it’s a tunable (dynamic) parameter on AIX.
According to HPUX documentation on VxFS 4.1, default maximum I/O size (tunable parameter) is 256KB which seems to be inline with Maximum disk driver I/O size:
# uname -a
HP-UX localhost B.11.31 U ia64 2829901614 unlimited-user license
# grep -i MAXPHYS /usr/include/machine/sys/param.h
#define MAXPHYS (256 * 1024) /* Maximum size of physical I/O transfer */
Even if easy to find on Google:
Then for your question if disk driver is building optimum I/O requests (not sure to really understand what you mean.. NCQ ?) then I don’t think so as the relation between physical disk/disk array cache/OS cache/LVM disk/VxFS looks quite complex to handle… Moreover with an Oracle database workload you cannot let an I/O pending waiting for an optimal window to achieve it isn’t it ?
Then for your direct I/O story, see above VxFS maximum drect I/O size is 256KB adn is tunable with VxFS > 4.1.
Yannick.
Ralf Schmidt-Dannert says:
I strongly recommend to review the latest published IBM white paper titled:
Oracle Architecture and Tuning on AIX v2.0
It is available here and reflects the current best tuning practices as applied by IBM Advanced Technical Skills team and includes AIX 5.3 and AIX6.1 recommendations / changes which are not reflected in the above article:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
Metalink Note 316533.1 is very questionable as it does not take into account or even mention the vmo tuning variable “lru_file_repage” which is available I believe since AIX 5.3 TL03. See the above white paper for current best practices!
Philip Verdieck says:
Thank you very much for that IBM whitepaper link. We are migrating from DB2 to Oracle on AIX, we operate an ERP system (CGI AMS), and I am just starting to try and tune this thing.