Non-Uniform Memory Access (NUMA) architecture with Oracle – part 2

Preamble

After my first blog post on Non-Uniform Memory Access (NUMA), teammates shared with me a few interesting articles (see references), so I wanted to dig a bit deeper into this subject before definitively closing it (you will see why in the conclusion below).

I have dug into NUMA details on both HP-UX Itanium 11iv2 (11.23) and Linux Red Hat 5.5 (Tikanga). The Oracle database running on Itanium is a 10.2.0.4, while my Linux box runs multiple releases (10.2.0.4, 10.2.0.3, 11.2.0.1, 11.2.0.2, …).

NUMA on HP-UX Itanium

The server I am working on is the same as in the previous NUMA post:

server1{root}# model
ia64 hp server rx8640
server1{root}# mpsched -s
System Configuration
=====================
 
Locality Domain Count: 2
Processor Count      : 4
 
Domain    Processors
------    ----------
   0        0
   1        7   8   9

This server is in fact an nPAR of a bigger physical server:

server1{root}# parstatus -w
The local partition number is 1.
server1{root}# parstatus -V -p 1
[Partition]
Partition Number       : 1
Partition Name         : server2 server1
Status                 : Active
IP Address             :
Primary Boot Path      : 3/0/2/1/0/4/1.0.0
Alternate Boot Path    : 3/0/4/1/0/4/1.0.0
HA Alternate Boot Path :
PDC Revision           : 9.48
IODCH Version          : ffff
Cell Architecture      : Itanium(R)-based
CPU Compatibility      : CDH-640
CPU Speed              : 1598 MHz
Core Cell              : cab0,cell2
Core Cell Choice [0]   : cab0,cell2
Total Good Memory Size : 64.0 GB
Total Interleave Memory: 64.0 GB
Total Requested CLM    : 0.0 GB
Total Allocated CLM    : 0.0 GB
Hyperthreading Enabled : no
 
 
[Cell]
                        CPU     Memory                                Use
                        OK/     (GB)                          Core    On
Hardware   Actual       Deconf/ OK/                           Cell    Next Par
Location   Usage        Max     Deconf    Connected To        Capable Boot Num
========== ============ ======= ========= =================== ======= ==== ===
cab0,cell2 Active Core  6/0/8   32.0/0.0  cab8,bay0,chassis0  yes     yes  1
cab0,cell3 Active Base  4/0/8   32.0/0.0  cab8,bay0,chassis1  yes     yes  1
 
Notes: * = Cell has no interleaved memory.
 
 
[Chassis]
                                 Core Connected  Par
Hardware Location   Usage        IO   To         Num
=================== ============ ==== ========== ===
cab8,bay0,chassis0  Active       yes  cab0,cell2 1
 
 
[Chassis]
                                 Core Connected  Par
Hardware Location   Usage        IO   To         Num
=================== ============ ==== ========== ===
cab8,bay0,chassis1  Active       yes  cab0,cell3 1

On top of the fact that my cells are not equally balanced CPU-wise, no Cell Local Memory (CLM) has been configured (the default on older HP servers is 100% ILM, 0% CLM, so this is simply the default configuration behavior). The generic HP advice is to configure 7/8 of the available memory as CLM.

Then, using the following shell commands, you can see to which processor the Oracle background and slave processes are attached:

server1{root}# top -f /tmp/top_output -n 2000
server1{root}# grep ora\_ /tmp/top_output
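To get a quick per-CPU count of those Oracle processes from the captured output, a classical sort | uniq does the job. This is just a sketch, assuming (as on my box, see the top output further below) that the CPU id is the first column printed by top -f:

server1{root}# grep ora_ /tmp/top_output | awk '{print $1}' | sort -n | uniq -c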

HP states that parallel slaves and database writers should be evenly distributed across cells. In my case they are all attached to cell 0 (and so to the unique CPU 0), so another *strange* behavior… A behavior I have not seen on another Itanium box, where each cell was running a database writer and an archiver process.

Reading further, HP clarifies it once and for all:

For Oracle Database version 10.2.0.4, the optimizations are not implemented correctly and should never be used: _enable_NUMA_optimization should always be set to false (and the server should thus be configured ILM-heavy) for Oracle 10.2.0.4
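For reference, changing this hidden parameter is a one-liner (the double quotes are mandatory for underscore parameters) followed by an instance restart. A sketch only, to be applied with Oracle Support's blessing:

SQL> ALTER SYSTEM SET "_enable_NUMA_optimization"=FALSE SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP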

Notice that you can choose on which cell (or processor, the option being mutually exclusive with the locality-domain one) you want to run your process. Even if in my small example it is of little interest, HP suggests investigating the LL (Least Loaded) policy for the Oracle listener:

server1{root}# mpsched -c 9 -O LL sleep 60 &
[1]     28198
server1{root}# Pid 28198: bound to processor 9 using the least loaded process launch policy

Then check it is using the CPU you expected:

server1{root}# top -f /tmp/top_output -n 2000
server1{root}# grep 28198 /tmp/top_output
 9 pts/6 28198 root     168 24  3724K   264K sleep    0:00  0.00  0.00 sleep
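For completeness, the locality-domain flavor mentioned above would look like the line below. This is a sketch only: to my knowledge -l is the locality-domain counterpart of -c, but check mpsched(1) on your release:

server1{root}# mpsched -l 1 -O LL sleep 60 &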

So what about Oracle? Below is what I get with my 10.2.0.4 instance running on Itanium (_enable_NUMA_optimization is set to true by default):

SQL> SET lines 150 pages 100
SQL> col description FOR a70
SQL> col  PARAMETER FOR a30
SQL> col VALUE FOR a15
SQL> SELECT
  a.ksppinm AS parameter,
  c.ksppstvl AS VALUE,
  a.ksppdesc AS description,
  b.ksppstdf AS "Default?"
FROM x$ksppi a, x$ksppcv b, x$ksppsv c
WHERE a.indx = b.indx
AND a.indx = c.indx
AND LOWER(a.ksppinm) LIKE '%numa%'
ORDER BY a.ksppinm;
 
PARAMETER                      VALUE           DESCRIPTION                                                            DEFAULT?
------------------------------ --------------- ---------------------------------------------------------------------- ---------
_NUMA_instance_mapping         NOT specified   SET OF nodes that this instance should run ON                          TRUE
_NUMA_pool_size                NOT specified   aggregate SIZE IN bytes OF NUMA pool                                   TRUE
_db_block_numa                 2               NUMBER OF NUMA nodes                                                   TRUE
_enable_NUMA_optimization      TRUE            Enable NUMA specific optimizations                                     TRUE
_rm_numa_sched_enable          FALSE           IS RESOURCE Manager (RM) related NUMA scheduled policy enabled         TRUE
_rm_numa_simulation_cpus       0               NUMBER OF cpus per PG FOR numa simulation IN RESOURCE manager          TRUE
_rm_numa_simulation_pgs        0               NUMBER OF PGs FOR numa simulation IN RESOURCE manager                  TRUE
 
7 ROWS selected.

The related message in the alert.log file is also easy to miss:

.
Opening with internal Resource Manager plan
where NUMA PG = 2, CPUs = 1
.

NUMA on Linux

As a reminder, use the numactl command to display hardware information (the server behind it is an HP BL620c G7 with 2 CPUs (Intel(R) Xeon(R) CPU X7560 @ 2.27GHz) of 8 cores each, so 32 threads if you activate hyper-threading):

[root@server1 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 32289 MB
node 0 free: 2595 MB
node 1 size: 32320 MB
node 1 free: 267 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

In the node distances matrix above, 10 stands for a local access and 20 for the relatively more expensive remote access. And to display the current NUMA policy settings:

[root@server1 ~]# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cpubind: 0 1
nodebind: 0 1
membind: 0 1
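Should you want to strictly pin a process and its memory to a single node (rather than just expressing a preference, as done further below), numactl can do that too. A harmless sketch with sleep as a placeholder command:

[root@server1 ~]# numactl --cpunodebind=0 --membind=0 sleep 60 &
[root@server1 ~]# cat /proc/$!/numa_maps | head -3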

In my previous post I was wondering how NUMA got implemented on x86 technology. In short, it comes from the Intel and AMD interconnects: Intel QuickPath Interconnect (QuickPath, QPI), previously called Common System Interface (CSI), on the Intel side; HyperTransport (HT), earlier Lightning Data Transport (LDT), on the AMD side. Refer to Wikipedia for more information.

On Linux the top command does not print the CPU id on which a process is running; fortunately the ps command can show it through a custom display (psr column). The example database below is an 11.2.0.2 release (where NUMA is off by default):

[root@server1 ~]$ ps -eo psr,sgi_p,pid,user,rtprio,ni,time,%cpu,cmd | egrep 'PSR|oratlste'
PSR P   PID USER     RTPRIO  NI     TIME %CPU CMD
  1 *  6223 oratlste      -   0 00:00:00  0.0 ora_w000_tlste
  1 * 12773 oratlste      -   0 00:00:02  0.0 ora_pmon_tlste
 11 * 12775 oratlste      -   0 00:00:00  0.0 ora_psp0_tlste
 18 * 12782 oratlste      -   0 00:00:00  0.0 ora_vktm_tlste
 16 * 12786 oratlste      -   0 00:00:00  0.0 ora_gen0_tlste
  0 * 12788 oratlste      -   0 00:00:00  0.0 ora_diag_tlste
 10 * 12790 oratlste      -   0 00:00:00  0.0 ora_dbrm_tlste
 23 * 12792 oratlste      -   0 00:01:19  0.0 ora_dia0_tlste
 15 * 12794 oratlste      -   0 00:00:00  0.0 ora_mman_tlste
 18 * 12796 oratlste      -   0 00:00:01  0.0 ora_dbw0_tlste
 18 * 12798 oratlste      -   0 00:00:01  0.0 ora_dbw1_tlste
 17 * 12800 oratlste      -   0 00:00:00  0.0 ora_dbw2_tlste
 18 * 12802 oratlste      -   0 00:00:00  0.0 ora_dbw3_tlste
  2 * 12804 oratlste      -   0 00:00:02  0.0 ora_lgwr_tlste
  3 * 12807 oratlste      -   0 00:00:16  0.0 ora_ckpt_tlste
 15 * 12809 oratlste      -   0 00:00:09  0.0 ora_smon_tlste
 26 * 12811 oratlste      -   0 00:00:00  0.0 ora_reco_tlste
  3 * 12813 oratlste      -   0 00:00:08  0.0 ora_mmon_tlste
  7 * 12815 oratlste      -   0 00:00:52  0.0 ora_mmnl_tlste
 24 * 12841 oratlste      -   0 00:00:00  0.0 ora_qmnc_tlste
  2 * 12917 oratlste      -   0 00:00:00  0.0 /ora_tlste/software/bin/tnslsnr LISTENER_tlste -inherit
  6 * 13018 oratlste      -   0 00:00:01  0.0 ora_q000_tlste
 10 * 13020 oratlste      -   0 00:00:00  0.0 ora_q001_tlste
  9 * 29779 oratlste      -   0 00:00:00  0.0 ora_smco_tlste
.
.
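Still on the process side, taskset is handy to query (or change) the CPU affinity mask of any of these PIDs, pmon for instance (read-only query here, PID taken from the listing above):

[root@server1 ~]# taskset -cp 12773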

The cell (node) affinity can be displayed using the /proc/<pid>/numa_maps file (default policy by default 🙂 ):

[root@server1 ~]# echo $$
20518
[root@server1 ~]# cat /proc/20518/numa_maps
00400000 default file=/bin/bash mapped=118 mapmax=38 N0=96 N1=22
006b2000 default file=/bin/bash anon=10 dirty=10 active=9 N0=10
006bc000 default anon=5 dirty=5 active=3 N0=5
008bb000 default file=/bin/bash mapped=2 mapmax=33 N0=2
06116000 default heap anon=65 dirty=65 active=58 N0=65
320dc00000 default file=/lib64/ld-2.5.so mapped=25 mapmax=868 N0=25
320de1c000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
320de1d000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
320e000000 default file=/lib64/libc-2.5.so mapped=123 mapmax=877 N0=123
320e14e000 default file=/lib64/libc-2.5.so
320e34e000 default file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=4 mapmax=839 N0=4
320e352000 default file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
320e353000 default anon=5 dirty=5 N0=5
320e400000 default file=/lib64/libdl-2.5.so mapped=2 mapmax=841 N0=2
320e402000 default file=/lib64/libdl-2.5.so
320e602000 default file=/lib64/libdl-2.5.so anon=1 dirty=1 N0=1
320e603000 default file=/lib64/libdl-2.5.so anon=1 dirty=1 N0=1
320e800000 default file=/lib64/libtermcap.so.2.0.8 mapped=3 mapmax=35 N0=3
320e803000 default file=/lib64/libtermcap.so.2.0.8
320ea02000 default file=/lib64/libtermcap.so.2.0.8 anon=1 dirty=1 N0=1
2af022372000 default anon=2 dirty=2 N0=2
2af022384000 default anon=2 dirty=2 N0=2
2af022386000 default file=/lib64/libnss_files-2.5.so mapped=5 mapmax=810 N0=5
2af022390000 default file=/lib64/libnss_files-2.5.so
2af02258f000 default file=/lib64/libnss_files-2.5.so anon=1 dirty=1 N0=1
2af022590000 default file=/lib64/libnss_files-2.5.so anon=1 dirty=1 N0=1
2af022591000 default file=/usr/lib/locale/locale-archive mapped=15 mapmax=321 N0=15
2af025b5d000 default file=/usr/lib64/gconv/gconv-modules.cache mapped=6 mapmax=112 N0=6
2af025b64000 default anon=2 dirty=2 N0=2
7fffb4eb2000 default stack anon=7 dirty=7 N0=7
7fffb4f67000 default mapped=1 mapmax=821 active=0 N0=1

Remark:
As clearly stated by Red Hat, it is not recommended to read this file on production systems because of possible side effects…
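If you just want a per-node total instead of the full map, a small awk does it on a test box (a sketch assuming the Nx=pages fields shown above and 4KB pages):

[root@server1 ~]# awk '{for (i=1;i<=NF;i++) if ($i ~ /^N[0-9]+=/) {split($i,a,"="); p[a[1]]+=a[2]}}
END {for (n in p) printf "%s: %d pages (%d KB)\n", n, p[n], p[n]*4}' /proc/20518/numa_maps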

An example of a process using memory on node 1 (whenever possible):

[root@server1 ~]# numactl --preferred=1 sleep 30 &
[1] 17819
[root@server1 ~]# cat /proc/17819/numa_maps
00400000 prefer=1 file=/bin/sleep mapped=4 N0=4
00604000 prefer=1 file=/bin/sleep anon=1 dirty=1 N1=1
0ddad000 prefer=1 heap anon=2 dirty=2 N1=2
320dc00000 prefer=1 file=/lib64/ld-2.5.so mapped=23 mapmax=872 N0=23
320de1c000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
320de1d000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
320e000000 prefer=1 file=/lib64/libc-2.5.so mapped=73 mapmax=880 N0=73
320e14e000 prefer=1 file=/lib64/libc-2.5.so
320e34e000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=4 mapmax=846 N0=3 N1=1
320e352000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 N1=1
320e353000 prefer=1 anon=4 dirty=4 N1=4
2b4cc8f50000 prefer=1 anon=2 dirty=2 N1=2
2b4cc8f62000 prefer=1 anon=1 dirty=1 N1=1
2b4cc8f63000 prefer=1 file=/usr/lib/locale/locale-archive mapped=12 mapmax=318 N0=12
7fff644be000 prefer=1 stack anon=3 dirty=3 N1=3
7fff6451f000 prefer=1 mapped=1 mapmax=826 active=0 N0=1

N{x} is the node where the memory has been allocated, and the number after the equal sign is the size in pages (4KB each). For example, N1=2 on the heap line above means 2 pages, i.e. 8KB, allocated on node 1.

Using the migratepages command you can move pages from one node to another. First let's create a process using node 1:

[root@server1 ~]# numactl --preferred=1 sleep 300 &
[1] 26021

We can see that 2 stack pages have been allocated on node 1 (prefer=1 stack anon=2 dirty=2 N1=2):

[root@server1 ~]# cat /proc/26021/numa_maps
00400000 prefer=1 file=/bin/sleep mapped=4 N1=4
00604000 prefer=1 file=/bin/sleep anon=1 dirty=1 N1=1
146f0000 prefer=1 heap anon=2 dirty=2 N1=2
35bec00000 prefer=1 file=/lib64/ld-2.5.so mapped=23 mapmax=51 N0=23
35bee1c000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
35bee1d000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
35bf000000 prefer=1 file=/lib64/libc-2.5.so mapped=73 mapmax=58 N0=73
35bf14e000 prefer=1 file=/lib64/libc-2.5.so
35bf34e000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=4 mapmax=39 N0=3 N1=1
35bf352000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 N1=1
35bf353000 prefer=1 anon=4 dirty=4 N1=4
2abc9ff58000 prefer=1 anon=2 dirty=2 N1=2
2abc9ff6c000 prefer=1 anon=1 dirty=1 N1=1
2abc9ff6d000 prefer=1 file=/usr/lib/locale/locale-archive mapped=11 mapmax=38 N0=11
7fff9689f000 prefer=1 stack anon=2 dirty=2 N1=2
7fff9698d000 prefer=1 mapped=1 mapmax=41 active=0 N1=1

Let’s migrate the process to node 0:

[root@server1 ~]# migratepages 26021 1 0

The memory pages are now located on node 0 (prefer=1 stack anon=2 dirty=2 N0=2):

[root@server1 ~]# cat /proc/26021/numa_maps
00400000 prefer=1 file=/bin/sleep mapped=4 N0=4
00604000 prefer=1 file=/bin/sleep anon=1 dirty=1 N0=1
146f0000 prefer=1 heap anon=2 dirty=2 N0=2
35bec00000 prefer=1 file=/lib64/ld-2.5.so mapped=23 mapmax=51 N0=23
35bee1c000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35bee1d000 prefer=1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35bf000000 prefer=1 file=/lib64/libc-2.5.so mapped=73 mapmax=58 N0=73
35bf14e000 prefer=1 file=/lib64/libc-2.5.so
35bf34e000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=4 mapmax=39 N0=4
35bf352000 prefer=1 file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
35bf353000 prefer=1 anon=4 dirty=4 N0=4
2abc9ff58000 prefer=1 anon=2 dirty=2 N0=2
2abc9ff6c000 prefer=1 anon=1 dirty=1 N0=1
2abc9ff6d000 prefer=1 file=/usr/lib/locale/locale-archive mapped=11 mapmax=38 N0=11
7fff9689f000 prefer=1 stack anon=2 dirty=2 N0=2
7fff9698d000 prefer=1 mapped=1 mapmax=41 active=0 N1=1

For my Oracle 11.2.0.2 test database, here is what I get if I set _enable_NUMA_support to true (in 11gR2 _enable_NUMA_optimization is deprecated and you must use _enable_NUMA_support instead):

SQL> SET lines 150 pages 100
SQL> col description FOR a70
SQL> col  PARAMETER FOR a30
SQL> col VALUE FOR a15
SQL> SELECT
  a.ksppinm AS parameter,
  c.ksppstvl AS VALUE,
  a.ksppdesc AS description,
  b.ksppstdf AS "Default?"
FROM x$ksppi a, x$ksppcv b, x$ksppsv c
WHERE a.indx = b.indx
AND a.indx = c.indx
AND LOWER(a.ksppinm) LIKE '%numa%'
ORDER BY a.ksppinm;
 
PARAMETER                      VALUE           DESCRIPTION                                                            DEFAULT?
------------------------------ --------------- ---------------------------------------------------------------------- ---------
_NUMA_instance_mapping         NOT specified   SET OF nodes that this instance should run ON                          TRUE
_NUMA_pool_size                NOT specified   aggregate SIZE IN bytes OF NUMA pool                                   TRUE
_db_block_numa                 2               NUMBER OF NUMA nodes                                                   TRUE
_enable_NUMA_interleave        TRUE            Enable NUMA interleave MODE                                            TRUE
_enable_NUMA_optimization      FALSE           Enable NUMA specific optimizations                                     TRUE
_enable_NUMA_support           TRUE            Enable NUMA support AND optimizations                                  FALSE
_numa_buffer_cache_stats       0               Configure NUMA buffer cache stats                                      TRUE
_numa_trace_level              0               numa trace event                                                       TRUE
_rm_numa_sched_enable          TRUE            IS RESOURCE Manager (RM) related NUMA scheduled policy enabled         TRUE
_rm_numa_simulation_cpus       0               NUMBER OF cpus FOR each pg FOR numa simulation IN RESOURCE manager     TRUE
_rm_numa_simulation_pgs        0               NUMBER OF PGs FOR numa simulation IN RESOURCE manager                  TRUE
 
11 ROWS selected.

The display in the alert.log file is more verbose in 11gR2 and it clearly shows that NUMA is in action (please note I also use HugePages):

.
Numa pool size adjusted
****************** Huge Pages Information *****************
Huge Pages memory pool detected (total: 10395 free: 7995)
DFLT Huge Pages allocation successful (allocated: 146)
NUMA Huge Pages allocation on node (1) (allocated: 186)
NUMA Huge Pages allocation on node (0) (allocated: 182)
DFLT Huge Pages allocation successful (allocated: 1)
***********************************************************
SGA Local memory support enabled
.
.
NUMA system found and support enabled (2 domains - 16,16)
.
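Since HugePages are in the picture, the kernel side can be cross-checked directly; the figures should be consistent with the allocations printed in the alert.log above:

[root@server1 ~]# grep -i huge /proc/meminfo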

As expected, the SGA is now split into multiple parts:

[root@server1 ~]# ipcs -ma | grep oratlste
0x00000000 1614250006 oratlste  660        306184192  22
0x00000000 1614282778 oratlste  660        390070272  22
0x00000000 1614315548 oratlste  660        381681664  22
0x2c40136c 1614348317 oratlste  660        2097152    22
0x780fe068 630849560  oratlste  660        204
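To quickly sum those segments, the following sketch works, assuming (as in the listing above) that the size in bytes is the 5th column of the shared memory section:

[root@server1 ~]# ipcs -m | awk '/oratlste/ {sum+=$5} END {printf "%.0f MB\n", sum/1024/1024}'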

I have tried with a 10.2.0.3 database, but even when explicitly setting the parameters below in the init.ora:

_db_block_numa                 2               Number of NUMA nodes                                                   FALSE
_enable_NUMA_optimization      TRUE            Enable NUMA specific optimizations                                     FALSE

the alert.log file displays nothing, so it is really difficult, not to say impossible, to know whether NUMA is active or not… I finally got my answer by checking the memory segments: a single SGA segment, so NUMA is not in use (and I can't say why):

[root@server1 ~]# ipcs -ma |grep oratrema
0x323c247c 1643937813 oratrema  600        1075838976 18
0xdadb2270 700547097  oratrema  640        154

Additionally, the numastat command displays a few interesting NUMA-related counters:

[root@server1 ~]# numastat
                           node0           node1
numa_hit              7498337258      7809234537
numa_miss              123022896       106050915
numa_foreign           106050915       123022896
interleave_hit          16454618        16108121
local_node            7487872474      7801578924
other_node             133487680       113706528
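Out of these counters, the interesting derived figure is the miss ratio, numa_miss / (numa_hit + numa_miss), which stays below 2% per node here. A throwaway awk to compute it from the default two-node layout above:

[root@server1 ~]# numastat | awk '/numa_hit/ {h0=$2; h1=$3} /numa_miss/ {m0=$2; m1=$3}
END {printf "node0: %.1f%%  node1: %.1f%%\n", 100*m0/(h0+m0), 100*m1/(h1+m1)}'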

Another fun command is numademo, which aims at measuring the memory performance of your NUMA system:

[root@server1 ~]# numademo 10m random
2 nodes available
memory with no policy random              Avg 473.38 MB/s Min 698.12 MB/s Max 256.07 MB/s
local memory random                       Avg 682.44 MB/s Min 694.56 MB/s Max 601.49 MB/s
memory interleaved on all nodes random    Avg 677.94 MB/s Min 695.71 MB/s Max 560.02 MB/s
memory on node 0 random                   Avg 680.93 MB/s Min 694.15 MB/s Max 594.03 MB/s
memory on node 1 random                   Avg 667.67 MB/s Min 689.99 MB/s Max 544.57 MB/s
memory interleaved on 0 1 random          Avg 675.53 MB/s Min 692.36 MB/s Max 565.09 MB/s
setting preferred node to 0
memory without policy random              Avg 672.54 MB/s Min 692.04 MB/s Max 553.75 MB/s
setting preferred node to 1
memory without policy random              Avg 678.79 MB/s Min 698.35 MB/s Max 550.35 MB/s
manual interleaving to all nodes random   Avg 680.77 MB/s Min 697.01 MB/s Max 570.59 MB/s
manual interleaving on node 0/1 random    Avg 681.65 MB/s Min 697.24 MB/s Max 578.78 MB/s
current interleave node 0
running on node 0, preferred node 0
local memory random                       Avg 674.03 MB/s Min 688.13 MB/s Max 582.70 MB/s
memory interleaved on all nodes random    Avg 682.52 MB/s Min 696.40 MB/s Max 587.21 MB/s
memory interleaved on node 0/1 random     Avg 681.68 MB/s Min 694.01 MB/s Max 595.24 MB/s
alloc on node 1 random                    Avg 677.95 MB/s Min 694.47 MB/s Max 565.39 MB/s
local allocation random                   Avg 676.42 MB/s Min 692.72 MB/s Max 565.00 MB/s
setting wrong preferred node random       Avg 679.50 MB/s Min 696.17 MB/s Max 578.14 MB/s
setting correct preferred node random     Avg 674.12 MB/s Min 692.68 MB/s Max 550.14 MB/s
running on node 1, preferred node 0
local memory random                       Avg 613.30 MB/s Min 697.84 MB/s Max 301.31 MB/s
memory interleaved on all nodes random    Avg 680.04 MB/s Min 695.39 MB/s Max 574.03 MB/s
memory interleaved on node 0/1 random     Avg 683.59 MB/s Min 696.45 MB/s Max 598.16 MB/s
alloc on node 0 random                    Avg 674.49 MB/s Min 695.76 MB/s Max 548.22 MB/s
local allocation random                   Avg 674.20 MB/s Min 696.64 MB/s Max 552.78 MB/s
setting wrong preferred node random       Avg 672.26 MB/s Min 694.84 MB/s Max 525.68 MB/s
setting correct preferred node random     Avg 679.85 MB/s Min 696.77 MB/s Max 567.47 MB/s

Conclusion

So, all of this for what? Googling around, the gain you may get from a fully NUMA-aware chain of software and hardware components can be up to a 10% improvement, or nothing at all, or even a regression. So clearly, as Oracle Corporation states, perform exhaustive tests before activating it. I would tend to say the generic advice is to deactivate it at the hardware level.

If we take the 11gR2 example, where it is not activated by default, is there a drawback in leaving it activated at the hardware level?

On Itanium, relying on the default value is the worst situation, and I don't know if you can deactivate it at the hardware level.

Finally, even the Oracle database does not look very mature on this feature in 10gR2 (which is not obsolete yet, by the way), so better play with it in 11gR2.

I can't resist quoting the excellent blog post from Jeremy Cole on MySQL (whose behavior is not too far from Oracle's), where he suggests interleaving memory in the commands that start MySQL…

References
