System statistics interaction with database resource manager

Preamble

Our BI database has been migrated to 11gR2 end of 2014 (better late than never), so few months before the end of premium support. Since we migrated to 10gR2 they have always had below trigger to have few account still using the 9i optimizer (!!):

CREATE OR REPLACE TRIGGER sys.new_logon_trig
AFTER LOGON ON DATABASE
DECLARE
sqlstr VARCHAR2(200) := 'ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE=''9.2.0''';
BEGIN
FOR cur IN (SELECT grantee FROM sys.dba_rsrc_consumer_group_privs WHERE granted_group='BO_UNIVERSE_GROUP')
LOOP
IF USER = cur.username THEN
EXECUTE IMMEDIATE sqlstr;
END IF;
END LOOP;
END new_logon_trig;

And yes we have Database Resource Manager (DBRM) in action to limit reporting users (end users accessing directly the database and BO universe accounts) with below setup (extracted from SQL Developer as Cloud Control 12cR4 has a bug and does not display maximum execution time parameter):

system_stats01
system_stats01
system_stats02
system_stats02

And yes 43200 seconds is 12 hours which has no interest. I initially set it to 45 minutes but it badly deviated over time…

When we moved to 11gR2 I have reasonably argued that keeping the 9i optimizer was most probably not the best idea and BI team has started to work to remove the trigger (in fact they moved accounts one by one out of 9i optimizer using a different table than the database resource manager one). One of the (bad) reason they kept 9i optimizer was the GROUP BY that was not sorting any more result and the need to modify queries to add ORDER BY.

And guess what a query that was running in around 4 minutes with 9i optimizer is now failing for:

SQL> @query1
  UBCK_USERS.AFINISHED_GOOD,
             *
ERROR AT line 26:
ORA-07455: estimated execution TIME (158931 secs), exceeds limit (43200 secs)

158931 seconds is 44,1475 hours !!!

Testing has been done on Oracle database Enterprise edition 11.2.0.4 running on HPUX 11.31. The query in itself fit on a A4 page but has no particular interest to be displayed here…

Optimizer differences

I executed myself the query using BO universe account with statistics_level=all, set autotrace traceonly explain statistics and with:

SQL> SET lines 200 pages 1000
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('a4v04fdrzt14s','0','all allstats'))

We have already seen multiple times how to generate below explain plan :

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                    | E-ROWS |E-Bytes|E-Temp | COST (%CPU)| E-TIME   | Pstart| Pstop |   OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                         |        |       |       |    14M(100)|          |       |       |        |       |          |
|   1 |  HASH GROUP BY                 |                         |    299 | 91793 |  2479M|    14M  (1)| 43:21:37 |       |       |    274M|    15M|     1/0/0|
|*  2 |   FILTER                       |                         |        |       |       |            |          |       |       |        |       |          |
|*  3 |    HASH JOIN                   |                         |   6981K|  2043M|  7300K|   369K  (2)| 01:07:50 |       |       |     10M|  2310K|     1/0/0|
|   4 |     TABLE ACCESS BY INDEX ROWID| AFINISHED_GOOD          |  51906 |  6691K|       |  9728   (1)| 00:01:48 |       |       |        |       |          |
|*  5 |      INDEX RANGE SCAN          | AFINISHED_GOOD_IDX_PGRP |  51906 |       |       |    52   (2)| 00:00:01 |       |       |        |       |          |
|*  6 |     HASH JOIN                  |                         |   7067K|  1179M|    50M|   291K  (2)| 00:53:29 |       |       |     75M|  5505K|     1/0/0|
|   7 |      TABLE ACCESS FULL         | ACUSTOMER_SHIP_TO       |    772K|    41M|       | 42001   (1)| 00:07:43 |       |       |        |       |          |
|   8 |      PARTITION RANGE ALL       |                         |   7022K|   796M|       |   199K  (2)| 00:36:35 |     1 |    12 |        |       |          |
|*  9 |       TABLE ACCESS FULL        | HINVOICE                |   7022K|   796M|       |   199K  (2)| 00:36:35 |     1 |    12 |        |       |          |
|* 10 |    TABLE ACCESS BY INDEX ROWID | USER_X_ORG              |      1 |    55 |       |     2   (0)| 00:00:01 |       |       |        |       |          |
|* 11 |     INDEX RANGE SCAN           | USER_X_ORG_IDX_NAME     |      2 |       |       |     1   (0)| 00:00:01 |       |       |        |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

I have also noted the warning at the bottom of explain plan:

Note
-----
   - cpu costing IS off (consider enabling it)

So yes 11g optimizer is estimating the query to run for close to 44 hours so the expected restriction of the Database Resource Manager (DBRM)…

The explain plan in 9i is:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Starts | E-ROWS |E-Bytes|E-Temp | COST  | Pstart| Pstop | A-ROWS |   A-TIME   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |      2 |        |       |       |   209K|       |       |  71574 |00:04:16.16 |    4658K|    574K|    126K|       |       |          |         |
|   1 |  SORT GROUP BY                |                     |      2 |    349K|   102M|   247M|   209K|       |       |  71574 |00:04:16.16 |    4658K|    574K|    126K|    12M|  4667K|     2/0/0|         |
|*  2 |   FILTER                      |                     |      2 |        |       |       |       |       |       |   1001K|00:02:30.12 |     651K|    574K|    126K|       |       |          |         |
|*  3 |    HASH JOIN                  |                     |      2 |    349K|   102M|  7300K|   177K|       |       |   1001K|00:02:27.57 |     472K|    574K|    126K|    10M|  2310K|     2/0/0|         |
|*  4 |     TABLE ACCESS FULL         | AFINISHED_GOOD      |      2 |  51906 |  6691K|       |  9366 |       |       |    112K|00:00:08.36 |   49272 |  24624 |      0 |       |       |          |         |
|*  5 |     HASH JOIN                 |                     |      2 |    353K|    58M|    43M|   166K|       |       |   5357K|00:02:12.99 |     423K|    550K|    126K|   437M|    18M|          |     355K|
|   6 |      PARTITION RANGE ALL      |                     |      2 |    351K|    39M|       |   136K|     1 |    12 |   5357K|00:01:00.00 |     271K|    271K|      0 |       |       |          |         |
|*  7 |       TABLE ACCESS FULL       | HINVOICE            |     24 |    351K|    39M|       |   136K|     1 |    12 |   5357K|00:00:57.53 |     271K|    271K|      0 |       |       |          |         |
|   8 |      TABLE ACCESS FULL        | ACUSTOMER_SHIP_TO   |      2 |    772K|    41M|       | 28887 |       |       |   1545K|00:00:38.49 |     152K|    152K|      0 |       |       |          |         |
|*  9 |    TABLE ACCESS BY INDEX ROWID| USER_X_ORG          |  89393 |      1 |    55 |       |     3 |       |       |  89393 |00:00:00.98 |     178K|      1 |      0 |       |       |          |         |
|* 10 |     INDEX RANGE SCAN          | USER_X_ORG_IDX_NAME |  89393 |      4 |       |       |     1 |       |       |  89393 |00:00:00.56 |   89403 |      0 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note the SORT GROUP BY instead of HASH GROUP BY that is also sorting the result. This to say that if sorting is important then query should also be rewritten in 11g…

We also note completely different cost in 9i (estimated time is also not displayed).

I have also tried to execute the query with my own account (I’m in SYS_GROUP) and got below result:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                    | Starts | E-ROWS |E-Bytes|E-Temp | COST (%CPU)| E-TIME   | Pstart| Pstop | A-ROWS |   A-TIME   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                         |      1 |        |       |       |    14M(100)|          |       |       |  92337 |00:04:01.93 |    7137K|    439K|       |       |          |
|   1 |  HASH GROUP BY                 |                         |      1 |    300 |   105K|  2747M|    14M  (3)| 43:21:37 |       |       |  92337 |00:04:01.93 |    7137K|    439K|   741M|    15M|     1/0/0|
|*  2 |   FILTER                       |                         |      1 |        |       |       |            |          |       |       |   1348K|00:01:32.92 |     910K|    439K|       |       |          |
|*  3 |    HASH JOIN                   |                         |      1 |   7034K|  2421M|  7884K|   444K (15)| 01:07:50 |       |       |   1348K|00:01:28.31 |     455K|    439K|    10M|  2310K|     1/0/0|
|   4 |     TABLE ACCESS BY INDEX ROWID| AFINISHED_GOOD          |      1 |  48035 |  7317K|       |  9249   (3)| 00:00:06 |       |       |  56160 |00:00:00.32 |   16300 |      0 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | AFINISHED_GOOD_IDX_PGRP |      1 |  48035 |       |       |    62  (25)| 00:00:01 |       |       |  56160 |00:00:00.04 |      56 |      0 |       |       |          |
|*  6 |     HASH JOIN                  |                         |      1 |   7120K|  1391M|    72M|   351K (17)| 00:53:29 |       |       |   7120K|00:01:19.96 |     439K|    439K|    75M|  5505K|     1/0/0|
|   7 |      TABLE ACCESS FULL         | ACUSTOMER_SHIP_TO       |      1 |    773K|    63M|       | 46906  (12)| 00:07:43 |       |       |    772K|00:00:09.96 |   76121 |  76115 |       |       |          |
|   8 |      PARTITION RANGE ALL       |                         |      1 |   7120K|   808M|       |   249K (21)| 00:36:35 |     1 |    24 |   7120K|00:00:56.51 |     363K|    363K|       |       |          |
|*  9 |       TABLE ACCESS FULL        | HINVOICE                |     24 |   7120K|   808M|       |   249K (21)| 00:36:35 |     1 |    24 |   7120K|00:00:53.13 |     363K|    363K|       |       |          |
|* 10 |    TABLE ACCESS BY INDEX ROWID | USER_X_ORG              |    227K|      1 |    55 |       |     2   (0)| 00:00:01 |       |       |    227K|00:00:02.18 |     455K|      0 |       |       |          |
|* 11 |     INDEX RANGE SCAN           | USER_X_ORG_IDX_NAME     |    227K|      2 |       |       |     1   (0)| 00:00:01 |       |       |    227K|00:00:01.17 |     227K|      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Estimated time is completely wrong (E-Time) while real execution time (A-Time) is even faster than with 9i optimizer (my account is also not restricted on parallel processing).

System statistics

My investigation on cpu costing is off (consider enabling it) have driven me to system statistics (funnily I was working on this for a problem on another database) and discovered we have not updated them since at least Oracle 9i:

SQL> SET lines 200 pages 1000
SQL> col PVAL2 FOR a40
SQL> SELECT * FROM SYS.AUX_STATS$;
 
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    08-06-2009 11:24
SYSSTATS_INFO                  DSTOP                                     08-06-2009 11:24
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                     1020.94011
SYSSTATS_MAIN                  IOSEEKTIM                              10
SYSSTATS_MAIN                  IOTFRSPEED                           4096
SYSSTATS_MAIN                  SREADTIM
SYSSTATS_MAIN                  MREADTIM
SYSSTATS_MAIN                  CPUSPEED
SYSSTATS_MAIN                  MBRC
SYSSTATS_MAIN                  MAXTHR
SYSSTATS_MAIN                  SLAVETHR
 
13 ROWS selected.

I started by exporting them in a table just to be able to recover the situation in case of:

SQL> EXEC dbms_stats.create_stat_table('yjaquier','system_stats01');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXEC dbms_stats.export_system_stats('system_stats01','original_system_stats','yjaquier');
 
PL/SQL PROCEDURE successfully completed.

In case of any issue the system statistics can be re-imported back with DBMS_STATS.IMPORT_SYSTEM_STATS procedure.

To gather system statistics the procedure to use is DBMS_STATS.GATHER_SYSTEM_STATS. The NOWORKLOAD gathering mode option did not set the SREADTIM and MREADTIM parameters. I initially thought INTERVAL gathering mode option too but after further test it was related to no activity onto my database. Looking at official documentation it looks some parameters are set only with NOWORKLOAD gathering mode option. Having activity onto your database when you gather system statistics is a key point or Oracle will not able to set system parameters related to I/O for example…

I finally decided to use START/STOP gathering mode option, and I let it run for 2 hours. As we have seen above the table to check is SYS.AUX_STATS$:

SQL> EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT * FROM SYS.AUX_STATS$;
 
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    MANUALGATHERING
SYSSTATS_INFO                  DSTART                                    08-18-2015 15:06
SYSSTATS_INFO                  DSTOP                                     08-18-2015 15:06
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                     1020.94011
SYSSTATS_TEMP                  SBLKRDS                        4605025788
SYSSTATS_TEMP                  SBLKRDTIM                      4476692903
SYSSTATS_TEMP                  MBLKRDS                        1384364385
SYSSTATS_TEMP                  MBLKRDTIM                       886733794
SYSSTATS_TEMP                  CPUCYCLES                       149179884
SYSSTATS_TEMP                  CPUTIM                          139808514
SYSSTATS_TEMP                  JOB                                     0
SYSSTATS_TEMP                  CACHE_JOB                            3875
SYSSTATS_TEMP                  MBRTOTAL                       9665735981
SYSSTATS_MAIN                  IOSEEKTIM                              10
SYSSTATS_MAIN                  IOTFRSPEED                           4096
SYSSTATS_MAIN                  SREADTIM
SYSSTATS_MAIN                  MREADTIM
SYSSTATS_MAIN                  CPUSPEED
SYSSTATS_MAIN                  MBRC
SYSSTATS_MAIN                  MAXTHR
SYSSTATS_MAIN                  SLAVETHR
 
22 ROWS selected.

Then I confirmed I had a bit of activity onto the database while system statistics gathering was running.

SQL> EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT * FROM SYS.AUX_STATS$;
 
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    08-18-2015 15:06
SYSSTATS_INFO                  DSTOP                                     08-18-2015 15:13
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                     1020.94011
SYSSTATS_MAIN                  IOSEEKTIM                              10
SYSSTATS_MAIN                  IOTFRSPEED                           4096
SYSSTATS_MAIN                  SREADTIM                             .639
SYSSTATS_MAIN                  MREADTIM                             .279
SYSSTATS_MAIN                  CPUSPEED                             1067
SYSSTATS_MAIN                  MBRC                                    2
SYSSTATS_MAIN                  MAXTHR                          173369344
SYSSTATS_MAIN                  SLAVETHR                          8216576
 
13 ROWS selected.

I have cleaned the shared pool from any cursor related to my query with:

SQL> SELECT sql_id,CHILD_NUMBER FROM v$sql WHERE sql_text LIKE 'SELECT /* Yannick%';
 
SQL_ID        CHILD_NUMBER
------------- ------------
a4v04fdrzt14s            0
 
SQL> SELECT ADDRESS, HASH_VALUE,sql_id FROM V$SQLAREA WHERE sql_text LIKE 'SELECT /* Yannick%';
 
ADDRESS          HASH_VALUE SQL_ID
---------------- ---------- -------------
C0000002B8793888 1878819992 a4v04fdrzt14s
 
SQL> EXEC DBMS_SHARED_POOL.PURGE ('C00000033A233A58, 1878819992', 'C');
 
PL/SQL PROCEDURE successfully completed.

Issued the query with the BO universe account limited by database resource manager and finally got a result as Database Resource Manager (DBRM) is now able to better estimate the execution time:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                    | Starts | E-ROWS |E-Bytes|E-Temp | COST (%CPU)| E-TIME   | Pstart| Pstop | A-ROWS |   A-TIME   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                         |      1 |        |       |       |  5212K(100)|          |       |       |  35787 |00:01:46.23 |    2438K|    211K|       |       |          |
|   1 |  HASH GROUP BY                 |                         |      1 |    167 | 51269 |   945M|  5212K  (1)| 10:18:29 |       |       |  35787 |00:01:46.23 |    2438K|    211K|   274M|    15M|     1/0/0|
|*  2 |   FILTER                       |                         |      1 |        |       |       |            |          |       |       |    500K|00:00:55.59 |     434K|    211K|       |       |          |
|*  3 |    HASH JOIN                   |                         |      1 |   2662K|   779M|  7300K|   134K  (2)| 00:16:00 |       |       |    500K|00:00:53.54 |     228K|    211K|    10M|  2310K|     1/0/0|
|   4 |     TABLE ACCESS BY INDEX ROWID| AFINISHED_GOOD          |      1 |  51906 |  6691K|       |  9736   (1)| 00:01:10 |       |       |  56143 |00:00:00.37 |   16322 |      0 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | AFINISHED_GOOD_IDX_PGRP |      1 |  51906 |       |       |    52   (2)| 00:00:01 |       |       |  56143 |00:00:00.03 |      56 |      0 |       |       |          |
|*  6 |     HASH JOIN                  |                         |      1 |   2695K|   449M|    50M|   116K  (2)| 00:13:52 |       |       |   2678K|00:00:49.97 |     211K|    211K|    75M|  5505K|     1/0/0|
|   7 |      TABLE ACCESS FULL         | ACUSTOMER_SHIP_TO       |      1 |    772K|    41M|       | 39313   (2)| 00:04:40 |       |       |    772K|00:00:12.13 |   76106 |  76100 |       |       |          |
|   8 |      PARTITION RANGE ALL       |                         |      1 |   2678K|   303M|       | 71001   (3)| 00:08:26 |     1 |    12 |   2678K|00:00:31.66 |     135K|    135K|       |       |          |
|*  9 |       TABLE ACCESS FULL        | HINVOICE                |     12 |   2678K|   303M|       | 71001   (3)| 00:08:26 |     1 |    12 |   2678K|00:00:30.33 |     135K|    135K|       |       |          |
|* 10 |    TABLE ACCESS BY INDEX ROWID | USER_X_ORG              |    103K|      1 |    55 |       |     2   (0)| 00:00:01 |       |       |    103K|00:00:01.10 |     206K|      1 |       |       |          |
|* 11 |     INDEX RANGE SCAN           | USER_X_ORG_IDX_NAME     |    103K|      2 |       |       |     1   (0)| 00:00:01 |       |       |    103K|00:00:00.59 |     103K|      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Remark:
If the figures are a bit different it is because BI team deleted in parallel old years of figures from fact table as they have forgotten. To be honest this is exactly a situation where it makes sense to have database resource manager to forbid big queries from running. And obviously the 12 hours limit that has been set does not help at all…

So clearly the system statistics play a key role in estimating execution time of queries and looking at figures you tell to optimizer when gathering system statistics is make a lot of sense…

References

About Post Author

Share the knowledge!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>