Automatic extended statistics to improve performance

Preamble

Extended statistics is nice 11gR1 feature that is providing extra information to Cost Based Optimizer (CBO). Extended statistics improve cardinality estimates and can be created on:

  • Column group statistics
  • Expression statistics

The typical example for column group statistics is a customer table having state and country columns. If the state is California obviously the country is USA, so extended statistics provide the relationship extra information to CBO. The typical expression statistics example is when you select uppercase on state column to find customer in state ‘CA’ (in uppercase whatever is store in your customer table, California)

All is very well explained in Oracle Database Performance Tuning Guide and on the paper it is a nice feature… Nice feature if you really know your data model and can estimate which expressions or groups of columns could benefit of it. In real life it could be difficult to achieve this, even more when your data model execute only ad hoc queries coming from a BO universe for example. How could we automate this process and have automatic extended statistics ? The answer is provided by below two function/procedure of DBMS_STATS package:

  • SEED_COL_USAGE Procedure
  • REPORT_COL_USAGE Function

Automatic extended statistics preparation

Start by creating a SQL Tuning Set (STS) where you will store the cursors you are planning to analyze:

SQL> EXEC DBMS_SQLTUNE.CREATE_SQLSET(sqlset_name => 'STS01', description => 'STS for whole cursor cache');
 
PL/SQL PROCEDURE successfully completed.

I have chosen to load the entire shared SQL area so all cursors in shared pool into my STS:

DECLARE
  cursor1 DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN cursor1 FOR SELECT VALUE(p)
  FROM TABLE(DBMS_SQLTUNE.SELECT_CURSOR_CACHE) p;
 
  DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name => 'STS01', populate_cursor => cursor1);
  CLOSE cursor1;
END;
/
 
PL/SQL PROCEDURE successfully completed.

Then submit the analysis of columns usage for your STS with DBMS_STATS.SEED_COL_USAGE procedure (third parameter is execution limit, null means unlimited):

SQL> EXEC DBMS_STATS.SEED_COL_USAGE('STS01','YJAQUIER',NULL);
 
PL/SQL PROCEDURE successfully completed.

Report columns usage with DBMS_STATS.REPORT_COL_USAGE, for convenience it is better to extract the report in a text file that can be opened by a modern text editor to be able to do searches:

SQL> SET linesize 150 LONG 9999999 pages 1000 longchunksize 9999999
SQL> spool /tmp/report_col_usage.txt
SQL> SELECT DBMS_STATS.REPORT_COL_USAGE(NULL,NULL) FROM dual;
SQL> spool off

Search for ” JOIN ” or “spaceJOINspace” in generated report and identify couple of columns like this:

###############################################################################                                                                       
 
COLUMN USAGE REPORT FOR SA_WREF.EXCHANGE_RATE                                                                                                         
.............................................                                                                                                         
 
1. CODE_IN                             : EQ EQ_JOIN                                                                                                   
2. CODE_OUT                            : EQ EQ_JOIN                                                                                                   
3. MONTH__CODE                         : EQ RANGE LIKE EQ_JOIN                                                                                        
4. (CODE_IN, CODE_OUT)                 : FILTER                                                                                                       
5. (CODE_OUT, MONTH__CODE)             : JOIN                                                                                                         
###############################################################################

In above extract we see that CODE_IN and CODE_OUT columns are used in filter and join is one or many SQL statement…

Generate a specific report on a table with:

SQL> SELECT DBMS_STATS.REPORT_COL_USAGE('SA_WREF','EXCHANGE_RATE') FROM dual;
 
DBMS_STATS.REPORT_COL_USAGE('SA_WREF','EXCHANGE_RATE')
------------------------------------------------------------------------------------------------------------------------------------------------------
LEGEND:
.......
 
EQ         : Used IN single TABLE EQuality predicate
RANGE      : Used IN single TABLE RANGE predicate
LIKE       : Used IN single TABLE LIKE predicate
NULL       : Used IN single TABLE IS (NOT) NULL predicate
EQ_JOIN    : Used IN EQuality JOIN predicate
NONEQ_JOIN : Used IN NON EQuality JOIN predicate
FILTER     : Used IN single TABLE FILTER predicate
JOIN       : Used IN JOIN predicate
GROUP_BY   : Used IN GROUP BY expression
...............................................................................
 
###############################################################################
 
COLUMN USAGE REPORT FOR SA_WREF.EXCHANGE_RATE
.............................................
 
1. CODE_IN                             : EQ EQ_JOIN
2. CODE_OUT                            : EQ EQ_JOIN
3. MONTH__CODE                         : EQ RANGE LIKE EQ_JOIN
4. SYS_STUVC378K1TBWH6VF9UD0NWG90      : EQ
5. (CODE_IN, CODE_OUT)                 : FILTER
6. (CODE_OUT, MONTH__CODE)             : JOIN
###############################################################################

Then search for sql_id in V$SQL with something like (I did not suffix the final account name because my database is massively using synonym inside execution schemas):

SELECT sql_id,child_number,sql_fulltext FROM v$sql WHERE UPPER(SQL_TEXT) LIKE '%EXCHANGE_RATE%';

When you want to generate automatic extended statistics use:

SQL> SELECT dbms_stats.create_extended_stats('SA_WREF','EXCHANGE_RATE') FROM dual;
###############################################################################
 
EXTENSIONS FOR SA_WREF.EXCHANGE_RATE
....................................
 
1. (CODE_IN, CODE_OUT)                 : SYS_STUVC378K1TBWH6VF9UD0NWG90 created
2. (CODE_OUT, MONTH__CODE)             : SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF created
###############################################################################

You can control it has been done with:

SQL> col extension FOR a30
SQL> SELECT * FROM dba_STAT_EXTENSIONS
     WHERE owner='SA_WREF'
     AND table_name='EXCHANGE_RATE'
     ORDER BY 1,2,3;
 
OWNER                          TABLE_NAME                     EXTENSION_NAME                 EXTENSION                      CREATO DRO
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------ ---
SA_WREF                        EXCHANGE_RATE                  SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF ("CODE_OUT","MONTH__CODE")     USER   YES
SA_WREF                        EXCHANGE_RATE                  SYS_STUVC378K1TBWH6VF9UD0NWG90 ("CODE_IN","CODE_OUT")         USER   YES
 
SQL> SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")') FROM dual;
 
DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")')
------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF
 
SQL> SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")') FROM dual;
 
DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")')
------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUVC378K1TBWH6VF9UD0NWG90

But as we can see the table’s statistics needs to be gathered again:

SQL> ALTER SESSION SET nls_date_format='dd-mon-yyyy hh24:mi:ss';
 
SESSION altered.
 
SQL> SELECT column_name,last_analyzed,histogram
FROM dba_tab_col_statistics
WHERE owner='SA_WREF'
AND table_name='EXCHANGE_RATE'
ORDER BY 1,2,3;
 
COLUMN_NAME                    LAST_ANALYZED        HISTOGRAM
------------------------------ -------------------- ---------------
CODE_IN                        05-jun-2015 22:40:38 FREQUENCY
CODE_OUT                       05-jun-2015 22:40:38 FREQUENCY
MONTH__CODE                    05-jun-2015 22:40:38 FREQUENCY
VALUE                          05-jun-2015 22:40:38 NONE

The gather for extended statistics is not using the classical method (notice the SKEWONLY keyword):

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SA_WREF',tabname=>'EXCHANGE_RATE',method_opt=>'for all columns size skewonly',CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.

After statistics gathering:

SQL> SELECT column_name,last_analyzed,histogram
FROM dba_tab_col_statistics
WHERE owner='SA_WREF'
AND table_name='EXCHANGE_RATE'
ORDER BY 1,2,3;
 
COLUMN_NAME                    LAST_ANAL HISTOGRAM
------------------------------ --------- ---------------
CODE_IN                        19-JUN-15 FREQUENCY
CODE_OUT                       19-JUN-15 FREQUENCY
MONTH__CODE                    19-JUN-15 FREQUENCY
SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF 19-JUN-15 NONE
SYS_STUVC378K1TBWH6VF9UD0NWG90 19-JUN-15 NONE
VALUE                          19-JUN-15 HEIGHT BALANCED
 
6 ROWS selected.

Automatic extended statistics testing

The query using EXCHANGE_RATE table is the following, I have customized it to add a comment to easier find it in V$SQL:

SELECT /* Yannick */ NVL(CONCAT(CONCAT(SUBSTR(MARSDBA.DIMX_TIME.QUARTER__CODE,4,4),
SUBSTR(MARSDBA.DIMX_TIME.QUARTER_DESCR,3,1)),
SUBSTR(MARSDBA.DIMX_TIME.QUARTER__CODE,1,2)),'REST'),
NVL(MARSDBA.DIMX_TIME.MONTH__CODE,'REST'),
DWHDBA.WCURRENCY.DESCR,
DWHDBA.WCURRENCY.CODE,    
DWHDBA.EXCHANGE_RATE.VALUE
FROM MARSDBA.DIMX_TIME,DWHDBA.WCURRENCY,DWHDBA.EXCHANGE_RATE
WHERE (MARSDBA.DIMX_TIME.MONTH__CODE=DWHDBA.EXCHANGE_RATE.MONTH__CODE)
AND (DWHDBA.EXCHANGE_RATE.CODE_OUT=DWHDBA.WCURRENCY.CODE)
AND (DWHDBA.EXCHANGE_RATE.CODE_IN = 02)
AND (DWHDBA.WCURRENCY.CODE = '50'
AND (NVL(MARSDBA.DIMX_TIME.MONTH__CODE,'REST') IN (' ')));

I have got the opportunity to execute it, while doing this I have set below option for my session:

ALTER SESSION SET statistics_level=ALL;
SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS
SET lines 150 pages 1000

Display the plan with:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>',<child_number>,'ALL ALLSTATS'));

Before the extended statistics:

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-ROWS |E-Bytes| COST (%CPU)| E-TIME   | A-ROWS |   A-TIME   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      4 |        |       |    16 (100)|          |      0 |00:00:00.01 |       4 |
|   1 |  NESTED LOOPS                  |                     |      4 |     40 |  3040 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   2 |   NESTED LOOPS                 |                     |      4 |   2014 |  3040 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   3 |    NESTED LOOPS                |                     |      4 |      1 |    41 |     5   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DIMX_TIME           |      4 |      1 |    22 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|*  5 |      INDEX UNIQUE SCAN         | DIMX_TIME_IDX_MM    |      4 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       4 |
|*  6 |     TABLE ACCESS FULL          | WCURRENCY           |      0 |      1 |    19 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  7 |    INDEX RANGE SCAN            | EXCH_RATE_IDX_MONTH |      0 |   2014 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  8 |   TABLE ACCESS BY INDEX ROWID  | EXCHANGE_RATE       |      0 |     40 |  1400 |    11   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------------------------------------

After gathering of extended statistics:

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-ROWS |E-Bytes| COST (%CPU)| E-TIME   | A-ROWS |   A-TIME   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      1 |        |       |    16 (100)|          |      0 |00:00:00.01 |       1 |
|   1 |  NESTED LOOPS                  |                     |      1 |     39 |  2964 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   2 |   NESTED LOOPS                 |                     |      1 |   2014 |  2964 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   3 |    NESTED LOOPS                |                     |      1 |      1 |    41 |     5   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DIMX_TIME           |      1 |      1 |    22 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|*  5 |      INDEX UNIQUE SCAN         | DIMX_TIME_IDX_MM    |      1 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       1 |
|*  6 |     TABLE ACCESS FULL          | WCURRENCY           |      0 |      1 |    19 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  7 |    INDEX RANGE SCAN            | EXCH_RATE_IDX_MONTH |      0 |   2014 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  8 |   TABLE ACCESS BY INDEX ROWID  | EXCHANGE_RATE       |      0 |     39 |  1365 |    11   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------------------------------------

Not much difference that is only explained by the different gathering method, SKEWONLY instead of AUTO. This to say that even if Oracle is reporting all the columns association where you could benefit from extended statistics it DOES NOT mean that creating them will improve the situation. Concretely in real life on my BI database I have not been able to find any SQL that demonstrate a cardinality estimate improvement after extended statistics.

Maybe here I’m hitting one of extended statistics limitations:

Extended statistics can only be used when the where the clause predicates are equalities or in-lists.
Extended statistics will not be used if there are histograms present on the underlying columns and there is no histogram present on the column group.

Overall the automatic extended statistics procedure/function do the job and what is promised is achieve, up to you to test them in concrete scenario and see if it is changing the explain plan or not. For sure if you know by heart your data model you would not need this method to understand where to create those extended statistics.

To clean the created extended statistics:

SQL> EXEC DBMS_STATS.DROP_EXTENDED_STATS('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXEC DBMS_STATS.DROP_EXTENDED_STATS('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")');
 
PL/SQL PROCEDURE successfully completed.

References

This entry was posted in Oracle and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>