Preamble
Extended statistics is a nice 11gR1 feature that provides extra information to the Cost Based Optimizer (CBO). Extended statistics improve cardinality estimates and can be created as:
- Column group statistics
- Expression statistics
The typical example for column group statistics is a customer table having state and country columns. If the state is California then obviously the country is USA, so extended statistics provide this relationship as extra information to the CBO. The typical expression statistics example is selecting the uppercase of the state column to find customers in state 'CA', whatever case is actually stored in your customer table (California).
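To make this concrete, here is a hedged sketch of how both kinds of extensions could be created manually with DBMS_STATS.CREATE_EXTENDED_STATS; the SH.CUSTOMERS table and the STATE/COUNTRY column names are hypothetical, chosen only to match the example above:

```sql
-- Column group statistics on the two correlated columns (hypothetical table/columns)
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname => 'SH', tabname => 'CUSTOMERS',
       extension => '(STATE, COUNTRY)') FROM dual;

-- Expression statistics for queries filtering on UPPER(STATE)
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname => 'SH', tabname => 'CUSTOMERS',
       extension => '(UPPER(STATE))') FROM dual;
```

Each call returns the system-generated name (SYS_STU...) of the created extension, which then behaves like a virtual column for statistics purposes.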
All this is very well explained in the Oracle Database Performance Tuning Guide, and on paper it is a nice feature... A nice feature if you really know your data model and can estimate which expressions or groups of columns could benefit from it. In real life this can be difficult to achieve, even more so when your database executes only ad hoc queries coming, for example, from a BO universe. How could we automate this process and have automatic extended statistics? The answer is provided by the below two function/procedure of the DBMS_STATS package:
- SEED_COL_USAGE Procedure
- REPORT_COL_USAGE Function
Automatic extended statistics preparation
Start by creating a SQL Tuning Set (STS) where you will store the cursors you are planning to analyze:
SQL> EXEC DBMS_SQLTUNE.CREATE_SQLSET(sqlset_name => 'STS01', description => 'STS for whole cursor cache');

PL/SQL procedure successfully completed.
I have chosen to load the entire shared SQL area, i.e. all the cursors in the shared pool, into my STS:
DECLARE
  cursor1 DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN cursor1 FOR
    SELECT VALUE(p)
    FROM TABLE(DBMS_SQLTUNE.SELECT_CURSOR_CACHE) p;
  DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name => 'STS01', populate_cursor => cursor1);
  CLOSE cursor1;
END;
/

PL/SQL procedure successfully completed.
Then submit the analysis of column usage for your STS with the DBMS_STATS.SEED_COL_USAGE procedure (the third parameter is a time limit in seconds; NULL means unlimited):
SQL> EXEC DBMS_STATS.SEED_COL_USAGE('STS01','YJAQUIER',NULL);

PL/SQL procedure successfully completed.
Report column usage with DBMS_STATS.REPORT_COL_USAGE. For convenience it is better to spool the report to a text file that can then be opened in a modern text editor to search through it:
SQL> SET linesize 150 long 9999999 pages 1000 longchunksize 9999999
SQL> spool /tmp/report_col_usage.txt
SQL> SELECT DBMS_STATS.REPORT_COL_USAGE(NULL,NULL) FROM dual;
SQL> spool off
Search for " JOIN " (space, JOIN, space) in the generated report and identify couples of columns like this:
###############################################################################

COLUMN USAGE REPORT FOR SA_WREF.EXCHANGE_RATE
.............................................

1. CODE_IN                  : EQ EQ_JOIN
2. CODE_OUT                 : EQ EQ_JOIN
3. MONTH__CODE              : EQ RANGE LIKE EQ_JOIN
4. (CODE_IN, CODE_OUT)      : FILTER
5. (CODE_OUT, MONTH__CODE)  : JOIN
###############################################################################
In the above extract we see that the (CODE_IN, CODE_OUT) columns are used in a filter, and (CODE_OUT, MONTH__CODE) in a join, in one or more SQL statements...
Generate a report for a specific table with:
SQL> SELECT DBMS_STATS.REPORT_COL_USAGE('SA_WREF','EXCHANGE_RATE') FROM dual;

DBMS_STATS.REPORT_COL_USAGE('SA_WREF','EXCHANGE_RATE')
------------------------------------------------------------------------------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table IS (NOT) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR SA_WREF.EXCHANGE_RATE
.............................................

1. CODE_IN                        : EQ EQ_JOIN
2. CODE_OUT                       : EQ EQ_JOIN
3. MONTH__CODE                    : EQ RANGE LIKE EQ_JOIN
4. SYS_STUVC378K1TBWH6VF9UD0NWG90 : EQ
5. (CODE_IN, CODE_OUT)            : FILTER
6. (CODE_OUT, MONTH__CODE)        : JOIN
###############################################################################
Then search for the sql_id in V$SQL with something like the below (I did not filter on the parsing account name because my database makes massive use of synonyms inside execution schemas):
SELECT sql_id, child_number, sql_fulltext
FROM v$sql
WHERE UPPER(sql_text) LIKE '%EXCHANGE_RATE%';
When you want to generate the automatic extended statistics, use:
SQL> SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SA_WREF','EXCHANGE_RATE') FROM dual;

###############################################################################

EXTENSIONS FOR SA_WREF.EXCHANGE_RATE
....................................

1. (CODE_IN, CODE_OUT)     : SYS_STUVC378K1TBWH6VF9UD0NWG90 created
2. (CODE_OUT, MONTH__CODE) : SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF created
###############################################################################
You can check it has been done with:
SQL> col extension FOR a30
SQL> SELECT * FROM dba_stat_extensions WHERE owner='SA_WREF' AND table_name='EXCHANGE_RATE' ORDER BY 1,2,3;

OWNER                          TABLE_NAME                     EXTENSION_NAME                 EXTENSION                      CREATO DRO
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------ ---
SA_WREF                        EXCHANGE_RATE                  SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF ("CODE_OUT","MONTH__CODE")     USER   YES
SA_WREF                        EXCHANGE_RATE                  SYS_STUVC378K1TBWH6VF9UD0NWG90 ("CODE_IN","CODE_OUT")         USER   YES

SQL> SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")') FROM dual;

DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")')
------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF

SQL> SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")') FROM dual;

DBMS_STATS.SHOW_EXTENDED_STATS_NAME('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")')
------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUVC378K1TBWH6VF9UD0NWG90
But as we can see, the table's statistics need to be gathered again:
SQL> ALTER SESSION SET nls_date_format='dd-mon-yyyy hh24:mi:ss';

Session altered.

SQL> SELECT column_name,last_analyzed,histogram FROM dba_tab_col_statistics WHERE owner='SA_WREF' AND table_name='EXCHANGE_RATE' ORDER BY 1,2,3;

COLUMN_NAME                    LAST_ANALYZED        HISTOGRAM
------------------------------ -------------------- ---------------
CODE_IN                        05-jun-2015 22:40:38 FREQUENCY
CODE_OUT                       05-jun-2015 22:40:38 FREQUENCY
MONTH__CODE                    05-jun-2015 22:40:38 FREQUENCY
VALUE                          05-jun-2015 22:40:38 NONE
The gathering for extended statistics does not use the classical method (notice the SKEWONLY keyword):
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SA_WREF',tabname=>'EXCHANGE_RATE',method_opt=>'for all columns size skewonly',cascade=>TRUE);

PL/SQL procedure successfully completed.
After statistics gathering:
SQL> SELECT column_name,last_analyzed,histogram FROM dba_tab_col_statistics WHERE owner='SA_WREF' AND table_name='EXCHANGE_RATE' ORDER BY 1,2,3;

COLUMN_NAME                    LAST_ANAL HISTOGRAM
------------------------------ --------- ---------------
CODE_IN                        19-JUN-15 FREQUENCY
CODE_OUT                       19-JUN-15 FREQUENCY
MONTH__CODE                    19-JUN-15 FREQUENCY
SYS_STUMG#R$HG_3$6LAPL_ZPVH2FF 19-JUN-15 NONE
SYS_STUVC378K1TBWH6VF9UD0NWG90 19-JUN-15 NONE
VALUE                          19-JUN-15 HEIGHT BALANCED

6 rows selected.
Automatic extended statistics testing
The query using the EXCHANGE_RATE table is the following; I have customized it with a comment to find it more easily in V$SQL:
SELECT /* Yannick */
  NVL(CONCAT(CONCAT(SUBSTR(MARSDBA.DIMX_TIME.QUARTER__CODE,4,4),
      SUBSTR(MARSDBA.DIMX_TIME.QUARTER_DESCR,3,1)),
      SUBSTR(MARSDBA.DIMX_TIME.QUARTER__CODE,1,2)),'REST'),
  NVL(MARSDBA.DIMX_TIME.MONTH__CODE,'REST'),
  DWHDBA.WCURRENCY.DESCR,
  DWHDBA.WCURRENCY.CODE,
  DWHDBA.EXCHANGE_RATE.VALUE
FROM MARSDBA.DIMX_TIME, DWHDBA.WCURRENCY, DWHDBA.EXCHANGE_RATE
WHERE (MARSDBA.DIMX_TIME.MONTH__CODE=DWHDBA.EXCHANGE_RATE.MONTH__CODE)
  AND (DWHDBA.EXCHANGE_RATE.CODE_OUT=DWHDBA.WCURRENCY.CODE)
  AND (DWHDBA.EXCHANGE_RATE.CODE_IN = 02)
  AND (DWHDBA.WCURRENCY.CODE = '50'
  AND (NVL(MARSDBA.DIMX_TIME.MONTH__CODE,'REST') IN (' ')));
I got the opportunity to execute it, and while doing so I set the below options for my session:
ALTER SESSION SET statistics_level=ALL;
SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS
SET lines 150 pages 1000
Display the plan with:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>',<child_number>,'ALL ALLSTATS'));
Before the extended statistics:
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      4 |        |       |    16 (100)|          |      0 |00:00:00.01 |       4 |
|   1 |  NESTED LOOPS                  |                     |      4 |     40 |  3040 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   2 |   NESTED LOOPS                 |                     |      4 |   2014 |  3040 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   3 |    NESTED LOOPS                |                     |      4 |      1 |    41 |     5   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DIMX_TIME           |      4 |      1 |    22 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|*  5 |      INDEX UNIQUE SCAN         | DIMX_TIME_IDX_MM    |      4 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       4 |
|*  6 |     TABLE ACCESS FULL          | WCURRENCY           |      0 |      1 |    19 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  7 |    INDEX RANGE SCAN            | EXCH_RATE_IDX_MONTH |      0 |   2014 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  8 |   TABLE ACCESS BY INDEX ROWID  | EXCHANGE_RATE       |      0 |     40 |  1400 |    11   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------------------------------------
After gathering of extended statistics:
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      1 |        |       |    16 (100)|          |      0 |00:00:00.01 |       1 |
|   1 |  NESTED LOOPS                  |                     |      1 |     39 |  2964 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   2 |   NESTED LOOPS                 |                     |      1 |   2014 |  2964 |    16   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   3 |    NESTED LOOPS                |                     |      1 |      1 |    41 |     5   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DIMX_TIME           |      1 |      1 |    22 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       1 |
|*  5 |      INDEX UNIQUE SCAN         | DIMX_TIME_IDX_MM    |      1 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       1 |
|*  6 |     TABLE ACCESS FULL          | WCURRENCY           |      0 |      1 |    19 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  7 |    INDEX RANGE SCAN            | EXCH_RATE_IDX_MONTH |      0 |   2014 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  8 |   TABLE ACCESS BY INDEX ROWID  | EXCHANGE_RATE       |      0 |     39 |  1365 |    11   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------------------------------------------
Not much difference, and what little there is can be explained by the different gathering method (SKEWONLY instead of AUTO). This is to say that even if Oracle reports all the column associations that could benefit from extended statistics, it DOES NOT mean that creating them will improve the situation. Concretely, in real life on my BI database I have not been able to find any SQL statement demonstrating a cardinality estimate improvement after creating extended statistics.
Maybe here I'm hitting one of the extended statistics limitations:
- Extended statistics can only be used when the WHERE clause predicates are equalities or in-lists.
- Extended statistics will not be used if there are histograms present on the underlying columns and there is no histogram present on the column group.
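For the second limitation, a possible workaround, sketched here as a hedged example to validate on your own system, is to ask for a histogram on the column group itself by listing the grouped columns in METHOD_OPT:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SA_WREF',
    tabname    => 'EXCHANGE_RATE',
    -- histograms on all columns plus one on the (CODE_IN, CODE_OUT) column group
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS SIZE SKEWONLY (CODE_IN,CODE_OUT)',
    cascade    => TRUE);
END;
/
```

With a histogram on the column group extension, dba_tab_col_statistics should then show something other than NONE for the corresponding SYS_STU... virtual column.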
Overall the automatic extended statistics procedure/function do the job and what is promised is achieved; it is up to you to test them in a concrete scenario and see whether they change the explain plan or not. For sure, if you know your data model by heart you would not need this method to figure out where to create those extended statistics.
To clean the created extended statistics:
SQL> EXEC DBMS_STATS.DROP_EXTENDED_STATS('SA_WREF','EXCHANGE_RATE','("CODE_OUT","MONTH__CODE")');

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.DROP_EXTENDED_STATS('SA_WREF','EXCHANGE_RATE','("CODE_IN","CODE_OUT")');

PL/SQL procedure successfully completed.
Satvvik Visswanathan says:
Hi Yannick,
In the second execution plan (after gathering extended column stats), the Buffers reduced from 4 to 1. Don't you think that's an improvement? Possibly when the number of executions is higher, the latter might consume fewer system resources and be more efficient. Please message if you feel otherwise.
Yannick Jaquier says:
Hi Satvvik,
No, I don't think it is the extended statistics that changed the number of buffers required. I rather think it is an additional execution that tuned it instead...
Yannick.