Cardinality feedback to automatically improve query plans

Preamble

While reading a bit on 12cR1 new features I saw statistics feedback that was formally called cardinality feedback that is available starting with Oracle 11gR2. So before moving to 12cR1 I decided to give the cardinality feedback feature, which I did not know yet, a try !!

Testing has been done using a Oracle Enterprise Edition in 11.2.0.4 running under HP-UX 11iv3 (11.31).

Cardinality feedback configuration

I’m gonna use the below test table:

DROP TABLE test1;
 
CREATE TABLE test1(id1 NUMBER, id2 NUMBER,descr VARCHAR2(100)) NOLOGGING TABLESPACE tools;
 
DECLARE
i NUMBER;
nbrows NUMBER;
BEGIN
  i:=1;
  nbrows:=10000000;
  LOOP
    EXIT WHEN i>nbrows;
    IF (MOD(i,200000) = 0) THEN
      INSERT INTO test1 VALUES(4,ROUND(dbms_random.VALUE(1,nbrows)),RPAD('A',100,'A'));
    ELSE
      INSERT INTO test1 VALUES(MOD(i,4),ROUND(dbms_random.VALUE(1,nbrows)),RPAD('A',100,'A'));
    END IF;
    i:=i+1;
  END LOOP;
  COMMIT;
END;
/

The idea of this test table is the following (the first column can be seen in many blog post as it is a quite well famous method):

  • First column to have a column that contains not too many distinct value and particularly skewed (value 5 occur much less than other values)
  • Second column is a random number to allow few computation if needed
  • Third column is one hundred characters to artificially increase table size and generate high number of block read in case of full table scan (FTS)

Then I create two indexes and gather table statistics with no histograms:

SQL> CREATE INDEX test1_idx_id1 ON test1(id1) TABLESPACE tools;
 
INDEX created.
 
SQL> CREATE INDEX test1_idx_id2 ON test1(id2) TABLESPACE tools;
 
INDEX created.
 
SQL> EXEC dbms_stats.gather_table_stats('yjaquier', 'test1');
 
PL/SQL PROCEDURE successfully completed.

As we have already seen at first execution the gather of statistics will not generate any histograms:

SQL> SELECT object_id FROM dba_objects WHERE owner='YJAQUIER' AND object_name='TEST1';
 
 OBJECT_ID
----------
    109195
 
SQL> SELECT * FROM sys.col_usage$ WHERE obj#=109195;
 
no ROWS selected
 
SQL> col COLUMN_NAME FOR a15
SQL> SELECT table_name,column_name,histogram
     FROM user_tab_columns
     WHERE table_name = 'TEST1'
     ORDER BY column_name;
 
TABLE_NAME                     COLUMN_NAME     HISTOGRAM
------------------------------ --------------- ---------------
TEST1                          DESCR           NONE
TEST1                          ID1             NONE
TEST1                          ID2             NONE

So we get below test table with id1 column distribution:

SQL> SELECT id1, COUNT(*)
FROM test1
GROUP BY id1
ORDER BY id1;
 
       ID1   COUNT(*)
---------- ----------
         0    2499950
         1    2500000
         2    2500000
         3    2500000
         4         50

And the most important thing: as the id1 column is lacking histogram Cost Based Optimizer (CBO) is not aware of column distribution …

Then we have to find a query that would benefit from histograms on id1 column and so benefit from cardinality feedback feature. As, again, because we are lacking good statistics on our test table. I thought it would be easy but at then end it was clearly not piece of cake as all the simple queries that came to my mind were not working or because even if query benefited from cardinality feedback the explain plan did not change (??).

From Oracle official documentation cardinality feedback may be used for below conditions:

  • tables with no statistics
  • multiple conjunctive or disjunctive filter predicates on a table
  • Predicates containing complex operators for which the optimizer cannot accurately compute selectivity estimates

So all below SQL statements are not providing good results (either cardinality feedback not used or used but not changing explain plan):

SQL> SELECT * FROM test1 WHERE id1=5;
SQL> SELECT * FROM test1 WHERE id1=5 AND id2>5000000;

When displaying explain plan with additional information if you get below message:

Warning: basic plan statistics not available. These are only collected when:
* hint ‘gather_plan_statistics’ is used for the statement or
* parameter ‘statistics_level’ is set to ‘ALL’, at session or system level

Then as self stated you can also alter your session or use (un-documented) gather_plan_statistics hint to activate plan extended statistics:

SQL> SELECT /*+ gather_plan_statistics */ ...
SQL> ALTER SESSION SET statistics_level=ALL;

Cardinality feedback testing

After long and painful testing I have found below (stupid) SQL statement that could be used to demonstrate feature (the comment is used to easily find the sql_id in V$SQL). The SQL statement used is exactly the one below (uppercase or lowercase make a difference when searching in V$SQL):

select /* Yannick */ sum(id2) from (select * from test1 where id1=4 and id2>5000000);

To prepare my environment I’m using:

SQL> ALTER SYSTEM flush shared_pool;
 
SYSTEM altered.
 
SQL> ALTER SYSTEM flush shared_pool;
 
SYSTEM altered.
 
SQL> SET lines 200 pages 200 TIMING ON
SQL> col sql_text FOR a90
SQL> col PLAN_TABLE_OUTPUT FOR a180
SQL> ALTER SESSION SET statistics_level=ALL;
 
SESSION altered.

So let’s execute the query and see how explain plan is changing:

SQL> SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000);
 
  SUM(ID2)
----------
 165770312
 
Elapsed: 00:00:07.81
SQL> SELECT sql_id,child_number,sql_text FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%';
 
SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------------------------------------------------------------------
1r81h55uhssm5            0 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
 
Elapsed: 00:00:00.01
SQL> SELECT sql_id,child_number,use_feedback_stats,reason FROM V$SQL_SHARED_CURSOR
WHERE sql_id IN (SELECT sql_id FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%');
 
SQL_ID        CHILD_NUMBER U REASON
------------- ------------ - --------------------------------------------------------------------------------
1r81h55uhssm5            0 Y
 
Elapsed: 00:00:00.12

So query executed in around 8 seconds and we see that sql_id is candidate (use_feedback_stats column equals to Y) for cardinality feedback usage

Let’s execute a second time the exact same query:

SQL> SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000);
 
  SUM(ID2)
----------
 165770312
 
Elapsed: 00:00:00.00
SQL> SELECT sql_id,child_number,sql_text FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%';
 
SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------------------------------------------------------------------
1r81h55uhssm5            0 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
1r81h55uhssm5            1 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
 
Elapsed: 00:00:00.01
SQL> SELECT sql_id,child_number,use_feedback_stats,reason FROM V$SQL_SHARED_CURSOR
WHERE sql_id IN (SELECT sql_id FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%');  2
 
SQL_ID        CHILD_NUMBER U REASON
------------- ------------ - --------------------------------------------------------------------------------
1r81h55uhssm5            0 Y <ChildNode><ChildNumber>0</ChildNumber><ID>3</ID><reason>Optimizer mismatch(13)<
1r81h55uhssm5            1 Y
 
Elapsed: 00:00:00.07

Query second execution is instantaneous, we also see that a second child has been created for the same sql_id. We also see that second child is also candidate for cardinality feedback feature, so let’s execute a third time the exact same query:

SQL> SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000);
 
  SUM(ID2)
----------
 165770312
 
Elapsed: 00:00:00.01
SQL> SELECT sql_id,child_number,sql_text FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%';
 
SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------------------------------------------------------------------
1r81h55uhssm5            0 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
1r81h55uhssm5            1 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
1r81h55uhssm5            2 SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND id2>5000000)
 
Elapsed: 00:00:00.02
SQL> SELECT sql_id,child_number,use_feedback_stats,reason FROM V$SQL_SHARED_CURSOR
WHERE sql_id IN (SELECT sql_id FROM v$sql WHERE sql_text LIKE 'select /* Yannick */%');  2
 
SQL_ID        CHILD_NUMBER U REASON
------------- ------------ - --------------------------------------------------------------------------------
1r81h55uhssm5            0 Y <ChildNode><ChildNumber>0</ChildNumber><ID>3</ID><reason>Optimizer mismatch(13)<
1r81h55uhssm5            1 Y <ChildNode><ChildNumber>1</ChildNumber><ID>3</ID><reason>Optimizer mismatch(13)<
1r81h55uhssm5            2 N
 
Elapsed: 00:00:00.08

Third execution, also executed instantaneously, generated a final child that is no more candidate for cardinality feedback feature. Now let’s compare the different explain plan. First child doing an index range scan on the two indexes and we can see the difference between E-Rows (Estimated rows) and A-Rows (Actual rows). This is where, as poor statistics, the optimizer if failing to estimate correct number of returned rows:

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(sql_id=>'1r81h55uhssm5', cursor_child_no=>'0', format=> 'all allstats'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1r81h55uhssm5, child NUMBER 0
-------------------------------------
SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND
id2>5000000)
 
PLAN hash VALUE: 2595670947
 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name             | Starts | E-ROWS |E-Bytes| COST (%CPU)| E-TIME   | A-ROWS |   A-TIME   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                  |      1 |        |       | 17815 (100)|          |      1 |00:00:07.78 |    5852 |   3321 |       |       |          |
|   1 |  SORT AGGREGATE     |                  |      1 |      1 |     9 |            |          |      1 |00:00:07.78 |    5852 |   3321 |       |       |          |
|*  2 |   VIEW              | INDEX$_join$_002 |      1 |   1000K|  8789K| 17815   (2)| 00:03:16 |     21 |00:00:07.78 |    5852 |   3321 |       |       |          |
|*  3 |    HASH JOIN        |                  |      1 |        |       |            |          |     21 |00:00:07.78 |    5852 |   3321 |  1627K|  1627K|     1/0/0|
|*  4 |     INDEX RANGE SCAN| TEST1_IDX_ID1    |      1 |   1000K|  8789K|  2050   (3)| 00:00:23 |     50 |00:00:00.01 |       3 |      0 |       |       |          |
|*  5 |     INDEX RANGE SCAN| TEST1_IDX_ID2    |      1 |   1000K|  8789K|   176K  (4)| 00:32:27 |   4998K|00:00:03.94 |    5849 |   3321 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (IDENTIFIED BY operation id):
-------------------------------------------------------------
 
   1 - SEL$F5BB74E1
   2 - SEL$838CAA44 / TEST1@SEL$2
   3 - SEL$838CAA44
   4 - SEL$838CAA44 / indexjoin$_alias$_001@SEL$838CAA44
   5 - SEL$838CAA44 / indexjoin$_alias$_002@SEL$838CAA44
 
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
 
   2 - filter(("ID1"=4 AND "ID2">5000000))
   3 - ACCESS(ROWID=ROWID)
   4 - ACCESS("ID1"=4)
   5 - ACCESS("ID2">5000000)
 
COLUMN Projection Information (IDENTIFIED BY operation id):
-----------------------------------------------------------
 
   1 - (#keys=0) SUM("TEST1"."ID2")[22]
   2 - "ID2"[NUMBER,22], "ID1"[NUMBER,22]
   3 - (#keys=1) "ID1"[NUMBER,22], "ID2"[NUMBER,22]
   4 - ROWID[ROWID,10], "ID1"[NUMBER,22]
   5 - ROWID[ROWID,10], "ID2"[NUMBER,22]

Second is now doing an unique index range scan on one index, we still see than the table access by index rowid is still not well estimated (big difference between E-Rows and A-Rows columns). The note section of the explain plan is confirming cardinality feature usage with with cardinality feedback used for this statement:

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(sql_id=>'1r81h55uhssm5', cursor_child_no=>'1', format=> 'all allstats'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1r81h55uhssm5, child NUMBER 1
-------------------------------------
SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND
id2>5000000)
 
PLAN hash VALUE: 629791772
 
----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-ROWS |E-Bytes| COST (%CPU)| E-TIME   | A-ROWS |   A-TIME   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |       |  2029 (100)|          |      1 |00:00:00.01 |      53 |
|   1 |  SORT AGGREGATE              |               |      1 |      1 |     9 |            |          |      1 |00:00:00.01 |      53 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| TEST1         |      1 |   1000K|  8789K|  2029   (2)| 00:00:23 |     21 |00:00:00.01 |      53 |
|*  3 |    INDEX RANGE SCAN          | TEST1_IDX_ID1 |      1 |     53 |       |  2027   (2)| 00:00:23 |     50 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (IDENTIFIED BY operation id):
-------------------------------------------------------------
 
   1 - SEL$F5BB74E1
   2 - SEL$F5BB74E1 / TEST1@SEL$2
   3 - SEL$F5BB74E1 / TEST1@SEL$2
 
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
 
   2 - filter("ID2">5000000)
   3 - ACCESS("ID1"=4)
 
COLUMN Projection Information (IDENTIFIED BY operation id):
-----------------------------------------------------------
 
   1 - (#keys=0) SUM("TEST1"."ID2")[22]
   2 - "ID2"[NUMBER,22]
   3 - "TEST1".ROWID[ROWID,10]
 
Note
-----
   - CARDINALITY feedback used FOR this statement

Remark:
In my case the plan generated using the cardinality feedback feature is performing much better than the original one, unfortunately it is not always the case…

Third explain is not different from previous one except that now cardinality is well estimated:

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(sql_id=>'1r81h55uhssm5', cursor_child_no=>'2', format=> 'all allstats'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1r81h55uhssm5, child NUMBER 2
-------------------------------------
SELECT /* Yannick */ SUM(id2) FROM (SELECT * FROM test1 WHERE id1=4 AND
id2>5000000)
 
PLAN hash VALUE: 629791772
 
----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-ROWS |E-Bytes| COST (%CPU)| E-TIME   | A-ROWS |   A-TIME   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |       |  2029 (100)|          |      1 |00:00:00.01 |      53 |
|   1 |  SORT AGGREGATE              |               |      1 |      1 |     9 |            |          |      1 |00:00:00.01 |      53 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| TEST1         |      1 |     21 |   189 |  2029   (2)| 00:00:23 |     21 |00:00:00.01 |      53 |
|*  3 |    INDEX RANGE SCAN          | TEST1_IDX_ID1 |      1 |     53 |       |  2027   (2)| 00:00:23 |     50 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------------------------------
.
.
.
Note
-----
   - CARDINALITY feedback used FOR this statement

In 12cR1 the feature is called statistics feedback and add the below feature:

The optimizer also creates a SQL plan directive so that other SQL statements can benefit from the information obtained during this initial execution

There is also a new IS_REOPTIMIZABLE column in V$SQL:

This columns shows whether the next execution matching this child cursor will trigger a reoptimization. The values are:
Y: If the next execution will trigger a reoptimization
R: If the child cursor contains reoptimization information, but will not trigger reoptimization because the cursor was compiled in reporting mode
N: If the child cursor has no reoptimization information

Remark:
The above example does not work in 12cR1 (12.1.0.2) due to different optimizer, so to test SQL Plan Directive I will have to find something different…

References

About Post Author

Share the knowledge!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>