Data compression with Oracle 11gR2

Preamble

Data compression is always presented to gain disk space for big Enterprise DataWareHouse databases, but at the end is your EDWH really that big ? One of our EDWH is only 1TB and disk are cheap isn’t it ? So why bothering with data compression in your DataWareHouse ? One of the good reason I see is because your main wait events of your DataWareHouse are most probably db file scattered read and db file sequential read and your I/O subsystems is also most probably saturated… And compressing your data will magically make your system retrieving more informations (tables’ rows) by reading less blocks (with the drawback of a small CPU overhead anyway). Surprisingly blocks stay in compressed state in the buffer cache so you have also the added value of having more information in buffer cache…

Test table

Create first a test table to compare its size with compressed copies, let’s put some duplicates in it to ease Oracle engine life:

SQL> CREATE TABLE t1 AS SELECT * FROM (SELECT * FROM dba_objects UNION ALL SELECT * FROM dba_objects) ORDER BY owner,object_name,subobject_name,object_id;
 
TABLE created.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t1', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.

Basic compression

Capability to use it is included in Enterprise Edition !!

SQL> CREATE TABLE t2 AS SELECT * FROM t1 WHERE 0=1;
 
TABLE created.
 
SQL> ALTER TABLE t2 COMPRESS basic;
 
TABLE altered.
 
SQL> INSERT INTO t2 SELECT * FROM t1;
 
145824 ROWS created.
 
SQL> COMMIT;
 
COMMIT complete.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.

Now let’s check the size of the two tables (to have information in DBA_TABLES you must gather first the statistics):

SQL> SET lines 130
SQL> col segment_name FOR a20
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                17825792       2176         32
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T2')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824       2134 ENABLED  BASIC

Wow no gain at all !! It is even worst in number of blocks !! but as explained in various Oracle white paper, compression occurs only when data is inserted with following operations:

  • Direct Path SQL*Loader
  • CREATE TABLE … AS SELECT statement
  • Parallel INSERT (or serial INSERT with an APPEND hint) statement:

You can also use the ALTER TABLE … MOVE COMPRESS statement.

This shows that it will not be easy to benefit from basic compression and will require lots of attention…

SQL> ALTER TABLE t2 move COMPRESS nologging;
 
TABLE altered.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        896         22
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T2')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        788 ENABLED  BASIC

Ok it is better, other options to achieve it are described below (check that compression ratio is different with each method:

SQL> TRUNCATE TABLE t2;
 
TABLE truncated.
 
SQL> INSERT /*+ append */ INTO t2 SELECT * FROM t1;
 
145824 ROWS created.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        640         20
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T2')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        640 ENABLED  BASIC
SQL> TRUNCATE TABLE t2;
 
TABLE truncated.
 
SQL> ALTER SESSION enable parallel dml;
 
SESSION altered.
 
SQL> INSERT /*+ parallel(t2,2) */ INTO t2 SELECT * FROM t1;
 
145824 ROWS created.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 6619136        888          9
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T2')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        888 ENABLED  BASIC

Remark:
If instead of using TRUNCATE you drop and recreate the table, compression ratio will be different (!!!). Same behavior if when having created the table with parallel hint you use ALTER TABLE … MOVE COMPRESS you will see something like:

SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 7340032        896         22

One of the main issue with this method are DML statements over the table, time after time you completely loose compression benefit, it could be even worst at the end with the exact same information !!:

SQL> UPDATE t2 SET owner=owner, object_name=object_name, subobject_name=subobject_name, edition_name=edition_name;
 
145824 ROWS updated.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                19202048       2344         21

OLTP compression

Be careful you must purchase the license before using it (Advanced Compression option of the Enterprise Edition) !! Non free option !!

SQL> CREATE TABLE t3  AS SELECT * FROM t1 WHERE 0=1;
 
TABLE created.
 
SQL> ALTER TABLE t3 COMPRESS FOR oltp;
 
TABLE altered.
 
SQL> INSERT INTO t3 SELECT * FROM t1;
 
145824 ROWS created.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t3', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T3')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                 7340032        896         22
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T3')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T3                                 145824        874 ENABLED  OLTP
SQL> UPDATE t3 SET owner=owner, object_name=object_name, subobject_name=subobject_name, edition_name=edition_name;
 
145824 ROWS updated.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t3', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, CASCADE=>TRUE);
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T3')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                12582912       1536         27
 
SQL> SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T3')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T3                                 145824       1504 ENABLED  OLTP

If you update it again (using same UPDATE command) it can even go upper to the size of the original table:

SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T3')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                23068672       2816         37
SQL> ALTER TABLE t3 enable ROW movement;
 
TABLE altered.
 
SQL> ALTER TABLE t3 shrink SPACE compact;
ALTER TABLE t3 shrink
                    *
ERROR AT line 1:
ORA-10630: Illegal syntax specified WITH SHRINK clause

This is briefly explained in Note: 732519.1, shrink in not supported on compressed table, you are obliged to do:

SQL> ALTER TABLE t3 NOCOMPRESS;
 
TABLE altered.
 
SQL> ALTER TABLE t3 enable ROW movement;
 
TABLE altered.
 
SQL> ALTER TABLE t3 shrink SPACE compact;
 
TABLE altered.
 
SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T3')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                23068672       2816         37

But as you can see it does not provide any result…

So at the end not much solution except the ALTER TABLE … MOVE COMPRESS but this could be unacceptable for a big DWH table. So unless I’m hitting a bug or something even with the OLTP compression you must avoid the DML but it brings at least the flexibility in the DML commands that you do not have with basic compression.

Hybrid columnar compression

These advanced compression methods are reserved for the lucky owner of the Exadata Oracle machine…

SQL> ALTER TABLE t3 COMPRESS FOR archive high;
ALTER TABLE t3 COMPRESS FOR archive high
*
ERROR AT line 1:
ORA-64307: hybrid columnar compression IS only supported IN tablespaces
residing ON Exadata storage
 
 
SQL> ALTER TABLE t3 COMPRESS FOR query high;
ALTER TABLE t3 COMPRESS FOR query high
*
ERROR AT line 1:
ORA-64307: hybrid columnar compression IS only supported IN tablespaces
residing ON Exadata storage

Performance

Let’s start with:

SQL> SELECT segment_name,segment_type,bytes,blocks,extents
FROM dba_segments
WHERE segment_name IN ('T1','T2','T3')
ORDER BY segment_name;
 
SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        640         20
T3                   TABLE                 7340032        896         22
 
SQL>  SELECT table_name,num_rows,blocks,compression,compress_for FROM dba_tables
WHERE table_name IN ('T1','T2','T3')
ORDER BY table_name;
 
TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        640 ENABLED  BASIC
T3                                 145824        874 ENABLED  OLTP

Then obviously number of blocks retrieved is lesser and CPU is a bit higher with compressed tables:

Make sure that for all sessions you have:

SQL> ALTER SESSION SET statistics_level=ALL;
 
SESSION altered.
 
SQL> ALTER SESSION SET timed_statistics=TRUE;
 
SESSION altered.
 
SQL> SELECT SID FROM  v$mystat WHERE ROWNUM=1;
 
       SID
----------
        51
 
SQL> SET autotrace traceonly EXPLAIN STATISTICS;
SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.
 
SQL> SELECT * FROM t1;
 
145824 ROWS selected.
 
 
Execution PLAN
----------------------------------------------------------
PLAN hash VALUE: 3617692013
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | COST (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   578   (1)| 00:00:07 |
|   1 |  TABLE ACCESS FULL| T1   |   145K|    13M|   578   (1)| 00:00:07 |
--------------------------------------------------------------------------
 
 
STATISTICS
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11654  consistent gets
       2078  physical reads
          0  redo SIZE
    5185153  bytes sent via SQL*Net TO client
     107350  bytes received via SQL*Net FROM client
       9723  SQL*Net roundtrips TO/FROM client
          0  sorts (memory)
          0  sorts (disk)
     145824  ROWS processed
SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.
 
SQL> SELECT * FROM t2;
 
145824 ROWS selected.
 
 
Execution PLAN
----------------------------------------------------------
PLAN hash VALUE: 1513984157
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | COST (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   177   (2)| 00:00:03 |
|   1 |  TABLE ACCESS FULL| T2   |   145K|    13M|   177   (2)| 00:00:03 |
--------------------------------------------------------------------------
 
 
STATISTICS
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      10307  consistent gets
        623  physical reads
          0  redo SIZE
    5185153  bytes sent via SQL*Net TO client
     107350  bytes received via SQL*Net FROM client
       9723  SQL*Net roundtrips TO/FROM client
          0  sorts (memory)
          0  sorts (disk)
     145824  ROWS processed
SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.
 
SQL> SELECT * FROM t3;
 
145824 ROWS selected.
 
 
Execution PLAN
----------------------------------------------------------
PLAN hash VALUE: 4161002650
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | COST (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   240   (1)| 00:00:03 |
|   1 |  TABLE ACCESS FULL| T3   |   145K|    13M|   240   (1)| 00:00:03 |
--------------------------------------------------------------------------
 
 
STATISTICS
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      10505  consistent gets
        817  physical reads
          0  redo SIZE
    5208091  bytes sent via SQL*Net TO client
     107350  bytes received via SQL*Net FROM client
       9723  SQL*Net roundtrips TO/FROM client
          0  sorts (memory)
          0  sorts (disk)
     145824  ROWS processed

then use the following SQL in another session to measure the CPU used:

SELECT b.name,a.VALUE
FROM v$sesstat a, v$statname b
WHERE a.statistic#=b.statistic#
AND b.name='CPU used by this session'
AND a.sid=51;

Result based on 4 runs is:

T1 T2 T3
CPU used (cs) 93 99 99
Consistent gets 11654 10307 10505
Physical reads 2078 623 817

So as expected much less physical read and not that much CPU overhead on this basic example (FTS). As explained in many notes available on Internet a special care should be taken with INSERT/UPDATE/DELETE execution time but as we have seen they produce very bad results so better avoiding them…

References

Compression syntax is the following (ALTER/CREATE TABLE):

{
COMPRESS [ BASIC | FOR { OLTP | { QUERY | ARCHIVE } [ LOW | HIGH ] } ]
|
NOCOMPRESS
}

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>