Preamble
Data compression is usually presented as a way to save disk space in big Enterprise Data Warehouse (EDWH) databases, but in the end, is your EDWH really that big? One of our EDWHs is only 1 TB, and disks are cheap, aren't they? So why bother with data compression in your data warehouse? One good reason I see is that the main wait events of your data warehouse are most probably db file scattered read and db file sequential read, and your I/O subsystem is most probably saturated as well. Compressing your data will magically make your system retrieve more information (table rows) while reading fewer blocks (with the drawback of a small CPU overhead, admittedly). Surprisingly, blocks stay compressed in the buffer cache, so you also get the added value of fitting more information in the buffer cache…
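To check whether your system matches this profile, a quick look at the cumulated wait events since instance startup is enough. A minimal sketch, run from any account with access to V$SYSTEM_EVENT:

-- If multi-block (scattered) and single-block (sequential) reads dominate
-- your top waits, compression is worth investigating
SELECT event, total_waits, time_waited
FROM v$system_event
WHERE event IN ('db file scattered read', 'db file sequential read')
ORDER BY time_waited DESC;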
Test table
First, create a test table to compare its size with compressed copies; let's put some duplicates in it to make life easier for the Oracle compression engine:
SQL> CREATE TABLE t1 AS
     SELECT * FROM (SELECT * FROM dba_objects UNION ALL SELECT * FROM dba_objects)
     ORDER BY owner, object_name, subobject_name, object_id;

Table created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t1', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.
Basic compression
The capability to use it is included in Enterprise Edition, at no extra cost!
SQL> CREATE TABLE t2 AS SELECT * FROM t1 WHERE 0=1;

Table created.

SQL> ALTER TABLE t2 COMPRESS BASIC;

Table altered.

SQL> INSERT INTO t2 SELECT * FROM t1;

145824 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.
Now let's check the size of the two tables (to have up-to-date information in DBA_TABLES you must first gather statistics):
SQL> SET LINES 130
SQL> COL segment_name FOR a20
SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                17825792       2176         32

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T2')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824       2134 ENABLED  BASIC
Wow, no gain at all! It is even worse in number of blocks! But as explained in various Oracle white papers, compression occurs only when data is inserted through one of the following operations:
- Direct-path SQL*Loader
- CREATE TABLE … AS SELECT statement
- Parallel INSERT (or serial INSERT with an APPEND hint) statement
You can also use the ALTER TABLE … MOVE COMPRESS statement.
This shows that benefiting from basic compression will not be easy and will require a lot of attention…
SQL> ALTER TABLE t2 MOVE COMPRESS NOLOGGING;

Table altered.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        896         22

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T2')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        788 ENABLED  BASIC
OK, that's better. The other options to achieve it are shown below (note that the compression ratio is different with each method):
Direct-path (APPEND hint) insert:

SQL> TRUNCATE TABLE t2;

Table truncated.

SQL> INSERT /*+ append */ INTO t2 SELECT * FROM t1;

145824 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        640         20

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T2')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        640 ENABLED  BASIC
Parallel insert:

SQL> TRUNCATE TABLE t2;

Table truncated.

SQL> ALTER SESSION ENABLE PARALLEL DML;

Session altered.

SQL> INSERT /*+ parallel(t2,2) */ INTO t2 SELECT * FROM t1;

145824 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t2', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 6619136        888          9

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T2')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        888 ENABLED  BASIC
Remark:
If instead of using TRUNCATE you drop and re-create the table, the compression ratio will be different (!!!). Same behavior if, having loaded the table with the parallel hint, you use ALTER TABLE … MOVE COMPRESS; you will see something like:
SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 7340032        896         22
One of the main issues with this method is DML statements on the table: over time you completely lose the compression benefit, and the table can end up even bigger while containing exactly the same information:
SQL> UPDATE t2 SET owner=owner, object_name=object_name, subobject_name=subobject_name, edition_name=edition_name;

145824 rows updated.

SQL> COMMIT;

Commit complete.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                19202048       2344         21
OLTP compression
Be careful: you must purchase a license before using it (the Advanced Compression option, on top of Enterprise Edition). It is not a free option!
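If you are unsure whether the option has already been triggered on your database (and could therefore show up in a license audit), a hedged way to check is to query DBA_FEATURE_USAGE_STATISTICS; the exact feature names vary between releases, hence the LIKE:

-- Look for any compression-related feature usage recorded by the database
SELECT name, version, detected_usages, currently_used
FROM dba_feature_usage_statistics
WHERE name LIKE '%Compression%'
ORDER BY name;

Now the same test with OLTP compression: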
SQL> CREATE TABLE t3 AS SELECT * FROM t1 WHERE 0=1;

Table created.

SQL> ALTER TABLE t3 COMPRESS FOR OLTP;

Table altered.

SQL> INSERT INTO t3 SELECT * FROM t1;

145824 rows created.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t3', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T3')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                 7340032        896         22

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T3')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T3                                 145824        874 ENABLED  OLTP
This time a conventional INSERT compresses the data. Now let's see how it behaves with the same UPDATE as before:

SQL> UPDATE t3 SET owner=owner, object_name=object_name, subobject_name=subobject_name, edition_name=edition_name;

145824 rows updated.

SQL> COMMIT;

Commit complete.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'yjaquier', tabname=>'t3', estimate_percent=>NULL, method_opt=>'FOR ALL COLUMNS SIZE AUTO', degree=>DBMS_STATS.DEFAULT_DEGREE, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T3')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                12582912       1536         27

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T3')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T3                                 145824       1504 ENABLED  OLTP
If you update it again (using the same UPDATE command) it can even grow beyond the size of the original table:
SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T3')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                23068672       2816         37
Let's try to reclaim the space with a shrink:

SQL> ALTER TABLE t3 ENABLE ROW MOVEMENT;

Table altered.

SQL> ALTER TABLE t3 SHRINK SPACE COMPACT;
ALTER TABLE t3 SHRINK SPACE COMPACT
*
ERROR at line 1:
ORA-10630: Illegal syntax specified with SHRINK clause
This is briefly explained in MOS Note 732519.1: shrink is not supported on compressed tables, so you are obliged to do:
SQL> ALTER TABLE t3 NOCOMPRESS;

Table altered.

SQL> ALTER TABLE t3 ENABLE ROW MOVEMENT;

Table altered.

SQL> ALTER TABLE t3 SHRINK SPACE COMPACT;

Table altered.

SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T3')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T3                   TABLE                23068672       2816         37
But as you can see, it does not reclaim any space…
So in the end there is not much of a solution except the ALTER TABLE … MOVE COMPRESS sketched below, but this could be unacceptable for a big DWH table. So unless I'm hitting a bug or something, even with OLTP compression you should avoid DML; at least OLTP compression brings a flexibility in DML commands that you do not have with basic compression.
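For reference, a minimal sketch of that re-compression step. Keep in mind that a MOVE changes rowids and leaves all indexes on the table UNUSABLE, so each of them must be rebuilt afterwards (t3_idx below is purely hypothetical; our test table has no index):

-- Re-compress the table in place (offline operation: locks the table and rewrites the segment)
ALTER TABLE t3 MOVE COMPRESS FOR OLTP;
-- The move invalidates indexes; rebuild each one (t3_idx is a hypothetical example)
ALTER INDEX t3_idx REBUILD;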
Hybrid columnar compression
These advanced compression methods are reserved for the lucky owners of an Oracle Exadata machine…
SQL> ALTER TABLE t3 COMPRESS FOR ARCHIVE HIGH;
ALTER TABLE t3 COMPRESS FOR ARCHIVE HIGH
*
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage

SQL> ALTER TABLE t3 COMPRESS FOR QUERY HIGH;
ALTER TABLE t3 COMPRESS FOR QUERY HIGH
*
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
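Even without Exadata you can at least estimate what a given compression type would save, using the compression advisor shipped with 11gR2. A sketch, assuming a scratch tablespace named USERS (adapt it and the owner to your environment; whether HCC types can be estimated on non-Exadata storage depends on your exact patch level):

SET SERVEROUTPUT ON
DECLARE
  -- Output variables filled in by the advisor
  l_blkcnt_cmp   PLS_INTEGER;
  l_blkcnt_uncmp PLS_INTEGER;
  l_row_cmp      PLS_INTEGER;
  l_row_uncmp    PLS_INTEGER;
  l_cmp_ratio    NUMBER;
  l_comptype_str VARCHAR2(100);
BEGIN
  -- Estimate the OLTP compression ratio of T1 without touching the table itself;
  -- the advisor builds temporary copies in the scratch tablespace
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',      -- assumed scratch tablespace
    ownname        => 'YJAQUIER',
    tabname        => 'T1',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_OLTP,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated ratio: '||l_cmp_ratio||' ('||l_comptype_str||')');
END;
/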
Performance
Let's start with the current state of our three tables:
SQL> SELECT segment_name, segment_type, bytes, blocks, extents
     FROM dba_segments
     WHERE segment_name IN ('T1','T2','T3')
     ORDER BY segment_name;

SEGMENT_NAME         SEGMENT_TYPE            BYTES     BLOCKS    EXTENTS
-------------------- ------------------ ---------- ---------- ----------
T1                   TABLE                17825792       2176         32
T2                   TABLE                 5242880        640         20
T3                   TABLE                 7340032        896         22

SQL> SELECT table_name, num_rows, blocks, compression, compress_for
     FROM dba_tables
     WHERE table_name IN ('T1','T2','T3')
     ORDER BY table_name;

TABLE_NAME                       NUM_ROWS     BLOCKS COMPRESS COMPRESS_FOR
------------------------------ ---------- ---------- -------- ------------
T1                                 145824       2119 DISABLED
T2                                 145824        640 ENABLED  BASIC
T3                                 145824        874 ENABLED  OLTP
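From these figures the compression ratio is straightforward to derive; a small helper query (assuming T1 is the uncompressed reference) gives roughly 3.4:1 for T2 and 2.4:1 for T3:

-- Size of each segment compared to the uncompressed T1 reference
SELECT s.segment_name, s.bytes,
       ROUND(t1.bytes / s.bytes, 2) AS ratio_vs_t1
FROM dba_segments s,
     (SELECT bytes FROM dba_segments WHERE segment_name = 'T1') t1
WHERE s.segment_name IN ('T1','T2','T3')
ORDER BY s.segment_name;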
As expected, the number of blocks retrieved will be lower, and CPU usage a bit higher, with the compressed tables. Make sure that in each session you have set:
SQL> ALTER SESSION SET statistics_level=ALL;

Session altered.

SQL> ALTER SESSION SET timed_statistics=TRUE;

Session altered.

SQL> SELECT sid FROM v$mystat WHERE ROWNUM=1;

       SID
----------
        51

SQL> SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> SELECT * FROM t1;

145824 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   578   (1)| 00:00:07 |
|   1 |  TABLE ACCESS FULL| T1   |   145K|    13M|   578   (1)| 00:00:07 |
--------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11654  consistent gets
       2078  physical reads
          0  redo size
    5185153  bytes sent via SQL*Net to client
     107350  bytes received via SQL*Net from client
       9723  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     145824  rows processed
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> SELECT * FROM t2;

145824 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   177   (2)| 00:00:03 |
|   1 |  TABLE ACCESS FULL| T2   |   145K|    13M|   177   (2)| 00:00:03 |
--------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      10307  consistent gets
        623  physical reads
          0  redo size
    5185153  bytes sent via SQL*Net to client
     107350  bytes received via SQL*Net from client
       9723  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     145824  rows processed
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> SELECT * FROM t3;

145824 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   145K|    13M|   240   (1)| 00:00:03 |
|   1 |  TABLE ACCESS FULL| T3   |   145K|    13M|   240   (1)| 00:00:03 |
--------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      10505  consistent gets
        817  physical reads
          0  redo size
    5208091  bytes sent via SQL*Net to client
     107350  bytes received via SQL*Net from client
       9723  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     145824  rows processed
Then use the following SQL from another session to measure the CPU consumed (SID 51 being the session under test):
SELECT b.name, a.value
FROM v$sesstat a, v$statname b
WHERE a.statistic# = b.statistic#
AND b.name = 'CPU used by this session'
AND a.sid = 51;
The results, based on 4 runs, are:
                        T1       T2       T3
CPU used (cs)           93       99       99
Consistent gets      11654    10307    10505
Physical reads        2078      623      817
So, as expected, far fewer physical reads and not that much CPU overhead in this basic example (a full table scan). As explained in many notes available on the Internet, special care should be taken with INSERT/UPDATE/DELETE execution times; but as we have seen, DML produces very bad results on compressed tables anyway, so it is better avoided…
References
The compression syntax is the following (ALTER/CREATE TABLE):
{ COMPRESS [ BASIC | FOR { OLTP | { QUERY | ARCHIVE } [ LOW | HIGH ] } ] | NOCOMPRESS }
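For illustration, each form of the grammar expanded on a hypothetical T4 table (the QUERY/ARCHIVE forms will raise ORA-64307 outside Exadata, as seen above):

CREATE TABLE t4 COMPRESS BASIC AS SELECT * FROM t1;  -- basic compression, included in EE
ALTER TABLE t4 COMPRESS FOR OLTP;                    -- requires the Advanced Compression option
ALTER TABLE t4 COMPRESS FOR QUERY LOW;               -- HCC, Exadata storage only
ALTER TABLE t4 COMPRESS FOR ARCHIVE HIGH;            -- HCC, Exadata storage only
ALTER TABLE t4 NOCOMPRESS;                           -- newly loaded blocks no longer compressed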