StreamSets Data Collector replication with Oracle, MySQL and JSON



Preamble

I came across a nice overview article by Franck Pachot and shared it with a few teammates; they have all been interested in the StreamSets Data Collector product. One of the main reasons is the obvious cost saving versus GoldenGate, which we have implemented in a project deployed worldwide. The product is free but has some clearly documented limitations, such as managing only INSERT, UPDATE, SELECT_FOR_UPDATE, and DELETE operations for one or more tables in a database. In other words, DDL is not handled, nor are a few data types (not an issue for us)…

To really handle DDL you would have to check the "Produce Events" check box and process the generated events on the target; this is a little bit more complex and outside the scope of this blog post…

I have decided to give the product a try and implement what we currently do with GoldenGate, i.e. building a reporting environment from our production database. The target is also an Oracle database, but it might become a MySQL one in the future.

My testing environment is made of three servers:

  • server1.domain.com (192.168.56.101) is the primary database server.
  • server2.domain.com (192.168.56.102) is the secondary database server. It hosts the databases (Oracle & MySQL) where the replicated data should land.
  • server4.domain.com (192.168.56.104) is the Streamsets server.

Oracle database release is 18c Enterprise Edition Release 18.3.0.0.0. Streamsets is release 3.4.1. MySQL release is 8.0.12 MySQL Community Server.

The three servers are in fact VirtualBox guests running Oracle Linux Server release 7.5.

StreamSets Data Collector installation

I have first created a Linux streamsets account (in the users group) with a dedicated /streamsets filesystem:

[streamsets@server4 ~]$ id
uid=1001(streamsets) gid=100(users) groups=100(users)
[streamsets@server4 ~]$ pwd
/streamsets
[streamsets@server4 ~]$ ll
total 246332
-rw-r--r-- 1 streamsets users 248202258 Aug  2 15:32 streamsets-datacollector-core-3.4.1.tgz
[streamsets@server4 streamsets]$ tar xvzf streamsets-datacollector-core-3.4.1.tgz
[streamsets@server4 ~]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets dc
Can't find java, please set JAVA_HOME pointing to your java installation

I have installed Java SE Development Kit 8 (jdk-8u181-linux-x64.rpm). Only release 8 is supported so far…

[streamsets@server4 streamsets]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets dc
Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Configuration of maximum open file limit is too low: 1024 (expected at least 32768). Please consult https://goo.gl/LgvGFl

At the end of /etc/security/limits.conf I have added:

streamsets        soft    nofile           32768
streamsets        hard    nofile           32768

Now I can launch the process with:

[streamsets@server4 streamsets]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets dc
Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Logging initialized @1033ms to org.eclipse.jetty.util.log.Slf4jLog
Running on URI : 'http://server4:18630'

Then from any browser (http://server4.domain.com:18630 for me) you should get this login window (admin/admin as the default username/password):

streamsets01

Once logged you get:

streamsets02

By default the Oracle CDC (Change Data Capture) client requires the Oracle JDBC thin driver. In the top left tool bar click on Package Manager (third icon). If you go to JDBC you see that nothing is there (the error message is because I'm behind a company proxy):

streamsets03

When I tried to import my JDBC thin driver file (ojdbc8.jar, the name for 18c (18.3) is the same as for 12.2.0.1, but the size differs) I noticed that the JDBC category is not there:

streamsets04

I spent a bit of time before realizing that everything is available on the StreamSets Data Collector download page:

[streamsets@server4 ~]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets stagelibs -list

curl: (6) Could not resolve host: archives.streamsets.com; Unknown error
Failed! running curl -s -f https://archives.streamsets.com/datacollector/3.4.1/tarball/stage-lib-manifest.properties.sha1
-SL -o /tmp/sdc-setup-20988/stage-lib-manifest.properties.sha1 in /home/streamsets

[streamsets@server4 ~]$ export https_proxy='http://proxy_account:proxy_password@proxy_host:proxy_port'
[streamsets@server4 ~]$ echo $https_proxy
http://proxy_account:proxy_password@proxy_host:proxy_port
[streamsets@server4 streamsets]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets stagelibs -list



StreamSets Data Collector

Stage Library Repository: https://archives.streamsets.com/datacollector/3.4.1/tarball

    ID                                                           Name                                     Installed
=================================================================================================================
 streamsets-datacollector-aerospike-lib                       Aerospike 3.15.0.2                           NO
 streamsets-datacollector-apache-kafka_0_10-lib               Apache Kafka 0.10.0.0                        NO
 streamsets-datacollector-apache-kafka_0_11-lib               Apache Kafka 0.11.0.0                        NO
 streamsets-datacollector-apache-kafka_0_9-lib                Apache Kafka 0.9.0.1                         NO
 streamsets-datacollector-apache-kafka_1_0-lib                Apache Kafka 1.0.0                           NO
 streamsets-datacollector-apache-kudu_1_3-lib                 Apache Kudu 1.3.0                            NO
 streamsets-datacollector-apache-kudu_1_4-lib                 Apache Kudu 1.4.0                            NO
.
.
 streamsets-datacollector-jdbc-lib                            JDBC                                         NO
.
.
[streamsets@server4 streamsets]$ /streamsets/streamsets-datacollector-3.4.1/bin/streamsets stagelibs -install=streamsets-datacollector-jdbc-lib




Downloading: https://archives.streamsets.com/datacollector/3.4.1/tarball/streamsets-datacollector-jdbc-lib-3.4.1.tgz
######################################################################## 100.0%

Stage library streamsets-datacollector-jdbc-lib installed

After relaunching StreamSets Data Collector and going back to the web interface I could now see JDBC as a possible library:

streamsets05

So I imported the JDBC driver:

streamsets06

You should be prompted to restart StreamSets Data Collector and see the screen below:

streamsets07

StreamSets Data Collector source Oracle database configuration

This source multitenant Oracle database is common to the three scenarios I have decided to test, so you have to do this only once. On this source Oracle database you have to create a global account able to manage LogMiner (global because LogMiner is accessible from the root container in a multitenant architecture):

SQL> alter session set container=cdb$root;

Session altered.

SQL> create user c##streamsets identified by "streamsets" container=all;

User created.

SQL> grant create session, alter session, set container, select any dictionary, logmining, execute_catalog_role TO c##streamsets container=all;

Grant succeeded.

SQL> alter session set container=pdb1;

Session altered.

Change the source Oracle database to archivelog mode and activate minimal supplemental logging:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1048575184 bytes
Fixed Size                  8903888 bytes
Variable Size             729808896 bytes
Database Buffers          301989888 bytes
Redo Buffers                7872512 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database open;

Database altered.

SQL> select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;

SUPPLEME SUP SUP
-------- --- ---
NO       NO  NO

SQL> alter database add supplemental log data;

Database altered.

SQL> select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;

SUPPLEME SUP SUP
-------- --- ---
YES      NO  NO

SQL> alter system switch logfile;

System altered.

I have also created a test schema in my pluggable database (pdb1) to hold my test table:

SQL> create user test01 identified by test01;

User created.

SQL> grant connect,resource to test01;

Grant succeeded.

SQL> alter user test01 quota unlimited on users;

User altered.

Create a test table and add supplemental log:

SQL> create table test01.table01 (
  id number not null,
  descr varchar2(50),
  constraint table01_pk primary key (id) enable 
);

Table created.

SQL> alter table test01.table01 add supplemental log data (primary key) columns;

Table altered.

SQL> set lines 200 pages 1000
SQL> col table_name for a15
SQL> col log_group_name for a15
SQL> col owner for a15
SQL> select * from dba_log_groups where owner='TEST01';

OWNER           LOG_GROUP_NAME  TABLE_NAME      LOG_GROUP_TYPE      ALWAYS      GENERATED
--------------- --------------- --------------- ------------------- ----------- --------------
TEST01          SYS_C007365     TABLE01         PRIMARY KEY LOGGING ALWAYS      GENERATED NAME

And insert a few rows into it to simulate an already existing environment:

SQL> insert into test01.table01 values(1,'One');

1 row created.

SQL> insert into test01.table01 values(2,'Two');

1 row created.

SQL> commit;

Commit complete.

On the source database, generate a dictionary in the redo log. If you choose to set "Dictionary Source" to Online Catalog then this step is not mandatory. It is also much faster on small resources to use the Online Catalog, so it is really up to you:

SQL> alter session set container=cdb$root;

Session altered.

SQL> execute dbms_logmnr_d.build(options=> dbms_logmnr_d.store_in_redo_logs);

PL/SQL procedure successfully completed.

SQL> col name for a60
SQL> set lines 200 pages 1000
SQL> select name,dictionary_begin,dictionary_end from v$archived_log where name is not null order by recid desc;

NAME                                                         DIC DIC
------------------------------------------------------------ --- ---
/u01/app/oracle/oradata/ORCL/arch/1_135_983097959.dbf        YES YES
/u01/app/oracle/oradata/ORCL/arch/1_134_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_133_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_132_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_131_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_130_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_129_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_128_983097959.dbf        YES YES
/u01/app/oracle/oradata/ORCL/arch/1_127_983097959.dbf        NO  NO
/u01/app/oracle/oradata/ORCL/arch/1_126_983097959.dbf        NO  NO

StreamSets Data Collector Oracle to Oracle replication

Oracle prerequisites

On the target Oracle database I have created an account in my target pluggable database (pdb1):

SQL> create user test01 identified by test01;

User created.

SQL> grant connect,resource to test01;

Grant succeeded.

SQL> alter user test01 quota unlimited on users;

User altered.

On source database create an export directory and grant read and write on it to test01 account:

SQL> alter session set container=pdb1;

Session altered.

SQL> create or replace directory tmp as '/tmp';

Directory created.

SQL> grant read,write on directory tmp to test01;

Grant succeeded.

Get the current System Change Number (SCN) on source database with:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    5424515

Finally export the data with:

[oracle@server1 ~]$ expdp test01/test01@pdb1 dumpfile=table01.dmp directory=tmp tables=table01 flashback_scn=5424515

Export: Release 18.0.0.0.0 - Production on Wed Sep 5 13:02:23 2018
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Starting "TEST01"."SYS_EXPORT_TABLE_01":  test01/********@pdb1 dumpfile=table01.dmp directory=tmp tables=table01 flashback_scn=5424515
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
. . exported "TEST01"."TABLE01"                          5.492 KB       2 rows
Master table "TEST01"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST01.SYS_EXPORT_TABLE_01 is:
  /tmp/table01.dmp
Job "TEST01"."SYS_EXPORT_TABLE_01" successfully completed at Wed Sep 5 13:04:25 2018 elapsed 0 00:01:09

[oracle@server1 ~]$ ll /tmp/table01.dmp
-rw-r----- 1 oracle dba 200704 Sep  5 13:04 /tmp/table01.dmp
[oracle@server1 ~]$ scp /tmp/table01.dmp server2.domain.com:/tmp
The authenticity of host 'server2.domain.com (192.168.56.102)' can't be established.
ECDSA key fingerprint is SHA256:hduqTIePPHF3Y+N/ekuZKnnXbocm+PNS7yU/HCf1GEw.
ECDSA key fingerprint is MD5:13:dc:e3:27:bc:4b:08:b8:bf:53:2a:15:3c:86:d7:c4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server2.domain.com' (ECDSA) to the list of known hosts.
oracle@server2.domain.com's password:
table01.dmp                                                                      100%  196KB  19.0MB/s   00:00                    

Import the data on the target database with something like (in the pdb1 pluggable database):

[oracle@server2 ~]$ impdp test01/test01@pdb1 file=table01.dmp directory=tmp

Import: Release 18.0.0.0.0 - Production on Wed Sep 5 13:06:46 2018
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Legacy Mode Active due to the following parameters:
Legacy Mode Parameter: "file=table01.dmp" Location: Command Line, Replaced with: "dumpfile=table01.dmp"
Master table "TEST01"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "TEST01"."SYS_IMPORT_FULL_01":  test01/********@pdb1 dumpfile=table01.dmp directory=tmp
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "TEST01"."TABLE01"                          5.492 KB       2 rows
Processing object type TABLE_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
ORA-39083: Object type OBJECT_GRANT failed to create with error:
ORA-01917: user or role 'C##STREAMSETS' does not exist

Failing sql is:
GRANT SELECT ON "TEST01"."TABLE01" TO "C##STREAMSETS"

Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Job "TEST01"."SYS_IMPORT_FULL_01" completed with 1 error(s) at Wed Sep 5 13:07:47 2018 elapsed 0 00:00:46

StreamSets Data Collector configuration

Create the pipeline:

streamsets08

Choose Oracle CDC Client as the origin (when replicating from Oracle this is the de facto option to choose):

streamsets09

Configure all parameters. I have chosen the most reliable option of getting the dictionary from the redo logs, in case we want to test schema changes (DDL). As the simplest test I have chosen From Latest Change – Processes changes that arrive after you start the pipeline. I will re-configure it after the database settings:

streamsets10

The query below can help you choose the correct database time zone:

SQL> SELECT DBTIMEZONE FROM DUAL;

DBTIME
------
+00:00

Set JDBC connection string (I am in a multitenant configuration):

streamsets11

Use the global account we have defined earlier:

streamsets12

I am not defining a Processor to modify data between source and target, but obviously this is possible:

streamsets13

Set the JDBC connection string. I am in a multitenant configuration, but here I just connect to the destination pluggable database directly:

streamsets14

Local pluggable database user to connect with:

streamsets15

In the StreamSets configuration I change the Oracle CDC Client configuration to instruct it to start at a particular SCN (the one we extracted above):

streamsets16

StreamSets Data Collector Oracle to Oracle replication testing

Then on the source database I can start inserting new rows:

SQL> alter session set container=pdb1;

Session altered.

SQL> insert into test01.table01 values(3,'Three');

1 row created.

SQL> commit;

Commit complete.

And you should see them on the target database, as well as a nice monitoring screen of the pipeline:

streamsets17

Then, to generate a bit of traffic, I have used the PL/SQL script below (the number of inserted rows is up to you; I have personally run multiple tests):

declare
  max_id number;
  i number;
  inserted_rows number:=10000;
begin
  select max(id) into max_id from test01.table01;
  i:=max_id+1;
  loop
    insert into test01.table01 values(i,dbms_random.string('U', 20));
    commit;
    i:=i+1;
    exit when i>max_id + inserted_rows;
  end loop;
end;
/
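
If you prefer to drive the same kind of test from Python rather than PL/SQL, a rough equivalent could look like the sketch below. The cx_Oracle driver and the connection details are assumptions on my side and not part of the original setup:

import cx_Oracle  # assumes the cx_Oracle driver is installed (pip install cx_Oracle)

# Hypothetical connection details; adapt them to your environment
connection = cx_Oracle.connect("test01", "test01", "server1.domain.com:1521/pdb1")
cursor = connection.cursor()
cursor.execute("select nvl(max(id), 0) from test01.table01")
max_id = cursor.fetchone()[0]
inserted_rows = 10000
for i in range(max_id + 1, max_id + inserted_rows + 1):
    # dbms_random.string is evaluated by the database, exactly as in the PL/SQL script
    cursor.execute("insert into test01.table01 values (:1, dbms_random.string('U', 20))", [i])
    connection.commit()
cursor.close()
connection.close()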

And if you capture the monitoring screen while it’s running you should be able to see transfer rate figures:

streamsets18

StreamSets Data Collector Oracle to JSON file generation

In this extra test I wanted to verify the capability, on top of the JDBC insertion into a target Oracle database, to also generate a JSON file. I started by adding a new destination called Local FS and then drew, with the mouse in the GUI, a new line between the Oracle CDC Client and the Local FS. The only parameter I have changed is the generated Data Format, set to classical JSON:

streamsets19

Once I insert a row in source table (I have restarted from an empty table) the record is duplicated in:

  • The same target Oracle database, same as above.
  • A text file, on Streamsets Data Collector server, in JSON format.

We can see that the number of generated output records equals two:

streamsets20

The output file (located on the server where StreamSets Data Collector is running, i.e. server4.domain.com):

[root@server4 ~]# cat /tmp/out/2018-09-11-15/_tmp_sdc-2622d297-ac69-11e8-bf06-e301dabcb2ba_0
{"ID":1,"DESCR":"One"}
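
A quick way to double-check the generated file is to parse it with a few lines of Python. This is a minimal sketch; the path is the one from my run and will obviously differ on yours:

import json

path = "/tmp/out/2018-09-11-15/_tmp_sdc-2622d297-ac69-11e8-bf06-e301dabcb2ba_0"
with open(path) as f:
    for line in f:                 # one JSON document per line
        record = json.loads(line)
        print(record["ID"], record["DESCR"])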

StreamSets Data Collector Oracle to MySQL replication

I have obviously created a small MySQL 8 instance. I have used my personal account to connect to it and created a test01 database to match the schema name of the source Oracle pluggable database:

mysql> create user  'yjaquier'@'%' identified by 'secure_password';
Query OK, 0 rows affected (0.31 sec)

mysql>  grant all privileges on *.* to 'yjaquier'@'%' with grant option;
Query OK, 0 rows affected (0.33 sec)
mysql> create database if not exists test01
    -> CHARACTER SET = utf32
    -> COLLATE = utf32_general_ci;
Query OK, 1 row affected (0.59 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test01             |
+--------------------+
5 rows in set (0.00 sec)

I created the target table, identical to the source Oracle table:

mysql> create table test01.table01 (
    -> id int not null,
    -> descr varchar(50) null,
    -> primary key (id));
Query OK, 0 rows affected (1.30 sec)

Remark:
I have started from an empty table, but if that is not your case then an export and an import of the pre-existing data should be handled first…

The JDBC connection string for MySQL is (3322 is my MySQL listening port):

jdbc:mysql://server2.domain.com:3322

I have also set the two JDBC parameters below (see the Errors encountered section):

streamsets21

The Schema Name parameter for MySQL must be entered in lower case, so test01 in my case, and Table Name must also be in lower case, so use the expression below (str:toLower function) to convert the upper case Oracle table name to lower case:

${str:toLower(record:attribute('oracle.cdc.table'))}
streamsets22

Finally records have also been inserted in MySQL target table:

mysql> select * from test01.table01;
+----+-------+
| id | descr |
+----+-------+
|  1 | One   |
|  2 | Two   |
|  3 | Three |
+----+-------+
3 rows in set (0.00 sec)
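
If you want to verify the replication programmatically rather than by eye, a small Python check comparing row counts on both sides could look like the sketch below. The cx_Oracle and mysql-connector-python drivers, as well as the connection details, are assumptions on my side:

import cx_Oracle
import mysql.connector

# Hypothetical connection details; adapt them to your environment
oracle_conn = cx_Oracle.connect("test01", "test01", "server1.domain.com:1521/pdb1")
mysql_conn = mysql.connector.connect(host="server2.domain.com", port=3322,
                                     user="yjaquier", password="secure_password",
                                     database="test01")
oracle_cursor = oracle_conn.cursor()
oracle_cursor.execute("select count(*) from table01")
mysql_cursor = mysql_conn.cursor()
mysql_cursor.execute("select count(*) from table01")
print("Oracle rows:", oracle_cursor.fetchone()[0], "- MySQL rows:", mysql_cursor.fetchone()[0])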

Errors encountered

JDBC_52 – Error starting LogMiner

In sdc.log, or in View Logs in the interface, you should see something like:

LOGMINER - CONTINUOUS_MINE  - failed to add logfile /u01/app/oracle/oradata/ORCL/arch/1_4_983097959.dbf because of status 1284
2018-08-07T17:19:02.844615+02:00

It was a mistake on my side: I had deleted archived log files directly on disk. I recovered the situation with:

RMAN> crosscheck archivelog all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=277 device type=DISK
validation failed for archived log
archived log file name=/u01/app/oracle/oradata/ORCL/arch/1_4_983097959.dbf RECID=1 STAMP=983191889
validation failed for archived log
.
.

RMAN> list archivelog all;

using target database control file instead of recovery catalog
List of Archived Log Copies for database with db_unique_name ORCL
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
1       1    4       X 02-AUG-18
        Name: /u01/app/oracle/oradata/ORCL/arch/1_4_983097959.dbf
.
.

RMAN> delete noprompt expired archivelog all;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=271 device type=DISK
List of Archived Log Copies for database with db_unique_name ORCL
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
1       1    4       X 02-AUG-18
        Name: /u01/app/oracle/oradata/ORCL/arch/1_4_983097959.dbf

I also hit the sibling of the above error:

JDBC_44 - Error while getting changes due to error: com.streamsets.pipeline.api.StageException: JDBC_52 - Error starting LogMiner

It was simply because no archived log file contained a dictionary. This can happen when you purge archived log files. Generate one with:

SQL> execute dbms_logmnr_d.build(options=> dbms_logmnr_d.store_in_redo_logs);

PL/SQL procedure successfully completed.

JDBC_44 – Error while getting changes due to error: java.sql.SQLRecoverableException: Closed Connection: getBigDecimal

When validating pipeline I got:

JDBC_44 - Error while getting changes due to error: java.sql.SQLRecoverableException: Closed Connection: getBigDecimal

And found the error below in the sdc.log file:

2018-08-31 10:53:35,754 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:] [thread:preview-pool-1-thread-4] WARN  OracleCDCSource - Error while stopping LogMiner
java.sql.SQLRecoverableException: Closed Connection

From what I have seen, it looks like my test server is too slow; if you get this you might need to increase the timeout parameters in the Advanced tab of the Oracle CDC Client…

ORA-01291: missing logfile

This one has kept me busy for a while:

106856c0-Oracle-to-Oracle] INFO  JdbcUtil - Driver class oracle.jdbc.OracleDriver (version 18.3)
2018-09-03 17:50:48,069 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:0] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-6247
106856c0-Oracle-to-Oracle] INFO  HikariDataSource - HikariPool-1 - is starting.
2018-09-03 17:50:49,354 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-62471
06856c0-Oracle-to-Oracle] INFO  OracleCDCSource - Trying to start LogMiner with start date: 31-08-2018 10:40:33 and end date: 31-08-2018 12:40:33
2018-09-03 17:50:49,908 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-62471
06856c0-Oracle-to-Oracle] ERROR OracleCDCSource - SQLException while trying to setup record generator thread
java.sql.SQLException: ORA-01291: missing logfile
ORA-06512: at "SYS.DBMS_LOGMNR", line 58
ORA-06512: at line 1

        at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:494)
        at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:446)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1052)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:537)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:255)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:610)
        at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:249)
        at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:82)
        at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:924)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1136)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.

One of the reasons I have identified is that StreamSets starts LogMiner two hours in the past, even when you choose From Latest Change:

2018-08-31 12:53:00,846 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-62471
06856c0-Oracle-to-Oracle] INFO  OracleCDCSource - Trying to start LogMiner with start date: 31-08-2018 10:40:33 and end date: 31-08-2018 12:40:33

I suspect it occurs because, with the Oracle CDC Client LogMiner Session Window parameter set to 2 hours, you must have the archived log files from the last 2 hours available when starting the pipeline. So never ever purge archived log files with:

RMAN> delete noprompt archivelog all;

But use:

RMAN> delete noprompt archivelog all completed before 'sysdate-3/24';

But even when applying this I also noticed that StreamSets was always starting LogMiner from the time where the pipeline had failed or where you had stopped it. This is saved in the offset.json file:

[root@server4 0]# pwd
/streamsets/streamsets-datacollector-3.4.1/data/runInfo/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0/0
[root@server4 0]# ll
total 472
-rw-r--r-- 1 streamsets users    100 Sep  3 18:20 offset.json
-rw-r--r-- 1 streamsets users 257637 Sep  3 18:20 pipelineStateHistory.json
-rw-r--r-- 1 streamsets users  20316 Sep  3 18:20 pipelineState.json
[root@server4 0]# cat offset.json
{
  "version" : 2,
  "offsets" : {
    "$com.streamsets.datacollector.pollsource.offset$" : "v3::1535715633::3661001::1"
  }
}

If this is expected and you know what you are doing (first setup, testing, …) you can reset the pipeline from the graphical interface:

streamsets23

Confirm you will not capture what happened in the meantime:

streamsets24

Which empties the offset.json file:

[root@server4 0]# cat offset.json
{
  "version" : 2,
  "offsets" : { }
}

JDBC_16 – Table '' does not exist or PDB is incorrect. Make sure the correct PDB was specified

In the JDBC Producer, for the Table Name field, replace:

${record:attribute('tablename')}

with:

${record:attribute('oracle.cdc.table')}

Then it failed with:

JDBC_16 - Table 'TABLE01' does not exist or PDB is incorrect. Make sure the correct PDB was specified

In JDBC Producer Errors section I noticed:

oracle.cdc.user: SYS

This is because I inserted the record on the source database with the SYS account; I tried with the TEST01 account but it failed with the exact same error…

I finally found the solution by setting the Schema Name field to TEST01, in upper case, because, as stated in the Oracle CDC Client documentation, Oracle uses all caps for schema, table, and column names by default.

JDBC_00 – Cannot connect to specified database: com.zaxxer.hikari.pool.PoolInitializationException: Exception during pool initialization: The server time zone value ‘CEST’ is unrecognized or represents more than one time zone.

The full error message also contains:

You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specific time zone value if you want to utilize time zone support.

I have been obliged to add an additional JDBC property to set server time zone:

serverTimezone = UTC

Establishing SSL connection without server’s identity verification is not recommended

Thu Sep 13 09:53:39 CEST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

To solve this, set the following in the JDBC driver parameters:

useSSL = false
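
As an alternative to declaring serverTimezone and useSSL as separate JDBC properties, both settings can usually be appended directly to the MySQL connection string; check the exact syntax against your Connector/J version:

jdbc:mysql://server2.domain.com:3322/?serverTimezone=UTC&useSSL=false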

JdbcGenericRecordWriter – No parameters found for record with ID

Complete error message is:

2018-09-12 15:26:41,696 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:0] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-6247
106856c0-Oracle-to-Oracle] WARN  JdbcGenericRecordWriter - No parameters found for record with ID  0x0000bc.0001bec8.0018 ::0; skipping
2018-09-12 15:41:31,650 [user:*admin] [pipeline:Oracle-to-Oracle/OracletoOracle7593b814-1185-4829-9fe5-6247106856c0] [runner:0] [thread:ProductionPipelineRunnable-OracletoOracle7593b814-1185-4829-9fe5-6247
106856c0-Oracle-to-Oracle] WARN  JdbcGenericRecordWriter - No parameters found for record with ID  0x0000bd.000057e7.0010 ::1; skipping

For the Oracle to MySQL replication I had to do the column mapping explicitly, like this:

streamsets22

Active Session History visualization with Matplotlib and Altair



Preamble

After a previous post on installing Jupyter Lab and displaying the charts suggested by Dominic Gilles, it is now time to move a bit further. Let's be honest: what I REALLY wanted to display is a Cloud Control-like performance chart, meaning an Active Session History visualization in Python using one of the many available graphical libraries!

Of course any other performance chart is possible, and if you have the query then displaying it in Jupyter Lab should be relatively straightforward…

I initially decided to continue with Altair, with the ultimate goal of doing the same with the leading Python graphical library, Matplotlib. At some point I expected to use Seaborn as a high-level wrapper for Matplotlib, but area charts had not been implemented at the time of writing this post (!).

Active Session History visualization with Altair

It all starts with a query like:

%%sql result1 <<
SELECT
TRUNC(sample_time,'MI') AS sample_time,
DECODE(NVL(wait_class,'ON CPU'),'ON CPU',DECODE(session_type,'BACKGROUND','BCPU','CPU'),wait_class) AS wait_class,
COUNT(*)/60 AS nb
FROM v$active_session_history
WHERE sample_time>=TRUNC(sysdate-interval '1' hour,'MI')
AND sample_time<TRUNC(sysdate,'MI')
GROUP BY TRUNC(sample_time,'MI'),
DECODE(NVL(wait_class,'ON CPU'),'ON CPU',DECODE(session_type,'BACKGROUND','BCPU','CPU'),wait_class)

But my first try gave the error below:

result1_df = result1.DataFrame()
alt.Chart(result1_df).mark_area().encode(
    x='sample_time:T',
    y='nb:Q',
    color='wait_class'
).properties(width=700,height=400)

ValueError: Can't clean for JSON: Decimal('0.5833333333333333333333333333333333333333')

I have been able to understand why using the commands below: the nb column is stored as an object rather than a float:

>>> result1_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 444 entries, 0 to 443
Data columns (total 3 columns):
sample_time    444 non-null datetime64[ns]
wait_class     444 non-null object
nb             444 non-null object
dtypes: datetime64[ns](1), object(2)
memory usage: 10.5+ KB

>>> result1_df.dtypes

sample_time    datetime64[ns]
wait_class             object
nb                    float64
dtype: object

So I converted this column using:

result1_df[['nb']]=result1_df[['nb']].astype('float')
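
An equivalent and slightly more generic option is to let pandas infer the numeric type itself. This is a sketch; pd is the usual pandas import already available in the notebook:

import pandas as pd  # already imported in the notebook session

result1_df['nb'] = pd.to_numeric(result1_df['nb'])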

I have set Cloud Control colors using:

colors = alt.Scale(domain=['Other','Cluster','Queueing','Network','Administrative','Configuration','Commit',
                           'Application','Concurrency','System I/O','User I/O','Scheduler','CPU Wait','BCPU','CPU'],
                   range=['#FF69B4','#F5DEB3','#D2B48C','#BC8F8F','#708090','#800000','#FF7F50','#DC143C','#B22222',
                          '#1E90FF','#0000FF','#90EE90','#9ACD32','#3CB371','#32CD32'])

And set axis and series title using something like:

alt.Chart(result1_df).mark_area().encode(
    x=alt.X('sample_time:T', axis=alt.Axis(title='Time')),
    y=alt.Y('nb:Q', axis=alt.Axis(title='Average Active Sessions')),
    color=alt.Color('wait_class', legend=alt.Legend(title='Wait Class'), scale=colors)
).properties(width=700,height=400)

Which gives:

ash_python01

Or without time limitation (update the initial query to do so):

ash_python02

Overall it is very easy to display interesting charts, and I have to say it is much, much less work than doing it with Highcharts and JavaScript.

You might have noticed that, compared to the Visualizing Active Session History (ASH) to produce Grid Control charts article, I am still lacking the CPU Wait figures. We have seen that the query is something like this (I put the result in a different pandas dataframe):

%%sql result2 <<
SELECT
TRUNC(begin_time,'MI') AS sample_time,
'CPU_ORA_CONSUMED' AS wait_class,
value/100 AS nb
FROM v$sysmetric_history
WHERE group_id=2
AND metric_name='CPU Usage Per Sec'
AND begin_time>=TRUNC(sysdate-interval '1' hour,'MI')
AND begin_time<TRUNC(sysdate,'MI')

The CPU Wait value is CPU plus background CPU minus the CPU consumed by Oracle, and only if the value is positive… The idea is to create the result dataframe from scratch and add it to the result1_df dataframe we have seen above.

Creation can be done with:

result3_df=pd.DataFrame(pd.date_range(start=result1_df['sample_time'].min(), end=result1_df['sample_time'].max(), freq='T'),
             columns=['sample_time'])
result3_df['wait_class']='CPU Wait'
result3_df['nb']=float(0)

Or:

result3_df=pd.DataFrame({'sample_time': pd.Series(pd.date_range(start=result1_df['sample_time'].min(), 
                                                                end=result1_df['sample_time'].max(), freq='T')),
                         'wait_class': 'CPU Wait',
                         'nb': float(0)})

Then the computation is done with the code below. Of course you must handle the fact that sometimes there is no value for CPU or BCPU:

for i in range(1,int(result3_df['sample_time'].count())+1):
    time=result3_df['sample_time'][i-1]
    #result=float((result1_df.query('wait_class == "CPU" and sample_time==@time').fillna(0))['nb'])
    cpu=0
    bcpu=0
    cpu_ora_consumed=0
    #result1=result1_df.loc[(result1_df['wait_class']=='CPU') | (result1_df['wait_class']=='BCPU')]['nb']
    result1=(result1_df.query('wait_class == "CPU" and sample_time==@time'))['nb']
    result2=(result1_df.query('wait_class == "BCPU" and sample_time==@time'))['nb']
    result3=(result2_df.query('wait_class == "CPU_ORA_CONSUMED" and sample_time==@time'))['nb']
    if not(result1.empty):
        cpu=float(result1)
    if not(result2.empty):
        bcpu=float(result2)
    if not(result3.empty):
        cpu_ora_consumed=float(result3)
    cpu_wait=cpu+bcpu-cpu_ora_consumed
    #print('{:d} - {:f},{:f},{:f} - {:f}'.format(i,cpu,bcpu,cpu_ora_consumed,cpu_wait))
    if cpu_wait<0:
        cpu_wait=0.0
    #result3_df['nb'][i-1]=cpu_wait
    result3_df.loc[i-1, 'nb'] = cpu_wait  # write to the same row (i-1) the sample_time was read from
print('Done')

For performance reasons, and to avoid a warning, you cannot use the statement below to set the value:

result3_df['nb'][i-1]=cpu_wait

Or you get:

/root/python37_venv/lib64/python3.7/site-packages/ipykernel_launcher.py:24: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
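
As a side note, the same CPU Wait computation can be expressed in a vectorized way with pandas, which avoids both the loop and the warning. This is only a sketch, assuming result1_df and result2_df have the sample_time / wait_class / nb columns shown earlier:

import pandas as pd

# One column per wait class, one row per minute
pivot = result1_df.pivot(index='sample_time', columns='wait_class', values='nb').fillna(0)
for col in ('CPU', 'BCPU'):
    if col not in pivot.columns:   # make sure both CPU columns exist
        pivot[col] = 0.0
ora = result2_df.set_index('sample_time')['nb']   # CPU_ORA_CONSUMED per minute
cpu_wait = (pivot['CPU'] + pivot['BCPU']).sub(ora, fill_value=0).clip(lower=0)
result3_df = cpu_wait.rename('nb').reset_index()
result3_df['wait_class'] = 'CPU Wait'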

Finally I have decided to concatenate the result1_df dataframe to the result3_df dataframe and sort it on sample_time (if you do not sort it, Altair might produce strange results):

result3_df=pd.concat([result1_df, result3_df])
result3_df=result3_df.sort_values(by='sample_time')

This gives the final chart with the CPU Wait figures. I have also added a tooltip to display the wait class and the value (though in the end I believe the feature is a bit buggy: sample_time is always set to the same value in the tooltip, or maybe I don't know how to use it):

colors = alt.Scale(domain=['Other','Cluster','Queueing','Network','Administrative','Configuration','Commit',
                           'Application','Concurrency','System I/O','User I/O','Scheduler','CPU Wait','BCPU','CPU'],
                   range=['#FF69B4','#F5DEB3','#D2B48C','#BC8F8F','#708090','#800000','#FF7F50','#DC143C','#B22222',
                          '#1E90FF','#0000FF','#90EE90','#9ACD32','#3CB371','#32CD32'])
alt.Chart(result3_df).mark_area().encode(
    x=alt.X('sample_time:T', axis=alt.Axis(format='%d-%b-%Y %H:%M', title='Time')),
    y=alt.Y('nb:Q', axis=alt.Axis(title='Average Active Sessions')),
    color=alt.Color('wait_class', legend=alt.Legend(title='Wait Class'), scale=colors),
    tooltip=['wait_class',alt.Tooltip(field='sample_time',title='Time',type='temporal',format="%d-%b-%Y %H:%M"),alt.Tooltip('nb',format='.3')]
).properties(width=700,height=400)
ash_python03

Are we done? Almost… If you look closely at a Cloud Control Active Session History (ASH) chart you will see that the stack order of the wait classes is neither random nor ordered by wait class name. First comes CPU, then Scheduler, then User I/O and so on… So how to do that with Altair? Well, at the time of writing this blog post it is not possible. The feature request has been accepted and, in the meantime, they propose a trick using calculate.

You can also use the order property of the stacked chart, but obviously it will order the wait classes by name (the default behavior, by the way). You can simply reverse the order with something like:

colors = alt.Scale(domain=['Other','Cluster','Queueing','Network','Administrative','Configuration','Commit',
                           'Application','Concurrency','System I/O','User I/O','Scheduler','CPU Wait','BCPU','CPU'],
                   range=['#FF69B4','#F5DEB3','#D2B48C','#BC8F8F','#708090','#800000','#FF7F50','#DC143C','#B22222',
                          '#1E90FF','#0000FF','#90EE90','#9ACD32','#3CB371','#32CD32'])
alt.Chart(result3_df).mark_area().encode(
    x=alt.X('sample_time:T', axis=alt.Axis(format='%d-%b-%Y %H:%M', title='Time')),
    y=alt.Y('nb:Q', axis=alt.Axis(title='Average Active Sessions')),
    color=alt.Color('wait_class', legend=alt.Legend(title='Wait Class'), scale=colors),
    tooltip=['wait_class',alt.Tooltip(field='sample_time',title='Time',type='temporal',format="%d-%b-%Y %H:%M"),'nb'],
    order = {'field': 'wait_class', 'type': 'nominal', 'sort': 'descending'}
).properties(width=700,height=400)

To really sort the wait classes in the order you like, until it is allowed directly in the encode function, you have to use the transform_calculate function as suggested, and to be honest I fought a lot with the Vega expression to make it work:

from altair import datum, expr
colors = alt.Scale(domain=['Other','Cluster','Queueing','Network','Administrative','Configuration','Commit',
                           'Application','Concurrency','System I/O','User I/O','Scheduler','CPU Wait','BCPU','CPU'],
                   range=['#FF69B4','#F5DEB3','#D2B48C','#BC8F8F','#708090','#800000','#FF7F50','#DC143C','#B22222',
                          '#1E90FF','#0000FF','#90EE90','#9ACD32','#3CB371','#32CD32'])
#kwds={'calculate': 'indexof(colors.domain, datum.wait_class)', 'as': 'areaorder' }
kwds={'calculate': "indexof(['Other','Cluster','Queueing','Network','Administrative','Configuration','Commit',\
      'Application','Concurrency','System I/O','User I/O','Scheduler','CPU Wait','BCPU','CPU'], datum.wait_class)",
      'as': "areaorder" }
alt.Chart(result3_df).mark_area().encode(
    x=alt.X('sample_time:T', axis=alt.Axis(format='%d-%b-%Y %H:%M', title='Time')),
    y=alt.Y('nb:Q', axis=alt.Axis(title='Average Active Sessions')),
    color=alt.Color('wait_class', legend=alt.Legend(title='Wait Class'), scale=colors),
    tooltip=['wait_class',alt.Tooltip(field='sample_time',title='Time',type='temporal',format="%d-%b-%Y %H:%M"),'nb']
    #,order='siteOrder:Q'
    ,order = {'field': 'areaorder', 'type': 'ordinal', 'sort': 'descending'}
).properties(width=700,height=400).transform_calculate(**kwds)
ash_python04

Here it is! I have not been able to use the colors Altair scale variable in the computation (line in comment) so I have just copied/pasted the list...

Active Session History visualization with Matplotlib

Matplotlib claims to be the leading visualization library available in Python, with the well-known drawback of being a bit complex to handle...

I will start from the result3_df we have just created above. Matplotlib does not ingest exactly the same format as Altair, and as with Highcharts you are obliged to pivot your result. While I was wondering how to do it I touched the beauty of Pandas, as it is already there with the pivot function:

result3_df_pivot=result3_df.pivot(index='sample_time', columns='wait_class', values='nb').fillna(0)

I also use fillna to fill non-existing values (NaN), as you do not have sessions waiting in every wait class at a given time.

Then you finally need to import Matplotlib (if it is not yet installed do a pip install matplotlib) and configure a few things. The minimum required is the figure size, which you have to specify in inches. If, like me, you live in a metric world you may wonder how to specify the size in pixels, for example. I have found the trick below:

>>> import pylab
>>> pylab.gcf().get_dpi()
72.0

Then you can divide size in pixels by 72 and use it in matplotlib.rc procedure:

import matplotlib
# figure size in inches
matplotlib.rc('figure', figsize = (1000/72, 500/72))
# Font size to 14
matplotlib.rc('font', size = 14)
# Do not display top and right frame lines
#matplotlib.rc('axes.spines', top = False, right = False)
# Remove grid lines
#matplotlib.rc('axes', grid = False)

To get all possible parameters as well as their default values you can use:

>>> matplotlib.rc_params()
RcParams({'_internal.classic_mode': False,
          'agg.path.chunksize': 0,
          'animation.avconv_args': [],
          'animation.avconv_path': 'avconv',
          'animation.bitrate': -1,
          'animation.codec': 'h264',
.
.
.

You can also set a single value using the rcParams dictionary instead of working with groups through the rc procedure:

matplotlib.rcParams['figure.figsize']= [6.4, 4.8]

As the simplest example you can use the Matplotlib wrapper of Pandas and do a simple:

result3_df_pivot.plot.area()

Or, if you want to save the figure to a file (on the machine where Jupyter Lab is running):

import matplotlib.pyplot as plt
plt.figure()
result3_df_pivot.plot.area()
plt.savefig('/tmp/visualization.png')

It gives the not-so-bad result below for the lowest possible effort. Of course we have not chosen the colors and customized nothing, but for a quick and dirty display it already gives a lot of information:

ash_python05

Of course we want more, and for this we will have to dig into Matplotlib internals. The idea is to define a Pandas series with the categories and their related colors, then build the data, colors and labels depending on which wait class categories we have. Finally, a bit of cosmetics with the axis labels and chart title, as well as the legend in the top right corner:

from matplotlib.dates import DayLocator, HourLocator, DateFormatter, drange
colors_ref = pd.Series({
    'Other': '#FF69B4',
    'Cluster': '#F5DEB3',
    'Queueing': '#D2B48C',
    'Network': '#BC8F8F',
    'Administrative': '#708090',
    'Configuration': '#800000',
    'Commit': '#FF7F50',
    'Application': '#DC143C',
    'Concurrency': '#B22222',
    'System I/O': '#1E90FF',
    'User I/O': '#0000FF',
    'Scheduler': '#90EE90',
    'CPU Wait': '#9ACD32',
    'BCPU': '#3CB371',
    'CPU': '#32CD32'
})
data=[]
labels=[]
colors=[]
for key in ('CPU','BCPU','CPU Wait','Scheduler','User I/O','System I/O','Concurrency','Application','Commit',
            'Configuration','Administrative','Network','Queueing','Cluster','Other'):
    if key in result3_df_pivot.keys():
        data.append(result3_df_pivot[key].values)
        labels.append(key)
        colors.append(colors_ref[key])

figure, ax = plt.subplots()
plt.stackplot(result3_df_pivot.index.values,
              data,
              labels=labels,
              colors=colors)
# format the ticks
#ax.xaxis.set_major_locator(years)
#ax.xaxis.set_major_formatter(yearsFmt)
#ax.xaxis.set_minor_locator(months)
#ax.autoscale_view()
#ax.xaxis.set_major_locator(DayLocator())
#ax.xaxis.set_minor_locator(HourLocator(range(0, 25, 6)))
#ax.fmt_xdata = DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d %H:%M:%S'))
figure.autofmt_xdate()
plt.legend(loc='upper right')
figure.suptitle('Cloud Control', fontsize=20)
plt.xlabel('Time', fontsize=16)
plt.ylabel('Average Active Sessions', fontsize=16)
plt.show()
figure.savefig('/tmp/visualization.png')

Looks really close to a default Cloud Control performance active session history (ASH) chart:

ash_python06

Jupyter Lab installation on Fedora to access an Oracle database



Preamble

The first time I had a brief look at Python was 10-15 years ago, when one of my colleagues, who was getting a late diploma (in the frame of his current job within a company), told me he had a class on Python. I told him that it was dead technology and that the language would very soon be replaced by something else. How wrong I was!!

With the recent rise of Hadoop to store huge amounts of data on commodity hardware we have seen the emergence of a new way of processing this huge amount of data in a *clever* way. The new buzz words are Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL).

The traditional language of these new technologies and this new way of working, as well as of statistical and financial processing, is Python. Hence the recent rise of this rather old language (1991)!!

I have started to learn it a bit using the free YouTube videos of Academind and even bought their course on Udemy. Needless to say, the recent blog post of Dominic Gilles and the release of Python 3.7 in June finally made me decide to give it a try. I wanted to focus on making graphics using Matplotlib, and even if Dominic Gilles' post uses Altair I will test it in a second post…

To be honest, if like me you start from nothing, you are a few steps away from being able to use Dominic Gilles' post, as you will have to set up your environment first…

If you connect to the server where you have installed Python using an ssh client like Putty you will not be able to display any graphics. This is why IPython has been developed; based on it Jupyter Notebook emerged, finally producing Jupyter Lab. As they say, "IPython itself is focused on interactive Python, part of which is providing a Python kernel for Jupyter".

This blog post has been written using the server edition of Fedora release 28 (Twenty Eight). Fedora has always been known for providing the latest packages of the Linux world, as well as being one of the best known Linux distributions…

The goal is to set up Jupyter Lab with Python 3.7. Doing this will provide a low-footprint virtual machine that you can use to connect to any database server inside your company…

Python 3.7 installation

Installing Fedora 28 (64-bit server release) in VirtualBox is quite straightforward, and if you don't know how to start you will find plenty of articles on how to do it on the internet. So I assume you have a running Fedora virtual machine.

As expected, the latest Python 3.7 release is already available on Fedora:

[root@fedora1 ~]# dnf list python37
Last metadata expiration check: 0:26:26 ago on Mon 02 Jul 2018 11:20:14 AM CEST.
Available Packages
python37.i686                                                                                   3.7.0-0.20.rc1.fc28                                                                                  updates
python37.x86_64                                                                                 3.7.0-0.20.rc1.fc28                                                                                  updates

Install it with:

[root@fedora1 ~]# dnf -y install python37.x86_64
Last metadata expiration check: 0:26:38 ago on Mon 02 Jul 2018 11:20:14 AM CEST.
Dependencies resolved.
============================================================================================================================================================================================================
 Package                                                   Arch                                     Version                                                 Repository                                 Size
============================================================================================================================================================================================================
Installing:
 python37                                                  x86_64                                   3.7.0-0.20.rc1.fc28                                     updates                                    20 M
Installing dependencies:
 aajohan-comfortaa-fonts                                   noarch                                   3.001-2.fc28                                            fedora                                    147 k
 dwz                                                       x86_64                                   0.12-7.fc28                                             fedora                                    107 k
 fontconfig                                                x86_64                                   2.13.0-4.fc28                                           updates                                   253 k
 fontpackages-filesystem                                   noarch                                   1.44-21.fc28                                            fedora                                     15 k
 fpc-srpm-macros                                           noarch                                   1.1-4.fc28                                              fedora                                    7.5 k
 ghc-srpm-macros                                           noarch                                   1.4.2-7.fc28                                            fedora                                    8.2 k
 gnat-srpm-macros                                          noarch                                   4-5.fc28                                                fedora                                    8.8 k
 go-srpm-macros                                            noarch                                   2-16.fc28                                               fedora                                     13 k
 libX11                                                    x86_64                                   1.6.5-7.fc28                                            fedora                                    622 k
 libX11-common                                             noarch                                   1.6.5-7.fc28                                            fedora                                    167 k
 libXau                                                    x86_64                                   1.0.8-11.fc28                                           fedora                                     34 k
 libXft                                                    x86_64                                   2.3.2-8.fc28                                            fedora                                     65 k
 libXrender                                                x86_64                                   0.9.10-5.fc28                                           fedora                                     32 k
 libxcb                                                    x86_64                                   1.13-1.fc28                                             fedora                                    228 k
 nim-srpm-macros                                           noarch                                   1-1.fc28                                                fedora                                    7.6 k
 ocaml-srpm-macros                                         noarch                                   5-2.fc27                                                fedora                                    7.8 k
 openblas-srpm-macros                                      noarch                                   2-2.fc27                                                fedora                                    6.6 k
 perl-srpm-macros                                          noarch                                   1-25.fc28                                               fedora                                    9.7 k
 python-srpm-macros                                        noarch                                   3-29.fc28                                               updates                                    11 k
 qt5-srpm-macros                                           noarch                                   5.10.1-1.fc28                                           fedora                                    9.6 k
 redhat-rpm-config                                         noarch                                   108-1.fc28                                              updates                                    77 k
 rust-srpm-macros                                          noarch                                   5-2.fc28                                                fedora                                    8.1 k
 tcl                                                       x86_64                                   1:8.6.8-1.fc28                                          fedora                                    1.1 M
 tk                                                        x86_64                                   1:8.6.8-1.fc28                                          fedora                                    1.6 M

Transaction Summary
============================================================================================================================================================================================================
Install  25 Packages

Total download size: 24 M
Installed size: 98 M

To execute it specifically you have to call it by its complete name:

[root@fedora1 ~]# python3.7
Python 3.7.0rc1 (default, Jun 12 2018, 12:42:02)
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

For legacy reasons the default Python of Fedora/RedHat is still Python 2.7.5. This is going to change in Fedora 29, but until then the best approach I have seen is to create what they call a virtual environment.

Virtual environment creation is as simple as the two commands below, and Python 3.7 then becomes the default Python inside your virtual environment:

[root@fedora1 ~]# python3.7 -m venv python37_venv
[root@fedora1 ~]# source python37_venv/bin/activate
(python37_venv) [root@fedora1 ~]# python
Python 3.7.0rc1 (default, Jun 12 2018, 12:42:02)
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Jupyter Lab installation

Now that Python 3.7 is installed, the standard tool to install Python packages is pip. My first try failed because of an obvious company proxy configuration issue:

(python37_venv) [root@fedora1 ~]# pip install jupyterlab
Collecting jupyter
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/jupyter/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/jupyter/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/jupyter/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/jupyter/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/jupyter/
  Could not find a version that satisfies the requirement jupyter (from versions: )
No matching distribution found for jupyter

Which I solved with a simple:

(python37_venv) [root@fedora1 ~]# export https_proxy='http://proxy_account:proxy_password@proxy_server:proxy_port'
(python37_venv) [root@fedora1 ~]# echo $https_proxy
http://proxy_account:proxy_password@proxy_server:proxy_port

The second try failed on an HTTPS certificate verification error (due to my company proxy architecture):

(python37_venv) [root@fedora1 ~]# pip install jupyterlab
Collecting jupyter
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))': /simple/jupyter/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))': /simple/jupyter/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))': /simple/jupyter/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))': /simple/jupyter/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))': /simple/jupyter/
  Could not fetch URL https://pypi.org/simple/jupyter/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/jupyter/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))) - skipping
  Could not find a version that satisfies the requirement jupyter (from versions: )
No matching distribution found for jupyter
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)'))) - skipping

Which I solved by creating /etc/pip.conf and inserting my list of trusted hosts:

(python37_venv) [root@fedora1 ~]# cat /etc/pip.conf
[global]
trusted-host = pypi.python.org
               pypi.org
               files.pythonhosted.org
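
If you prefer not to modify /etc/pip.conf, an equivalent one-shot approach (a sketch of mine, not the one I used) is to pass the trusted hosts directly on the pip command line:

(python37_venv) [root@fedora1 ~]# pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org jupyterlab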

This time the Jupyter Lab installation went well, but launching it with the jupyter-lab command failed with:

ImportError: libzmq.so.5: cannot open shared object file: No such file or directory

Which can be solved with:

(python37_venv) [root@fedora1 ~]# dnf -y install zeromq-devel.x86_64

When executing Jupyter Lab as root it failed with:

[C 14:13:49.036 LabApp] Running as root is not recommended. Use --allow-root to bypass.

We all know that launching tools as root is never a good idea, but if you want to bypass this restriction, generate a config file with:

(python37_venv) [root@fedora1 ~]# jupyter-lab --generate-config
Writing default config to: /root/.jupyter/jupyter_notebook_config.py

In the generated file I allowed root execution and also changed the allowed origins and the default listening IP:

c.NotebookApp.allow_origin = '*'
c.NotebookApp.allow_root = True
c.NotebookApp.ip = '192.168.56.105'

Deactivate SELinux (modify the /etc/selinux/config configuration file) and stop/disable the firewall:

(python37_venv) [root@fedora1 ~]# systemctl stop firewalld
(python37_venv) [root@fedora1 ~]# systemctl disable firewalld
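
If you would rather keep firewalld running, an alternative (a sketch of mine, assuming Jupyter Lab listens on its default port 8888) is to open only that port:

(python37_venv) [root@fedora1 ~]# firewall-cmd --permanent --add-port=8888/tcp
(python37_venv) [root@fedora1 ~]# firewall-cmd --reload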

The run command should now work and you can access Jupyter Lab using the URL it prints at startup (see screenshots below).
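
As a minimal sketch (the exact launch command was not kept in my notes), starting it from the virtual environment could be as simple as:

(python37_venv) [root@fedora1 ~]# jupyter-lab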

jupyter_lab01

Then I wanted to install the cx_Oracle Python package to access an Oracle database, but it failed with:

error: command 'gcc' failed with exit status 1

Which can be solved by installing the gcc compiler:

(python37_venv) [root@fedora1 ~]# dnf -y install gcc

I have also installed the Python packages below (see the pip one-liner right after this list):

  • cx_Oracle
  • numpy
  • altair
  • ipython-sql
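
Assuming the same virtual environment is still active, they can all be installed in one go (a sketch, the exact command was not kept):

(python37_venv) [root@fedora1 ~]# pip install cx_Oracle numpy altair ipython-sql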

Jupyter Lab testing

The first error I had in Jupyter Lab is:

(cx_Oracle.DatabaseError) DPI-1047: 64-bit Oracle Client library cannot be loaded: "libclntsh.so: cannot open shared object file: No such file or directory". See https://oracle.github.io/odpi/doc/installation.html#linux for help (Background on this error at: http://sqlalche.me/e/4xp6)
Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: dict_keys([])

Obviously you have to install the Oracle Instant Client (or a full client). I chose the RPM download and installed the oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm package. The recommended configuration did not help:

sudo sh -c "echo /usr/lib/oracle/12.2/client64/lib > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig

So I had to create the symbolic link in the Oracle Instant Client directory myself:

[root@fedora1 lib]# ln -s libclntsh.so.12.1 libclntsh.so

Then I got the error below:

(cx_Oracle.DatabaseError) ORA-01017: invalid username/password; logon denied (Background on this error at: http://sqlalche.me/e/4xp6)
Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: dict_keys([])

Which I solved with:

[root@fedora1 ~]# dnf install libnsl.x86_64

The testing proposed by Dominic Gilles finally worked well…

In the code below, the line:

alt.data_transformers.enable('default', max_rows=1000000)

is there to bypass the default limitation of 5000 rows per chart…

And finally Dominic Gilles' example worked well and I produced my first Python graphic (note that the code spans several notebook cells, the %%sql cell magic being the first line of its own cell):

import cx_Oracle
import keyring
import pandas as pd
import altair as alt
import getpass
pwd = getpass.getpass('Please enter your password: ')
%load_ext sql
%sql oracle+cx_oracle://account:$pwd@server1.domain.com:1521/sid
%%sql result <<
select table_name, owner, num_rows, blocks, avg_row_len, trunc(last_analyzed)
from all_tables
where num_rows  > 0
and tablespace_name is not null
AND owner NOT IN ('ANONYMOUS','DBSNMP','WMSYS','XDB','APPQOSSYS','GSMADMIN_INTERNAL','GSMCATUSER','SYSBACKUP','OUTLN',
                  'DIP','SYSDG','ORACLE_OCM','OJVMSYS','SYSKM','XS$NULL','GSMUSER','AUDSYS','SYSTEM','SYS')
result_df = result.DataFrame()
result_df.head()
chart1=alt.Chart(result_df).mark_circle().encode(
    x = alt.X('blocks', scale=alt.Scale(type='log')),
    y = alt.Y('num_rows',scale=alt.Scale(type='log')),
    tooltip=['owner','table_name','blocks','num_rows']
).properties(width=800,height=400).interactive()
chart1.save('/tmp/chart1.html')
chart1
jupyter_lab02
jupyter_lab03

If you want to store the password for a common (shared) account you can use keyring, though it is not necessarily more secure: if you share the notebook with others, the password can be retrieved very easily by everyone. On first try I got the error below:

(python37_venv) [root@fedora1 ~]# keyring set dwhte yjaquier
Password for 'yjaquier' in 'dwhte':
Traceback (most recent call last):
  File "/root/python37_venv/bin/keyring", line 11, in 
    sys.exit(main())
  File "/root/python37_venv/lib64/python3.7/site-packages/keyring/cli.py", line 111, in main
    return cli.run(argv)
  File "/root/python37_venv/lib64/python3.7/site-packages/keyring/cli.py", line 72, in run
    set_password(service, username, password)
  File "/root/python37_venv/lib64/python3.7/site-packages/keyring/core.py", line 47, in set_password
    _keyring_backend.set_password(service_name, username, password)
  File "/root/python37_venv/lib64/python3.7/site-packages/keyring/backends/fail.py", line 23, in get_password
    raise RuntimeError(msg)
RuntimeError: No recommended backend was available. Install the keyrings.alt package if you want to use the non-recommended backends. See README.rst for details.

Which I have solved by installing keyrings.alt package:

(python37_venv) [root@fedora1 ~]# pip install keyrings.alt
Collecting keyrings.alt
  Downloading https://files.pythonhosted.org/packages/f7/db/202fe99c9f6d75c7810cb3af7d791479df0dd942f2bac2425646c0ad3db8/keyrings.alt-3.1-py2.py3-none-any.whl
Requirement already satisfied: six in ./python37_venv/lib/python3.7/site-packages (from keyrings.alt) (1.11.0)
Installing collected packages: keyrings.alt
Successfully installed keyrings.alt-3.1

I have then been able to store the password for my personal account:

(python37_venv) [root@fedora1 ~]# keyring set dwhte yjaquier
Password for 'yjaquier' in 'dwhte':
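
Reading the password back from inside the notebook is then straightforward; a minimal sketch of mine (reusing the service and account names stored above and the connection string format shown earlier) could be:

import keyring
# Retrieve the password stored with "keyring set dwhte yjaquier"
pwd = keyring.get_password('dwhte', 'yjaquier')
%sql oracle+cx_oracle://yjaquier:$pwd@server1.domain.com:1521/sid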

References

The post Jupyter Lab installation on Fedora to access an Oracle database appeared first on IT World.

]]>
https://blog.yannickjaquier.com/python/jupyter-lab-installation-fedora-oracle.html/feed 1
Minimum required privileges for PL/SQL debugging with SQL Developer https://blog.yannickjaquier.com/oracle/minimum-privileges-plsql-debugging.html https://blog.yannickjaquier.com/oracle/minimum-privileges-plsql-debugging.html#respond Wed, 16 Jan 2019 16:44:30 +0000 https://blog.yannickjaquier.com/?p=4441 Preamble Our new SOX security rules have struck again and people now using their read only personal account would like to debug procedure in live (!!). Normal and logic answer would be: never ever debug in live ! But as an exercise I wanted to see what is bare minimum grants you need to debug […]

The post Minimum required privileges for PL/SQL debugging with SQL Developer appeared first on IT World.

]]>

Table of contents

Preamble

Our new SOX security rules have struck again: people now using their read-only personal accounts would like to debug procedures in live (!!). The normal and logical answer would be: never ever debug in live! But as an exercise I wanted to see what are the bare minimum grants you need to debug your own procedures, functions or packages, as well as what you need to grant to someone else so that they can do the debugging for you.

This article is not about how to use the SQL Developer debugger, as tons of articles have already been written by Jeff Smith and others; see the references section.

This blog has been written using a 12cR2 (12.2.0.1.0) Enterprise edition database running on Oracle Linux Server release 7.5 and SQL Developer Version 18.2.0.183.

PL/SQL debugging as object owner

I start by creating a test user with minimum privileges:

SQL> create user test1 identified by test1;

User created.

SQL> grant connect,resource to test1;

Grant succeeded.

And a test procedure with a bit of display and a variable manipulation:

create or replace procedure debug_test
as
  i number:=0;
begin
  dbms_output.put_line('First step, i = ' || i);
  i:=i+1;
  dbms_output.put_line('Second step, i = ' || i);
  i:=i+1;
  dbms_output.put_line('Third step, i = ' || i);
  i:=i+1;
  dbms_output.put_line('Fourth step, i = ' || i);
  i:=i+1;
  dbms_output.put_line('Fifth step, i = ' || i);
  i:=i+1;
end;
/

In SQL Developer, connected with the test1 account (the procedure owner), you can display line numbers and place a breakpoint with a right click in the left margin, or simply press F5 at the cursor position to insert a breakpoint:

plsql_debugging01

Once done you can press the beetle button to launch the object (a procedure in my case) in debug mode:

plsql_debugging02

I got this first error message:

Connecting to the database server1.domain.com_pdb1_test1.
Executing PL/SQL: CALL DBMS_DEBUG_JDWP.CONNECT_TCP( '192.168.56.1', '58696' )
ORA-01031: insufficient privileges
ORA-06512: at "SYS.DBMS_DEBUG_JDWP", line 68
ORA-06512: at line 1
This session requires DEBUG CONNECT SESSION and DEBUG ANY PROCEDURE user privileges.
Process exited.
Disconnecting from the database server1.domain.com_pdb1_test1.

I am a bit surprised by this message stating that I have to grant the high DEBUG ANY PROCEDURE privilege even though I am the procedure owner… This is by the way the shortcut used in many articles, but I would like to avoid it since, as a principle, all the ANY privileges are quite powerful…
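
Before going further, a quick check of what the test account already holds can be useful (my own addition, to be run as a privileged user):

SQL> select granted_role from dba_role_privs where grantee = 'TEST1';
SQL> select privilege from dba_sys_privs where grantee = 'TEST1';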

As found in official documentation:

  • DEBUG CONNECT SESSION: Connect the current session to a debugger.
  • DEBUG ANY PROCEDURE: Debug all PL/SQL and Java code in any database object. Display information on all SQL statements executed by the application. Note: granting this privilege is equivalent to granting the DEBUG object privilege on all applicable objects in the database.
  • DEBUG (object privilege): Access, through a debugger, all public and nonpublic variables, methods, and types defined on the object type. Place a breakpoint or stop at a line or instruction boundary within the procedure, function, or package. This privilege grants access to the declarations in the method or package specification and body.

To solve execution issue on DBMS_DEBUG_JDWP.CONNECT_TCP you have the solution on Ask TOM web site or in My Oracle Support.

Initial situation is no ACL granted for no one:

SQL> set lines 200 pages 1000
SQL> col host for a10
SQL> col acl for a50
SQL> col acl_owner for a10
SQL> select * from dba_network_acls;

HOST       LOWER_PORT UPPER_PORT ACL                                                ACLID            ACL_OWNER
---------- ---------- ---------- -------------------------------------------------- ---------------- ----------
*                                NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 SYS

SQL> col privilege for a10
SQL> col start_date for a15
SQL> col end_date for a15
SQL> col principal for a20
SQL> select * from dba_network_acl_privileges;

ACL                                                ACLID            PRINCIPAL            PRIVILEGE  IS_GR INVER START_DATE      END_DATE        ACL_OWNER
-------------------------------------------------- ---------------- -------------------- ---------- ----- ----- --------------- --------------- ----------
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GSMADMIN_INTERNAL    resolve    true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GGSYS                resolve    true  false                                 SYS

Grant the JDWP ACL privilege, which allows the database session to connect back to the debugger, with something like:

SQL> exec dbms_network_acl_admin.append_host_ace(host=>'*', ace=> sys.xs$ace_type(privilege_list=>sys.XS$NAME_LIST('JDWP'), -
> principal_name=>'TEST1', principal_type=>sys.XS_ACL.PTYPE_DB));

PL/SQL procedure successfully completed.

SQL> select * from dba_network_acl_privileges;

ACL                                                ACLID            PRINCIPAL            PRIVILEGE  IS_GR INVER START_DATE      END_DATE        ACL_OWNER
-------------------------------------------------- ---------------- -------------------- ---------- ----- ----- --------------- --------------- ----------
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GSMADMIN_INTERNAL    resolve    true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GGSYS                resolve    true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 TEST1                JDWP       true  false                                 SYS

It was still failing with a grant issue, so I simply granted:

SQL> grant debug connect session to test1;

Grant succeeded.

And it worked, well almost, as my breakpoint was not taken into account:

Connecting to the database server1.domain.com_pdb1_test1.
Executing PL/SQL: CALL DBMS_DEBUG_JDWP.CONNECT_TCP( '192.168.56.1', '62965' )
Debugger accepted connection from database on port 62965.
Executing PL/SQL: CALL DBMS_DEBUG_JDWP.DISCONNECT()
First step, i = 0
Second step, i = 1
Third step, i = 2
Fourth step, i = 3
Fifth step, i = 4
Process exited.
Disconnecting from the database server1.domain.com_pdb1_test1.
Debugger disconnected from database.

Simply because you have to compile the source code in debug mode, as the TEST1 user:

SQL> alter procedure debug_test compile debug;

Procedure altered.

We can now see that debug mode has been activated for our test procedure:

SQL> col plsql_debug for a10
SQL> col plsql_warnings for a15
SQL> col plscope_settings for a20
SQL> select plsql_debug,plsql_warnings,plscope_settings
     from dba_plsql_object_settings
     where owner='TEST1'
     and name='DEBUG_TEST';

PLSQL_DEBU PLSQL_WARNINGS  PLSCOPE_SETTINGS
---------- --------------- --------------------
TRUE       DISABLE:ALL     IDENTIFIERS:NONE

And when executing again, execution stops at my breakpoint (more can be added) and the nice thing is that you can see the value of each variable (i in my case is equal to 2):

plsql_debugging03

PL/SQL debugging with another account

I create a second test user; the goal is to debug with this account while giving it the minimum privileges:

SQL> create user test2 identified by test2;

User created.

SQL> grant connect,resource to test2;

Grant succeeded.

The minimum you have to grant is DEBUG on the procedure, EXECUTE on the procedure and the right to use the debugger:

SQL> grant debug on debug_test to test2;

Grant succeeded.

SQL> grant execute on debug_test to test2;

Grant succeeded.

SQL> grant debug connect session to test2;

Grant succeeded.

It first failed with the ACL error we have seen above, which is fixed the same way for TEST2:

SQL> exec dbms_network_acl_admin.append_host_ace(host=>'*', ace=> sys.xs$ace_type(privilege_list=>sys.XS$NAME_LIST('JDWP'), -
> principal_name=>'TEST2', principal_type=>sys.XS_ACL.PTYPE_DB));

PL/SQL procedure successfully completed.

SQL> select * from dba_network_acl_privileges;

ACL                                                ACLID            PRINCIPAL            PRIVILEGE  IS_GR INVER START_DATE      END_DATE        ACL_OWNER
-------------------------------------------------- ---------------- -------------------- ---------- ----- ----- --------------- --------------- ----------
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GSMADMIN_INTERNAL    resolve    true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 GGSYS                resolve    true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 TEST1                JDWP       true  false                                 SYS
NETWORK_ACL_4700D2108291557EE05387E5E50A8899       0000000080002724 TEST2                JDWP       true  false                                 SYS

And it worked as before, this time with a user not owning the code… Granting execute on PL/SQL code to a read-only user raises some security questions anyway (see the cleanup sketch below)…
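
Once the debugging session is over, the grants can be reverted; a cleanup sketch of mine would be:

SQL> revoke debug on test1.debug_test from test2;
SQL> revoke execute on test1.debug_test from test2;
SQL> revoke debug connect session from test2;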

References

The post Minimum required privileges for PL/SQL debugging with SQL Developer appeared first on IT World.

]]>
https://blog.yannickjaquier.com/oracle/minimum-privileges-plsql-debugging.html/feed 0
How to identify table fragmentation and remove it ? https://blog.yannickjaquier.com/oracle/table-fragmentation-identification.html https://blog.yannickjaquier.com/oracle/table-fragmentation-identification.html#respond Tue, 18 Dec 2018 13:42:25 +0000 https://blog.yannickjaquier.com/?p=4407 Preamble After a first blog post on when to rebuild or shrink indexes I have naturally decided to write a post on table fragmentation and how to remove it. Maybe I should have started with this one but when we implemented Oracle Disk Manager (ODM) I have got the feedback from applicative team they have […]

The post How to identify table fragmentation and remove it ? appeared first on IT World.

]]>

Table of contents

Preamble

After a first blog post on when to rebuild or shrink indexes, I have naturally decided to write a post on table fragmentation and how to remove it. Maybe I should have started with this one, but when we implemented Oracle Disk Manager (ODM) I got feedback from the application team that they had several times experienced a good performance improvement after rebuilding indexes.

It is equally important to defragment tables, as it reduces the number of physical reads needed to put table data blocks in memory. It also decreases the number of blocks to handle in each query (logical reads), because you also lower what is called the High Water Mark. You increase the row density per block as more rows get packed into each block.

I have tried to draw a few pictures (tolerance requested, done by myself) to visually explain the concept; you will see similar ones in many other blog posts, but these two pictures perfectly sum up what we want to achieve.

As a reminder, a tablespace is made of multiple datafiles; the logical storage of an object is called a segment, which is made of multiple extents, and each extent is made of multiple blocks. In the worst-case situation you have deleted many rows from your table and the remaining rows are spread sparsely over a lot of blocks with a low percentage of completion. The High Water Mark (HWM) is the last block containing a row of your table. When doing a Full Table Scan (FTS) all blocks up to this HWM will be read. If your table blocks are almost empty you can easily understand the extra I/O you will do:

table fragmentation 01

The ideal target is to move to a situation where all rows have been condensed (defragmented) in a minimum number of blocks, each being almost full:

table fragmentation 02

Of course, if the deleted rows are going to be reinserted soon then there is no need to do anything, as Oracle will start to insert new rows into not-yet-full blocks before allocating new ones.

This blog post has been written using Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production running on Red Hat Enterprise Linux Server release 6.5 (Santiago).

Legacy situation

The old school approach is to work with DBA_TABLES and estimate how much space the table is taking versus how much space it could optimally take. The current size is the number of used blocks multiplied by the block size, to get a size in bytes. The theoretical smaller size is the number of rows multiplied by the average row length. The gain you might get is a simple computation from these two values. Of course you must take into account the value of PCTFREE, i.e. the percentage of space reserved in blocks for future updates (mainly to avoid what is called row chaining, when a row is spread over more than one block).
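
As a purely illustrative example (numbers invented): a table using 10,000 blocks in an 8KB block size tablespace currently occupies 10,000 x 8,192 bytes, roughly 78MB; if it only holds 100,000 rows with an average row length of 100 bytes, its theoretical size is about 9.5MB, so the potential gain is around 68MB, i.e. close to 88% minus the PCTFREE you want to preserve (so about 78% with the default PCTFREE of 10).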

Of course, if you want to estimate the size of a table that does not exist yet, this method does not apply!

The query could look like:

select
  a.blocks*b.block_size AS current_size,
  a.num_rows*a.avg_row_len AS theorical_size,
  (a.blocks*b.block_size)-(a.num_rows*a.avg_row_len) AS gain,
  (((a.blocks*b.block_size)-(a.num_rows*a.avg_row_len))*100/(a.blocks*b.block_size)) - a.pct_free AS percentage_gain
from dba_tables a, dba_tablespaces b
where a.tablespace_name=b.tablespace_name
and owner = upper('<owner>')
and table_name = upper('<table_name>');

If you want a better display it can even be put in a PL/SQL block like (inspect_table.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  vcurrent_size number;
  vtheorical_size number;
  vgain number;
  vpercentage_gain number;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  select
    a.blocks*b.block_size,
    a.num_rows*a.avg_row_len,
    (a.blocks*b.block_size)-(a.num_rows*a.avg_row_len),
    (((a.blocks*b.block_size)-(a.num_rows*a.avg_row_len))*100/(a.blocks*b.block_size)) - a.pct_free
  into vcurrent_size, vtheorical_size, vgain, vpercentage_gain
  from dba_tables a, dba_tablespaces b
  where a.tablespace_name=b.tablespace_name
  and owner = upper('&1.')
  and table_name = upper('&2.');

  dbms_output.put_line('For table ' || upper('&1.') || '.' || upper('&2.'));
  dbms_output.put_line('Current table size: ' || format_size(vcurrent_size));
  dbms_output.put_line('Theoretical table size: ' || format_size(vtheorical_size));
  dbms_output.put_line('Potential saving: ' || format_size(vgain));
  dbms_output.put_line('Potential saving percentage: ' || round(vpercentage_gain, 2) || '%');
end;
/
set feedback on

For one of my test tables it gives:

SQL> @inspect_table <owner> <table_name>
For table owner.table_name
Current table size: 57.031MB
Theoretical table size: 179.000B
Potential saving: 57.031MB
Potential saving percentage: 90%

So my table is currently using around 57MB and I could ideally make it fit in 179 bytes, so one block in the end (that's why the computation here is not accurate). But we do not take into account the extent management of the tablespace, so obviously the gain will not be exactly that big!

Newest methods to estimate table size

Unlike for indexes, here you cannot use EXPLAIN PLAN on a CREATE TABLE statement, mainly because Oracle cannot guess how many rows you will insert. Since Oracle 10gR1, same as for indexes, you can use the DBMS_SPACE.CREATE_TABLE_COST procedures (two versions exist). When calling them you specify the target number of rows for future tables, or the real number of rows for existing tables.

As just written, there are two versions of DBMS_SPACE.CREATE_TABLE_COST: one where you specify the average row length (so for an existing table) and one where you give the data type and length of every column (so it applies to soon-to-be-created tables). I have tried both on an existing table and they gave the same result; the second form of the procedure is a bit more complex to handle as you must build a variable of a special type (CREATE_TABLE_COST_COLUMNS) which describes all the columns. Here is the small PL/SQL block I have written (create_table_cost.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  vtablespace_name dba_tables.tablespace_name%type;
  vavg_row_len dba_tables.avg_row_len%type;
  vnum_rows dba_tables.num_rows%type;
  vpct_free dba_tables.pct_free%type;
  used_bytes number;
  alloc_bytes number;
  cursor cursor1 is
  select data_type, data_length
  from dba_tab_columns
  where owner = upper('&1.')
  and table_name = upper('&2.')
  order by column_id;
  columns1 sys.create_table_cost_columns:=sys.create_table_cost_columns();
  i number:=0;
  type collection1 is table of cursor1%rowtype index by pls_integer;
  item1 collection1;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  select tablespace_name, avg_row_len, num_rows, pct_free 
  into vtablespace_name, vavg_row_len, vnum_rows, vpct_free 
  from dba_tables
  where owner = upper('&1.')
  and table_name = upper('&2.');
  
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('------------ DBMS_SPACE.CREATE_TABLE_COST version 1 ------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.create_table_cost(vtablespace_name, vavg_row_len, vnum_rows, vpct_free, used_bytes, alloc_bytes);
  dbms_output.put_line('Used: ' || format_size(used_bytes));
  dbms_output.put_line('Allocated: ' || format_size(alloc_bytes));

  open cursor1;
  fetch cursor1 bulk collect into item1;
  for i in item1.first..item1.last loop
    columns1.extend;
    columns1(i):=sys.create_table_cost_colinfo(item1(i).data_type, item1(i).data_length);
  end loop;
  close cursor1;
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('------------ DBMS_SPACE.CREATE_TABLE_COST version 2 ------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.create_table_cost(vtablespace_name, columns1, vnum_rows, vpct_free, used_bytes, alloc_bytes);
  dbms_output.put_line('Used: ' || format_size(used_bytes));
  dbms_output.put_line('Allocated: ' || format_size(alloc_bytes));
end;
/
set feedback on

With my test table it gives:

SQL> @create_table_cost <owner> <table_name>
----------------------------------------------------------------
------------ DBMS_SPACE.CREATE_TABLE_COST version 1 ------------
----------------------------------------------------------------
Used: 8.000KB
Allocated: 64.000KB
----------------------------------------------------------------
------------ DBMS_SPACE.CREATE_TABLE_COST version 2 ------------
----------------------------------------------------------------
Used: 8.000KB
Allocated: 64.000KB

So the procedure handles the minimum first extent size of 64KB implied by my 8KB block size, EXTENT MANAGEMENT LOCAL AUTOALLOCATE and SEGMENT SPACE MANAGEMENT AUTO. Despite what Oracle claims, I see no difference between the two versions of the procedure. If we check the currently allocated space (DBA_EXTENTS):

SQL> select bytes, blocks,count(*)
  2  from dba_extents
  3  where owner = upper('<owner>')
  4  and segment_name = upper('<table_name>')
  5  group by bytes, blocks
  6  order by blocks;

     BYTES     BLOCKS   COUNT(*)
---------- ---------- ----------
     65536          8         16
   1048576        128         57

2 rows selected.

Table fragmentation identification

Once we have the estimated size of the table (whatever the method), we can compare it with its actual size and see how much we might gain. To compute the current size of an existing table we have multiple methods:

  • DBMS_SPACE.SPACE_USAGE procedure
  • DBA_SEGMENTS view
  • DBA_TABLES view

From my testing, DBMS_SPACE.SPACE_USAGE gives the exact same result as the query we have seen on DBA_TABLES, with a bit more insight on block completion. So the small PL/SQL block I have written does not rely on DBA_TABLES for the current size (table_saving.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  unformatted_blocks number;
  unformatted_bytes number;
  fs1_blocks number;
  fs1_bytes number;
  fs2_blocks number;
  fs2_bytes number;
  fs3_blocks number;
  fs3_bytes number;
  fs4_blocks number;
  fs4_bytes number;
  full_blocks number;
  full_bytes number;
  dbms_space_bytes number;
  bytes_dba_segments number;
  vtablespace_name dba_tables.tablespace_name%type;
  vavg_row_len dba_tables.avg_row_len%type;
  vnum_rows dba_tables.num_rows%type;
  vpct_free dba_tables.pct_free%type;
  used_bytes number;
  alloc_bytes number;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  select tablespace_name, avg_row_len, num_rows, pct_free 
  into vtablespace_name, vavg_row_len, vnum_rows, vpct_free 
  from dba_tables
  where owner = upper('&1.')
  and table_name = upper('&2.');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('Analyzing table &1..&2.');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('-------------------- DBMS_SPACE.SPACE_USAGE --------------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.space_usage(upper('&1.'), upper('&2.'), 'TABLE', unformatted_blocks, unformatted_bytes, fs1_blocks, fs1_bytes, fs2_blocks,
  fs2_bytes, fs3_blocks, fs3_bytes, fs4_blocks, fs4_bytes, full_blocks, full_bytes);
  dbms_output.put_line('Total number of blocks unformatted :' || unformatted_blocks);
  --dbms_output.put_line('Total number of bytes unformatted: ' || unformatted_bytes);
  dbms_output.put_line('Number of blocks having at least 0 to 25% free space: ' || fs1_blocks);
  --dbms_output.put_line('Number of bytes having at least 0 to 25% free space: ' || fs1_bytes);
  dbms_output.put_line('Number of blocks having at least 25 to 50% free space: ' || fs2_blocks);
  --dbms_output.put_line('Number of bytes having at least 25 to 50% free space: ' || fs2_bytes);
  dbms_output.put_line('Number of blocks having at least 50 to 75% free space: ' || fs3_blocks);
  --dbms_output.put_line('Number of bytes having at least 50 to 75% free space: ' || fs3_bytes);
  dbms_output.put_line('Number of blocks having at least 75 to 100% free space: ' || fs4_blocks);
  --dbms_output.put_line('Number of bytes having at least 75 to 100% free space: ' || fs4_bytes);
  dbms_output.put_line('The number of blocks full in the segment: ' || full_blocks);
  --dbms_output.put_line('Total number of bytes full in the segment: ' || format_size(full_bytes));
  dbms_space_bytes:=unformatted_bytes+fs1_bytes+fs2_bytes+fs3_bytes+fs4_bytes+full_bytes;
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('------------------------- DBA_SEGMENTS -------------------------');
  dbms_output.put_line('----------------------------------------------------------------');
  select bytes into bytes_dba_segments from dba_segments where owner=upper('&1.') and segment_name=upper('&2.');
  dbms_output.put_line('Size of the segment: ' || format_size(bytes_dba_segments));
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('----------------- DBMS_SPACE.CREATE_TABLE_COST -----------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.create_table_cost(vtablespace_name, vavg_row_len, vnum_rows, vpct_free, used_bytes, alloc_bytes);
  dbms_output.put_line('Used: ' || format_size(used_bytes));
  dbms_output.put_line('Allocated: ' || format_size(alloc_bytes));
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('---------------------------- Results ---------------------------'); 
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('Potential percentage gain (DBMS_SPACE): ' || round(100 * (dbms_space_bytes - alloc_bytes) / dbms_space_bytes) || '%');
  dbms_output.put_line('Potential percentage gain (DBA_SEGMENTS): ' || round(100 * (bytes_dba_segments - alloc_bytes) / bytes_dba_segments) || '%');
end;
/
set feedback on

On my test table it gives:

SQL> @table_saving <owner> <table_name>
----------------------------------------------------------------
Analyzing table <owner>.<table_name>
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :0
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 0
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 7300
The number of blocks full in the segment: 0
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 58.000MB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_TABLE_COST -----------------
----------------------------------------------------------------
Used: 8.000KB
Allocated: 64.000KB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): 100%
Potential percentage gain (DBA_SEGMENTS): 100%

With DBMS_SPACE.SPACE_USAGE alone you already know that the potential storage saving is huge, because my table is made of 7300 blocks which are all no more than 25% full…

You can even create a function based on the above PL/SQL block; I have chosen to use DBMS_SPACE.SPACE_USAGE (table_saving_function.sql):

create or replace function table_saving_function(vtable_owner in varchar2, vtable_name in varchar2)
return number
authid current_user
as
  vtablespace_name dba_tables.tablespace_name%type;
  vavg_row_len dba_tables.avg_row_len%type;
  vnum_rows dba_tables.num_rows%type;
  vpct_free dba_tables.pct_free%type;
  unformatted_blocks number;
  unformatted_bytes number;
  fs1_blocks number;
  fs1_bytes number;
  fs2_blocks number;
  fs2_bytes number;
  fs3_blocks number;
  fs3_bytes number;
  fs4_blocks number;
  fs4_bytes number;
  full_blocks number;
  full_bytes number;
  dbms_space_bytes number;
  used_bytes number;
  alloc_bytes number;
begin
  select tablespace_name, avg_row_len, num_rows, pct_free 
  into vtablespace_name, vavg_row_len, vnum_rows, vpct_free 
  from dba_tables
  where owner = upper(vtable_owner)
  and table_name = upper(vtable_name);
  dbms_space.space_usage(upper(vtable_owner), upper(vtable_name), 'TABLE', unformatted_blocks, unformatted_bytes, fs1_blocks, fs1_bytes, fs2_blocks,
  fs2_bytes, fs3_blocks, fs3_bytes, fs4_blocks, fs4_bytes, full_blocks, full_bytes);
  dbms_space_bytes:=unformatted_bytes+fs1_bytes+fs2_bytes+fs3_bytes+fs4_bytes+full_bytes;
  if (vavg_row_len > 0 and vnum_rows > 0) then
    dbms_space.create_table_cost(vtablespace_name, vavg_row_len, vnum_rows, vpct_free, used_bytes, alloc_bytes);
    if (dbms_space_bytes <> 0) then
      return (100 * (dbms_space_bytes - alloc_bytes) / dbms_space_bytes);
    else
      return 0;
    end if;
  else
    return 0;
  end if;
end;
/

Then with a query like this you can find the best candidates to work on (this is by the way how I have found the example of this blog post):

select a.owner,a.table_name,table_saving_function(a.owner,a.table_name) as percentage_gain
from dba_tables a
where a.owner='<owner>'
and a.status='VALID' --In valid state
and a.iot_type is null -- IOT tables not supported by dbms_space
--and external='no' starting from 12cr2
and not exists (select 'x' from dba_external_tables b where b.owner=a.owner and b.table_name=a.table_name)
and temporary='N' --Temporary segment not supported
and a.last_analyzed is not null --Recently analyzed
order by 3 desc;

Move, shrink or export/import ?

We have three options in our hands to defragment tables:

  1. Alter table move (to another tablespace, or the same one) and rebuild the indexes. You obviously need extra space in the tablespace to use it. Using the ONLINE keyword in Enterprise Edition there is no lock and DML is still possible.
  2. Export and import the table. Needless to say the downtime is significant and difficult to obtain on a production database. Not the option I would choose…
  3. Shrink command available starting with Oracle 10gR1. Usable on segments in tablespaces with automatic segment management and when row movement has been activated.

So the method to target is ALTER TABLE <owner>.<table_name> SHRINK SPACE [COMPACT] [CASCADE]. SHRINK SPACE COMPACT is equivalent to specifying ALTER [INDEX | TABLE] … COALESCE.

Same as for indexes, the COMPACT option is of limited interest:

If you specify COMPACT, then Oracle Database only defragments the segment space and compacts the table rows for subsequent release. The database does not readjust the high water mark and does not release the space immediately. You must issue another ALTER TABLE … SHRINK SPACE statement later to complete the operation. This clause is useful if you want to accomplish the shrink operation in two shorter steps rather than one longer step.
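
In other words a two-step approach (a sketch of mine, not executed below) would look like this, the HWM being readjusted only by the second statement:

SQL> alter table <owner>.<table_name> shrink space compact;
SQL> alter table <owner>.<table_name> shrink space;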

Let’s try with my test table:

SQL> alter table <owner>.<table_name> shrink space;

Error starting at line : 1 in command -
alter table <owner>.<table_name> shrink space
Error report -
ORA-10636: ROW MOVEMENT is not enabled

SQL> alter table <owner>.<table_name> enable row movement;

Table <owner>.<table_name> altered.

SQL> alter table <owner>.<table_name> shrink space;

Table <owner>.<table_name> altered.

SQL> exec dbms_stats.gather_table_stats('<owner>','<table_name>');

PL/SQL procedure successfully completed.

SQL> @table_saving <owner> <table_name>
----------------------------------------------------------------
Analyzing table <owner>.<table_name>
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :0
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 0
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 1
The number of blocks full in the segment: 0
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 64.000KB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_TABLE_COST -----------------
----------------------------------------------------------------
Used: 8.000KB
Allocated: 64.000KB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): -700%
Potential percentage gain (DBA_SEGMENTS): 0%

SQL> @create_table_cost <owner> <table_name>
----------------------------------------------------------------
------------ DBMS_SPACE.CREATE_TABLE_COST version 1 ------------
----------------------------------------------------------------
Used: 8.000KB
Allocated: 64.000KB
----------------------------------------------------------------
------------ DBMS_SPACE.CREATE_TABLE_COST version 2 ------------
----------------------------------------------------------------
Used: 16.000KB
Allocated: 64.000KB

SQL> @inspect_table <owner> <table_name>
For table <owner>.<table_name>
Current table size: 8.000KB
Theoretical table size: 1.090KB
Potential saving: 6.910KB
Potential saving percentage: 76.38%

SQL> select bytes, blocks,count(*)
  2  from dba_extents
  3  where owner = upper('<owner>')
  4  and segment_name = upper('<table_name>')
  5  group by bytes, blocks
  6  order by blocks;

     BYTES     BLOCKS   COUNT(*)
---------- ---------- ----------
     65536          8          1

1 row selected.

As expected the table now fits in one block, but one extent of 64KB is still allocated to store it. The HWM has been reduced, so there is no real need to find something better. Maybe my PL/SQL should be modified to avoid reporting a negative percentage gain and just report 0 when there is nothing to gain…
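
A possible tweak (an untested sketch of mine) is to wrap the two result lines of table_saving.sql with GREATEST so that an already compact table reports 0% instead of a negative value:

  dbms_output.put_line('Potential percentage gain (DBMS_SPACE): ' || greatest(0, round(100 * (dbms_space_bytes - alloc_bytes) / dbms_space_bytes)) || '%');
  dbms_output.put_line('Potential percentage gain (DBA_SEGMENTS): ' || greatest(0, round(100 * (bytes_dba_segments - alloc_bytes) / bytes_dba_segments)) || '%');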

References

The post How to identify table fragmentation and remove it ? appeared first on IT World.

]]>
https://blog.yannickjaquier.com/oracle/table-fragmentation-identification.html/feed 0
How to non intrusively find index rebuild or shrink candidates ? https://blog.yannickjaquier.com/oracle/candidates-index-rebuild-shrink.html https://blog.yannickjaquier.com/oracle/candidates-index-rebuild-shrink.html#comments Fri, 23 Nov 2018 14:25:30 +0000 https://blog.yannickjaquier.com/?p=4384 Preamble How to find index rebuild contenders ? I usually get this question around four times a year ! The always working answer is: when you have deleted many rows from source table. Obviously if you delete many rows from source table the under line index will have its leaf blocks getting empty and so […]

The post How to non intrusively find index rebuild or shrink candidates ? appeared first on IT World.

]]>

Table of contents

Preamble

How to find index rebuild candidates? I usually get this question around four times a year! The always-working answer is: when you have deleted many rows from the source table. Obviously, if you delete many rows from the source table, the underlying index will have its leaf blocks getting empty and so will benefit from a rebuild. Well, to be honest, it will benefit from a rebuild only if you do not insert those rows back into the source table, or if you insert new rows with a different key (relative to the index). Okay, but how do we know how many empty leaf blocks have been created and how much space we would gain by rebuilding the index?

The legacy method is based on SQL command:

analyze index ... validate structure;

Which has the bad idea of setting an exclusive lock on the base table and so forbids any DML. As this method is quite intrusive it has rarely been used on production databases… Despite this you still have plenty of references suggesting this method which, in my opinion, you must avoid!

Looking a bit into the subject, the newest and non-intrusive methods are now based on Oracle's estimation of the index size versus the size it currently has. Some more advanced methods also display the index distribution, which can give you an insight into the quality of the index and whether you should consider rebuilding it or not.

Legacy situation

Start by running ANALYZE ... VALIDATE STRUCTURE on the index; again, this is an intrusive command that forbids any DML on the source table:

SQL> analyze index <owner>.<index_name> validate structure;

Index <owner>.<index_name> analyzed.

Then you have access to a table called INDEX_STATS. The interesting columns are HEIGHT for the index height (number of blocks required to go from the root block to a leaf block), LF_ROWS for the number of leaf rows (values in the index) and DEL_LF_ROWS for the number of deleted leaf rows in the index. The formula seen everywhere is to rebuild an index when its height is greater than 3 or when the percentage of deleted leaf rows is greater than 20%. So here is the query:

SQL> set lines 200
SQL> col name for a30
SQL> select name, height, round(del_lf_rows*100/lf_rows,4) as percentage from index_stats;

NAME                               HEIGHT PERCENTAGE
------------------------------ ---------- ----------
<index_name>                            4      .0006

But again this is clearly a method to avoid nowadays…

Newest methods to estimate index size

Current methods all come from the ability of the well-known EXPLAIN PLAN command to handle DDL. Explaining the DDL of a CREATE INDEX command will return the estimated size of the index. Let's apply it to my existing index, but you can also use it for an index you have not yet created. Get the DDL of your index using the DBMS_METADATA.GET_DDL function:

SQL> set long 1000
SQL> select dbms_metadata.get_ddl('INDEX', '<index_name>', '<owner>') as ddl from dual;
DDL
--------------------------------------------------------------------------------

  CREATE INDEX <owner>.<index_name> ON <owner>.<table_name>
  ("SO_SUB_ITEM__ID", "SO_PENDING_CAUSE__CODE")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS NOLOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE <tablespace_name>

Then explain the CREATE INDEX statement and display the related explain plan:

SQL> set lines 150
SQL> explain plan for
  2  CREATE INDEX <owner>.<index_name>
  3  ON <owner>.<table_name> (<column1>, <column2>)
  4  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS NOLOGGING
  5  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  6  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  7  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  8  TABLESPACE <tablespace_name>;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1096024652

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT  |              |    74M|  1419M|   156K  (1)| 00:00:07 |
|   1 |  INDEX BUILD NON UNIQUE | <index_name> |       |       |            |          |
|   2 |   SORT CREATE INDEX     |              |    74M|  1419M|            |          |
|   3 |    TABLE ACCESS FULL    | <table_name> |    74M|  1419M| 85097   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
   - estimated index size: 2617M bytes

14 rows selected.

And what do we see at the end of the explain plan: estimated index size: 2617M bytes. Oracle is telling us the size the index would take on disk !

Since Oracle 10gR1 this has been wrapped in a DBMS_SPACE procedure, so you get it all in one with the DBMS_SPACE.CREATE_INDEX_COST procedure.

I have created the below script (create_index_cost.sql) taking owner and index name as parameters:

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  used_bytes number;
  alloc_bytes number;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.create_index_cost(dbms_metadata.get_ddl('INDEX', upper('&2.'), upper('&1.')), used_bytes, alloc_bytes);
  dbms_output.put_line('Used: ' || format_size(used_bytes));
  dbms_output.put_line('Allocated: ' || format_size(alloc_bytes));
end;
/
set feedback on

It gives:

SQL> @create_index_cost <owner> <index_name>
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------
----------------------------------------------------------------
Used: 1.386GB
Allocated: 2.438GB

From official documentation:

  • used_bytes: The number of bytes representing the actual index data
  • alloc_bytes: Size of the index when created in the tablespace

So as we can see, it is not an exact byte-to-byte equivalence: 2617MB for the EXPLAIN PLAN command and 2.438GB (2496MB) for the DBMS_SPACE.CREATE_INDEX_COST procedure. But the procedure is far simpler to use!

In My Oracle Support (MOS) note Script to investigate a b-tree index structure (Doc ID 989186.1), Oracle claims to use the undocumented SYS_OP_LBID function instead of the ANALYZE INDEX … VALIDATE STRUCTURE command. But looking deeper into their script, the SYS_OP_LBID usage is for something completely different, and in fact they do not use it to list indexes that might benefit from a rebuild. We will see the SYS_OP_LBID function in a later chapter of this blog post.

Taking only the size-estimate part of MOS note 989186.1 and modifying it to take only two parameters, index owner and index name, it could become something like (inspect_index.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  vtargetuse   CONSTANT POSITIVE := 90;  -- equates to pctfree 10  
  vleafestimate number;  
  vblocksize    number;
  voverhead     number := 192; -- leaf block "lost" space in index_stats 
  vtable_owner dba_indexes.table_owner%type;
  vtable_name dba_indexes.table_name%type;
  vleaf_blocks dba_indexes.leaf_blocks%type;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  select table_owner, table_name, leaf_blocks
  into vtable_owner, vtable_name, vleaf_blocks
  from dba_indexes
  where owner = upper('&1.')
  and index_name = upper('&2.');

  select a.block_size
  into vblocksize
  from dba_tablespaces a, dba_indexes b
  where b.index_name = upper('&2.')
  and b.owner = upper('&1.')
  and a.tablespace_name = b.tablespace_name;

  select round(100 / vtargetuse * -- assumed packing efficiency
               (ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) + sum((tc.avg_col_len) * (tab.num_rows) ))  -- column data bytes  
               / (vblocksize - voverhead)) index_leaf_estimate  
  into vleafestimate  
  from (select  /*+ no_merge */ table_name, num_rows, decode(partitioned,'YES',10,6) rowid_length  
       from dba_tables
       where table_name  = vtable_name  
         and owner       = vtable_owner) tab,  
      (select  /*+ no_merge */ index_name, index_type, num_rows, decode(uniqueness,'UNIQUE',0,1) uniq_ind  
       from dba_indexes  
       where table_owner = vtable_owner  
       and table_name  = vtable_name  
       and owner = upper('&1.')  
       and index_name  = upper('&2.')) ind,  
      (select  /*+ no_merge */ column_name  
       from dba_ind_columns  
       where table_owner = vtable_owner  
       and table_name  = vtable_name 
       and index_owner = upper('&1.')   
       and index_name  = upper('&2.')) ic,  
      (select  /*+ no_merge */ column_name, avg_col_len  
       from dba_tab_cols  
       where owner = vtable_owner  
       and table_name  = vtable_name) tc  
  where tc.column_name = ic.column_name  
  group by ind.num_rows, ind.uniq_ind, tab.rowid_length; 

  dbms_output.put_line('For index ' || upper('&1.') || '.' || upper('&2.') || ', source table is ' || vtable_owner || '.' || vtable_name);
  dbms_output.put_line('Current leaf blocks: ' || vleaf_blocks);
  dbms_output.put_line('Current size: ' || format_size(vleaf_blocks * vblocksize));
  dbms_output.put_line('Estimated leaf blocks: ' || round(vleafestimate,2));
  dbms_output.put_line('Estimated size: ' || format_size(vleafestimate * vblocksize));
end;
/
set feedback on

On my test index it gives:

SQL> @inspect_index  
For index ., source table is .
Current leaf blocks: 375382
Current size: 2.864GB
Estimated leaf blocks: 335395
Estimated size: 2.559GB

This is a third estimation of the size the index would take on disk… But no further explanation of the formula is given by Oracle, so it is difficult to take it as is…
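
Reading the query, the estimate is built as follows: per row, the rowid length plus a uniqueness byte plus 4 bytes of overhead plus the sum of the average indexed column lengths, packed into (block size - 192 bytes of block overhead) and inflated to the 90% target usage (i.e. PCTFREE 10). A small worked example with purely hypothetical figures (non-partitioned table so a 6 byte rowid, non-unique index so 1 extra byte, 10 million rows, one indexed column with an average length of 20 bytes, 8KB blocks):

bytes per key           = 6 + 1 + 4 + 20      = 31
total index data bytes  = 31 * 10,000,000     = 310,000,000
usable bytes per block  = 8192 - 192          = 8,000
leaf blocks (100% full) = 310,000,000 / 8,000 = 38,750
leaf blocks (90% usage) = 38,750 * 100 / 90   ≈ 43,056
estimated size          = 43,056 * 8192 bytes ≈ 336MB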

Index rebuild candidates list

Once we have the estimated size of the index (whatever the method), we can compare it with its actual size and see how much we might gain. To compute the current size of an existing index we have two methods:

  • DBMS_SPACE.SPACE_USAGE procedure
  • DBA_SEGMENTS view

Of course with DBA_SEGMENTS you are not taking only the blocks really used below the High Water Mark (HWM), but as you can see below it does not make a huge difference for my test index. The script I have written takes index owner and index name as parameters (index_saving.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  unformatted_blocks number;
  unformatted_bytes number;
  fs1_blocks number;
  fs1_bytes number;
  fs2_blocks number;
  fs2_bytes number;
  fs3_blocks number;
  fs3_bytes number;
  fs4_blocks number;
  fs4_bytes number;
  full_blocks number;
  full_bytes number;
  dbms_space_bytes number;
  bytes_dba_segments number;
  used_bytes number;
  alloc_bytes number;
  function format_size(value1 in number)
  return varchar2 as
  begin
    case
      when (value1>1024*1024*1024) then return ltrim(to_char(value1/(1024*1024*1024),'999,999.999') || 'GB');
      when (value1>1024*1024) then return ltrim(to_char(value1/(1024*1024),'999,999.999') || 'MB');
      when (value1>1024) then return ltrim(to_char(value1/(1024),'999,999.999') || 'KB');
      else return ltrim(to_char(value1,'999,999.999') || 'B');
    end case;
  end format_size;
begin
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('Analyzing index &1..&2.');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('-------------------- DBMS_SPACE.SPACE_USAGE --------------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.space_usage(upper('&1.'), upper('&2.'), 'INDEX', unformatted_blocks, unformatted_bytes, fs1_blocks, fs1_bytes, fs2_blocks,
  fs2_bytes, fs3_blocks, fs3_bytes, fs4_blocks, fs4_bytes, full_blocks, full_bytes);
  dbms_output.put_line('Total number of blocks unformatted :' || unformatted_blocks);
  --dbms_output.put_line('Total number of bytes unformatted: ' || unformatted_bytes);
  dbms_output.put_line('Number of blocks having at least 0 to 25% free space: ' || fs1_blocks);
  --dbms_output.put_line('Number of bytes having at least 0 to 25% free space: ' || fs1_bytes);
  dbms_output.put_line('Number of blocks having at least 25 to 50% free space: ' || fs2_blocks);
  --dbms_output.put_line('Number of bytes having at least 25 to 50% free space: ' || fs2_bytes);
  dbms_output.put_line('Number of blocks having at least 50 to 75% free space: ' || fs3_blocks);
  --dbms_output.put_line('Number of bytes having at least 50 to 75% free space: ' || fs3_bytes);
  dbms_output.put_line('Number of blocks having at least 75 to 100% free space: ' || fs4_blocks);
  --dbms_output.put_line('Number of bytes having at least 75 to 100% free space: ' || fs4_bytes);
  dbms_output.put_line('The number of blocks full in the segment: ' || full_blocks);
  --dbms_output.put_line('Total number of bytes full in the segment: ' || format_size(full_bytes));
  dbms_space_bytes:=unformatted_bytes+fs1_bytes+fs2_bytes+fs3_bytes+fs4_bytes+full_bytes;
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('------------------------- DBA_SEGMENTS -------------------------');
  dbms_output.put_line('----------------------------------------------------------------');
  select bytes into bytes_dba_segments from dba_segments where owner=upper('&1.') and segment_name=upper('&2.');
  dbms_output.put_line('Size of the segment: ' || format_size(bytes_dba_segments));
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------');
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_space.create_index_cost(dbms_metadata.get_ddl('INDEX', upper('&2.'), upper('&1.')), used_bytes, alloc_bytes);
  dbms_output.put_line('Used: ' || format_size(used_bytes));
  dbms_output.put_line('Allocated: ' || format_size(alloc_bytes));
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('---------------------------- Results ---------------------------'); 
  dbms_output.put_line('----------------------------------------------------------------');
  dbms_output.put_line('Potential percentage gain (DBMS_SPACE): ' || round(100 * (dbms_space_bytes - alloc_bytes) / dbms_space_bytes) || '%');
  dbms_output.put_line('Potential percentage gain (DBA_SEGMENTS): ' || round(100 * (bytes_dba_segments - alloc_bytes) / bytes_dba_segments) || '%');
end;
/
set feedback on

It gives for me:

SQL> @index_saving  
----------------------------------------------------------------
Analyzing index .
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :1022
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 35
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 0
The number of blocks full in the segment: 365448
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 2.803GB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------
----------------------------------------------------------------
Used: 1.386GB
Allocated: 2.438GB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): 13%
Potential percentage gain (DBA_SEGMENTS): 13%           

Let's say we choose the DBMS_SPACE method. I have then tried to wrap this in a function to be able to analyze multiple indexes of a schema at the same time. To handle the security side I have granted the following to my DBA account:

SQL> grant execute on dbms_space to yjaquier;

Grant succeeded.

SQL> grant execute on dbms_metadata to yjaquier;

Grant succeeded.

SQL>  grant analyze any to yjaquier;

Grant succeeded.

And for DBMS_METADATA, as they say in the official documentation:

If you want to write a PL/SQL program that fetches metadata for objects in a different schema (based on the invoker’s possession of SELECT_CATALOG_ROLE), you must make the program invokers-rights.

So I used the AUTHID CURRENT_USER invoker's rights clause:

create or replace function index_saving_function(index_owner in varchar2, index_name varchar2)
return number
authid current_user
as
  unformatted_blocks number;
  unformatted_bytes number;
  fs1_blocks number;
  fs1_bytes number;
  fs2_blocks number;
  fs2_bytes number;
  fs3_blocks number;
  fs3_bytes number;
  fs4_blocks number;
  fs4_bytes number;
  full_blocks number;
  full_bytes number;
  dbms_space_bytes number;
  used_bytes number;
  alloc_bytes number;
begin
  dbms_space.space_usage(upper(index_owner), upper(index_name), 'INDEX', unformatted_blocks, unformatted_bytes, fs1_blocks, fs1_bytes, fs2_blocks,
  fs2_bytes, fs3_blocks, fs3_bytes, fs4_blocks, fs4_bytes, full_blocks, full_bytes);
  dbms_space_bytes:=unformatted_bytes+fs1_bytes+fs2_bytes+fs3_bytes+fs4_bytes+full_bytes;
  dbms_space.create_index_cost(dbms_metadata.get_ddl('INDEX', upper(index_name), upper(index_owner)), used_bytes, alloc_bytes);
  if (dbms_space_bytes <> 0) then
    return (100 * (dbms_space_bytes - alloc_bytes) / dbms_space_bytes);
  else
    return 0;
  end if;
end;
/

Finally a simple query like this gives a good first analysis of what could be potential candidates for shrink/rebuild:

select owner,index_name,index_saving_function(owner,index_name) as percentage_gain
from dba_indexes
where owner=''
and last_analyzed is not null
and partitioned='NO'
order by 3 desc;
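
If the schema contains many small indexes it may be worth restricting the analysis to the biggest segments first. A variation of the query (just a sketch: the 100MB threshold is arbitrary, the schema is passed as a &1. substitution variable and the join assumes non-partitioned indexes stored as plain INDEX segments):

select i.owner, i.index_name, round(s.bytes/1024/1024) as current_mb,
       index_saving_function(i.owner, i.index_name) as percentage_gain
from dba_indexes i, dba_segments s
where s.owner = i.owner
and s.segment_name = i.index_name
and s.segment_type = 'INDEX'
and s.bytes > 100*1024*1024
and i.owner = upper('&1.')
and i.last_analyzed is not null
and i.partitioned = 'NO'
order by 4 desc;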

To go further

The SYS_OP_LBID internal Oracle function, first shared (I think) by Jonathan Lewis and found in plenty of blog posts as well as in MOS note Script to investigate a b-tree index structure (Doc ID 989186.1), returns the id of the leaf block where the index key for the source table rowid given as parameter is stored. If you group the result by leaf block id you get the number of source table keys per leaf block.
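
The core idea can be illustrated with a minimal query (a sketch with a hypothetical YJAQUIER.TEST01 table, an index on its DESCR column and 12345 as the hypothetical index object_id taken from DBA_OBJECTS; the generic script below builds the same statement dynamically):

-- 12345 is the object_id of the index in DBA_OBJECTS (hypothetical value)
select sys_op_lbid(12345, 'L', t1.rowid) as leaf_block_id,
       count(*) as keys_per_leaf
from yjaquier.test01 t1
where descr is not null  -- predicate on the indexed column so that all index entries are visited
group by sys_op_lbid(12345, 'L', t1.rowid);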

In all the queries shared around, the next idea is to group and order by this number of keys per leaf block and see how many blocks you have to access to get them. The queries using an analytic sum to cumulate, row by row, the blocks required to be read are the best for the analysis (sys_op_lbid.sql):

set linesize 200 pages 1000
set serveroutput on size 999999
set verify off
set feedback off
declare
  vsql varchar2(1000);
  v_id number;
  vtable_owner dba_indexes.table_owner%type;
  vtable_name dba_indexes.table_name%type;
  col01 varchar2(50);
  col02 varchar2(50);
  col03 varchar2(50);
  col04 varchar2(50);
  col05 varchar2(50);
  col06 varchar2(50);
  col07 varchar2(50);
  col08 varchar2(50);
  col09 varchar2(50);
  col10 varchar2(50);
  TYPE IdxRec IS RECORD (keys_per_leaf number, blocks number, cumulative_blocks number);
  TYPE IdxTab IS TABLE OF IdxRec;
  l_data IdxTab;
begin
  select object_id
  into v_id
  from dba_objects
  where owner = upper('&1.')
  and object_name = upper('&2.');
  
  select table_owner, table_name
  into vtable_owner, vtable_name
  from dba_indexes
  where owner = upper('&1.')
  and index_name = upper('&2.');
  
  select
    nvl(max(decode(column_position, 1,column_name)),'null'),
    nvl(max(decode(column_position, 2,column_name)),'null'),
    nvl(max(decode(column_position, 3,column_name)),'null'),
    nvl(max(decode(column_position, 4,column_name)),'null'),
    nvl(max(decode(column_position, 5,column_name)),'null'),
    nvl(max(decode(column_position, 6,column_name)),'null'),
    nvl(max(decode(column_position, 7,column_name)),'null'),
    nvl(max(decode(column_position, 8,column_name)),'null'),
    nvl(max(decode(column_position, 9,column_name)),'null'),
    nvl(max(decode(column_position, 10,column_name)),'null')
  into col01, col02, col03, col04, col05, col06, col07, col08, col09, col10
  from dba_ind_columns
  where table_owner = vtable_owner
  and table_name  = vtable_name
  and index_name  = upper('&2.')
  order by column_position;
  
  vsql:='SELECT keys_per_leaf, blocks, SUM(blocks) OVER(ORDER BY keys_per_leaf) cumulative_blocks FROM (SELECT ' ||
        'keys_per_leaf,COUNT(*) blocks FROM (SELECT /*+ ' ||
        'cursor_sharing_exact ' ||
        'dynamic_sampling(0) ' ||
        'no_monitoring ' ||
        'no_expand ' ||
        'index_ffs(' || vtable_name || ',' || '&2.' || ') ' ||
        'noparallel_index(' || vtable_name || ',' || '&2.' || ') */ ' ||
        'sys_op_lbid(' || v_id || ',''L'',t1.rowid) AS block_id,' ||
        'COUNT(*) AS keys_per_leaf ' ||
        'FROM &1..' || vtable_name ||' t1 ' ||
        'WHERE ' || col01 || ' IS NOT NULL ' ||
        'OR ' || col02 || ' IS NOT NULL ' ||
        'OR ' || col03 || ' IS NOT NULL ' ||
        'OR ' || col04 || ' IS NOT NULL ' ||
        'OR ' || col05 || ' IS NOT NULL ' ||
        'OR ' || col06 || ' IS NOT NULL ' ||
        'OR ' || col07 || ' IS NOT NULL ' ||
        'OR ' || col08 || ' IS NOT NULL ' ||
        'OR ' || col09 || ' IS NOT NULL ' ||
        'OR ' || col10 || ' IS NOT NULL ' ||
        'GROUP BY sys_op_lbid('||v_id||',''L'',t1.rowid)) ' ||
        'GROUP BY keys_per_leaf) ' ||
    'ORDER BY keys_per_leaf';
  --dbms_output.put_line(vsql);
  execute immediate vsql bulk collect into l_data;

  dbms_output.put_line('KEYS_PER_LEAF     BLOCKS CUMULATIVE_BLOCKS');
  dbms_output.put_line('------------- ---------- -----------------');
   for i in l_data.first..l_data.last loop
     dbms_output.put_line(lpad(l_data(i).keys_per_leaf,13) || ' ' || lpad(l_data(i).blocks,10) || ' ' || lpad(l_data(i).cumulative_blocks,17));
   end loop;
end;
/
set feedback on

Then a nice trick is to copy and paste the result in Excel and chart these figures. Doing this you will better see any sudden jump in the number of blocks required to read the key leaf blocks. In a well balanced index the progression should be as linear as possible:

index_rebuild01

I initially thought that any sudden jump in the number of blocks required to be read to get the keys was an indication of an index that would benefit from a rebuild. But I was wrong (see below why, after the index has been rebuilt)! In this chart what you have to try to identify is the number of cumulative blocks increasing rapidly while the number of keys read per leaf is moving slowly. In my chart it starts well: the number of keys per leaf increases while the number of blocks stays flat. But afterwards the number of blocks increases constantly while the number of keys per leaf moves slowly. Said differently, the curve should be more condensed. The issue is here…

If we go back to the raw figures we see the jump right here:

KEYS_PER_LEAF     BLOCKS CUMULATIVE_BLOCKS       
------------- ---------- -----------------       
.
.
          117        196               202
          118        289               491
          119        347               838
          120        205              1043
          121        502              1545
          122        690              2235
          123        851              3086
          124       9629             12715
          125      11104             23819
          126       5773             29592
          127       1991             31583
          128       1148             32731
          129        956             33687
          130        982             34669
          131       1036             35705
          132       1946             37651
          133       4435             42086
          134       6254             48340
          135       2265             50605
          136         26             50631
          137         27             50658
          138         30             50688
          139         21             50709
          140         72             50781
          141         57             50838
          142         95             50933
          143        211             51144
          144        483             51627
          145        408             52035
.
.
          228        823            140172
          229        795            140967
          230       1111            142078
          231     215514            357592
          232       3212            360804
.
.

Rebuild or shrink ?

One of the drawbacks of rebuilding an index is that you need double the index space on disk, and it is also a bit longer than coalescing it… If you run Enterprise Edition of the Oracle database then the ONLINE keyword keeps you safe from DML locking.

For an index or index-organized table, specifying ALTER [INDEX | TABLE] … SHRINK SPACE COMPACT is equivalent to specifying ALTER [INDEX | TABLE ] … COALESCE.
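
For reference, on a hypothetical YJAQUIER.IX_TEST01 index (my real index names are redacted in the outputs below) the different options look like this:

alter index yjaquier.ix_test01 coalesce;              -- defragments leaf blocks, space stays allocated to the index
alter index yjaquier.ix_test01 shrink space compact;  -- equivalent to coalesce for an index
alter index yjaquier.ix_test01 shrink space;          -- also readjusts the high water mark and releases space
alter index yjaquier.ix_test01 rebuild online;        -- full rebuild without DML locking (Enterprise Edition), needs double the space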

A few people have tried to compare REBUILD and SHRINK and drawn some conclusions (see references), but to be honest it looks difficult to give precise rules on what to do. If your index is not too fragmented SHRINK should give good results; if not, you have to go for REBUILD. It also depends on how much overhead you are willing to put on your database. I have tried both on my test index:

I have first tried SHRINK SPACE COMPACT with a very poor result, and found this in the Oracle official documentation:

If you specify COMPACT, then Oracle Database only defragments the segment space and compacts the table rows for subsequent release. The database does not readjust the high water mark and does not release the space immediately. You must issue another ALTER TABLE … SHRINK SPACE statement later to complete the operation. This clause is useful if you want to accomplish the shrink operation in two shorter steps rather than one longer step.

Even if they explain the COMPACT option only for tables, I have the feeling it behaves almost the same for indexes (I have been able to perform multiple tests as my test database got refreshed from the live one, which remained unchanged):

SQL> alter index . shrink space compact;

Index . altered.

SQL> @index_saving  
----------------------------------------------------------------
Analyzing index .
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :1022
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 3
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 13865
The number of blocks full in the segment: 351615
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 2.803GB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------
----------------------------------------------------------------
Used: 1.386GB
Allocated: 2.438GB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): 9%
Potential percentage gain (DBA_SEGMENTS): 13%

SQL> @inspect_index  
For index ., source table is .
Current leaf blocks: 375382
Current size: 2.864GB
Estimated leaf blocks: 335395
Estimated size: 2.559GB

SQL> @sys_op_lbid
KEYS_PER_LEAF     BLOCKS CUMULATIVE_BLOCKS
------------- ---------- -----------------
.
.
          119         10                21
          120          8                29
          121         18                47
          122         36                83
          123         34               117
          124        522               639
          125        537              1176
          126        253              1429
          127         90              1519
.
.
          230       1238             91459
          231     267907            359366
          232       1562            360928
.
.

It does not provide a very good result: the index remained almost unchanged! Without the COMPACT keyword (figures slightly different as the index evolves on the live database):

SQL> alter index . shrink space;

Index . altered.

SQL> @index_saving  
----------------------------------------------------------------
Analyzing index .
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :0
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 3
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 0
The number of blocks full in the segment: 368842
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 2.821GB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------
----------------------------------------------------------------
Used: 1.526GB
Allocated: 2.688GB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): 4%
Potential percentage gain (DBA_SEGMENTS): 5%

A bit better without the COMPACT keyword: we see that blocks have been defragmented but not released, and the index is still not in its optimal form. This could be satisfactory depending on the load you are willing to put on your database. Let's try rebuilding it, which is a bit more resource consuming:

SQL> alter index . rebuild online;

Index . altered.

SQL> @index_saving  
----------------------------------------------------------------
Analyzing index .
----------------------------------------------------------------
-------------------- DBMS_SPACE.SPACE_USAGE --------------------
----------------------------------------------------------------
Total number of blocks unformatted :0
Number of blocks having at least 0 to 25% free space: 0
Number of blocks having at least 25 to 50% free space: 1
Number of blocks having at least 50 to 75% free space: 0
Number of blocks having at least 75 to 100% free space: 0
The number of blocks full in the segment: 325098
----------------------------------------------------------------
------------------------- DBA_SEGMENTS -------------------------
----------------------------------------------------------------
Size of the segment: 2.507GB
----------------------------------------------------------------
----------------- DBMS_SPACE.CREATE_INDEX_COST -----------------
----------------------------------------------------------------
Used: 1.386GB
Allocated: 2.438GB
----------------------------------------------------------------
---------------------------- Results ---------------------------
----------------------------------------------------------------
Potential percentage gain (DBMS_SPACE): 2%
Potential percentage gain (DBA_SEGMENTS): 3%

SQL> @inspect_index  
For index ., source table is .
Current leaf blocks: 323860
Current size: 2.471GB
Estimated leaf blocks: 330825
Estimated size: 2.524GB

SQL> @sys_op_lbid
KEYS_PER_LEAF     BLOCKS CUMULATIVE_BLOCKS
------------- ---------- -----------------
           56          1                 1
          217      10086             10087
          218        146             10233
          219        120             10353
          220         70             10423
          221         64             10487
          222         65             10552
          223         76             10628
          224      29037             39665
          225       1566             41231
          226       1268             42499
          227       1077             43576
          228        861             44437
          229        928             45365
          230       1245             46610
          231     277246            323856
          248          1            323857
          341          3            323860

Graphically it gives:

index_rebuild02

Much better! I still have a big jump in the number of blocks required to be read as the number of keys per leaf increases, but I suppose it comes from a key value that has a high frequency in my source table. This demonstrates that it is not abnormal to see such a big jump in queries using the SYS_OP_LBID internal function.

The process you could apply is to try to shrink your index by default. If you are unhappy with the result, try to afford a rebuild. The good thing now is that checking the index does not lock anything, so you can run it multiple times even on a production database…

References

The post How to non intrusively find index rebuild or shrink candidates ? appeared first on IT World.

]]> https://blog.yannickjaquier.com/oracle/candidates-index-rebuild-shrink.html/feed 1