PL/SQL performance tuning with bulk SQL/binding and parallelism

 

Preamble

As usual, I would say, the question is coming from a colleague asking me if I know BULK COLLECT and FORALL keywords in PL/SQL to improve context switch between SQL and PL/SQL. I knew it as I have read article from Steven Feuerstein: Bulk Processing with BULK COLLECT and FORALL.

But I have never taken the time to test it and to write a small post on it. The testing done by my colleague was giving impressive improvement and so I wanted to see it with my own eyes and practice, as well, a bit on it.

You might think (like I have done for multiple years) that those SQL statements executed in your PL/SQL blocks are free of charge and does not impact at all the flow of your program. In reality they are and Oracle switch from PL/SQL engine to SQL engine. This switch is called a context switch and the aim of this post is testing how it can simply improved using BULK COLLECT and FORALL keywords as well as working with collections to process multiple row at same time. Those two keywords are included, by Oracle, in Bulk SQL feature. For deeper explanation refer either to Oracle documentation or Steven’s article…

Testing has been done using a 12cR1 (12.1.0.2) Enterprise edition running on a Oracle Linux Server release 7.3 virtual machine. My virtual machine has 8GB of memory and 4 threads, the database has 2GB of memory (MEMORY_TARGET). No need to say that if you plan to compare results you must not launch heavy task on your server (desktop in my case) in parallel…

Setup

I create my usual test table with with id column as primary key:

DROP TABLE test01;
 
CREATE TABLE test01 (
id NUMBER,
val VARCHAR2(100)
)
NOLOGGING
TABLESPACE users;
 
ALTER TABLE test01
ADD CONSTRAINT test01_pk PRIMARY KEY (id)
USING INDEX TABLESPACE users
ENABLE VALIDATE;
 
--ALTER TABLE TEST01 DROP CONSTRAINT TEST01_PK;

I fill it with 50 millions rows, almost 8GB on disk and maximum I could afford on my limited virtual machine:

DECLARE
i NUMBER;
nbrows NUMBER;
BEGIN
  i:=1;
  nbrows:=50000000;
  LOOP
    EXIT WHEN i>nbrows;
    INSERT INTO test01 VALUES(i,RPAD('A',100,'A'));
    IF (MOD(i,1000000)=0) THEN
      COMMIT;
    END IF;
    i:=i+1;
  END LOOP;
  COMMIT;
END;
/
 
EXEC dbms_stats.gather_table_stats(ownname => 'yjaquier', tabname => 'test01', degree=> 2, -
method_opt => 'for all columns size auto', CASCADE => TRUE);

The (stupid) test will be an update of all table rows to change the value of val column.

Classical method

The classical method is a cursor on table and a loop to fetch all rows and update current one using the primary key. I replace the 100 A by 100 B to be able to follow if update has been done. I also calculate the execution time and flush buffer cache and shared pool before each run not to be impacted by blocks already in memory. I also commit every one million rows not to fill my 8GB undo tablespace:

ALTER SYSTEM flush buffer_cache;
ALTER SYSTEM flush shared_pool;
SET serveroutput ON SIZE 999999
DECLARE
  start_time NUMBER;
  end_time NUMBER;
  CURSOR cursor1 IS SELECT id,val
  FROM test01;
  item1 cursor1%ROWTYPE;
  i NUMBER;
BEGIN
  start_time:=dbms_utility.get_time;
  i:=1;
  OPEN cursor1;
  LOOP
    FETCH cursor1 INTO item1;
    EXIT WHEN cursor1%notfound;
    UPDATE /* Yannick */ test01 SET val=RPAD('B',100,'B') WHERE id=item1.id;
    i:=i+1;
    IF (MOD(i,1000000)=0) THEN
      COMMIT;
    END IF;
  END LOOP;
  CLOSE cursor1;
  COMMIT;
  end_time:=dbms_utility.get_time;
  dbms_output.put_line('Explicit Cursor Loop Update: ' || TO_CHAR((end_time - start_time)/100) || ' seconds');
END;
/

I have got an average of around 2150 seconds over 7 runs.

Bulk SQL method

The one with BULK COLLECT and FORALL. To complement the code I have chosen to:

  • Limit memory usage by limiting number of returned rows to 1,000,000 using LIMIT keyword. I also commit every million rows then.
  • Trap error message using SAVE EXCEPTIONS keyword.

The test:

ALTER SYSTEM flush buffer_cache;
ALTER SYSTEM flush shared_pool;
SET serveroutput ON SIZE 999999
DECLARE
  start_time NUMBER;
  end_time NUMBER;
  i NUMBER;
  CURSOR cursor1 IS SELECT id,val
  FROM test01;
  TYPE collection1 IS TABLE OF cursor1%ROWTYPE INDEX BY PLS_INTEGER;
  item1 collection1;
  dml_errors EXCEPTION;
  PRAGMA exception_init(dml_errors, -24381);
BEGIN
  start_time:=dbms_utility.get_time;
  OPEN cursor1;
  LOOP
    FETCH cursor1 BULK COLLECT INTO item1 limit 1000000;
    EXIT WHEN item1.COUNT=0;
    FORALL i IN item1.FIRST..item1.LAST save exceptions
      UPDATE /* Yannick */ test01 SET val=RPAD('C',100,'C') WHERE id=item1(i).id;
		COMMIT;
  END LOOP;
  CLOSE cursor1;
  COMMIT;
  end_time:=dbms_utility.get_time;
  dbms_output.put_line('FORALL and BULK COLLECT Update: ' || TO_CHAR((end_time - start_time)/100) || ' seconds');
EXCEPTION
  WHEN dml_errors THEN
    IF cursor1%isopen THEN CLOSE cursor1;
    END IF;
    FOR i IN 1..sql%bulk_exceptions.COUNT LOOP
      dbms_output.put_line(SQLERRM(-(sql%bulk_exceptions(i).error_code)));
      dbms_output.put_line(sql%bulk_exceptions(i).error_index);
    END LOOP;
    COMMIT;
  WHEN OTHERS THEN
    dbms_output.put_line('Unrecognized error.');
    RAISE;
END;
/

I have got an average of around 980 seconds over 7 runs.

Overall the improvement is around 2 times !! Moreover even if few slight modifications are needed it is not a complete redesign of your PL/SQL block. Here the example is deliberately simple but this type of loop to update few columns is a classical way of working in BI environment. I rate this bulk SQL/binding features as a rapid and simple quick win to implement and benefit from !

DBMS_PARALLEL_EXECUTE method

In above testing my colleague tried to use DML parallel execution, that is, I have discovered not activated by default. Only parallel DDL and query (ouf) is parallel enabled by default. Obviously in the classical method as the statement is updating one row at a time there is nothing to parallelize. But in the bulk SQL one you might have expected the PL/SQL block to be parallelize. With the help of the /* Yannick */ comment I can fetch V$SQL view (you should flush the shared pool before) and with PX_SERVERS_EXECUTIONS column you can see if your statement has used parallel processes and how many.

SQL> col sql_text FOR a50 word_wrapped
SQL> SET lines 150 pages 1000
SQL> SELECT
     sql_id,
     SUBSTR(sql_text,1,200) AS sql_text,
     px_servers_executions,
     disk_reads,
     buffer_gets,
     plsql_exec_time,
     elapsed_time
     FROM v$sql
     WHERE sql_text LIKE '%/* Yannick */%'
     AND sql_text NOT LIKE 'select%';
 
SQL_ID        SQL_TEXT                                           PX_SERVERS_EXECUTIONS DISK_READS BUFFER_GETS PLSQL_EXEC_TIME ELAPSED_TIME
------------- -------------------------------------------------- --------------------- ---------- ----------- --------------- ------------
32h9ka1576yny DECLARE   start_time NUMBER;   end_time NUMBER;                        0    1662231   252397556       218299035   2196358064
              CURSOR cursor1 IS SELECT id,val   FROM test01;
              item1 cursor1%ROWTYPE;   i NUMBER; BEGIN
              start_time:=dbms_utility.get_time;   i:=1;   OPEN
              cursor1;
 
SQL_ID        SQL_TEXT                                           PX_SERVERS_EXECUTIONS DISK_READS BUFFER_GETS PLSQL_EXEC_TIME ELAPSED_TIME
------------- -------------------------------------------------- --------------------- ---------- ----------- --------------- ------------
9zjc7pp7sp0sa DECLARE   start_time NUMBER;   end_time NUMBER;                        0    1662256   111747162          435065    993450741
              i NUMBER;   CURSOR cursor1 IS SELECT id,val   FROM
              test01;   TYPE collection1 IS TABLE OF
              cursor1%ROWTYPE INDEX BY PLS_INTEGER;   item1
              collection1;

As you can see it is serial, so how could we benefit from modern multicores processor and update the table in parallel and inside a PL/SQL block. For example how to benefit from:

SQL> ALTER SESSION enable parallel dml;
 
SESSION altered.

Well reading around you cannot with a simple UPDATE as this cannot run in parallel on a non-partitioned table. The only option would be to issue multiple jobs in parallel working each on a subset of the table. DBMS_PARALLEL_EXECUTE comes to the rescue to ease this type of implementation. The package principle is to split the table you plan to update in multiple smaller chunks and execute multiple tasks on all chunks in parallel. Each execution will be done through an Oracle job (CREATE JOB privilege needed).

It starts by a task creation:

SQL> EXEC dbms_parallel_execute.create_task('task01');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SET lines 150 pages 1000
SQL> col task_name FOR a10
SQL> SELECT task_name, chunk_type, status FROM user_parallel_execute_tasks;
 
TASK_NAME  CHUNK_TYPE   STATUS
---------- ------------ -------------------
task01     UNDELARED    CREATED

You have the choice to create chunks by rowid, column value or user defined by SQL. I have chosen to create chunks based on ID column value, I have created chunks of one million rows to keep same spirit as previous tests:

SQL> EXEC dbms_parallel_execute.create_chunks_by_number_col(task_name => 'task01', table_owner => 'YJAQUIER', -
table_name => 'TEST01', table_column => 'ID', chunk_size => 1000000);
 
PL/SQL PROCEDURE successfully completed.

If you mess up in creating chunks you can still use:

SQL> EXEC dbms_parallel_execute.drop_chunks(task_name => 'task01');
 
PL/SQL PROCEDURE successfully completed.

You can control chunks have been created with:

SQL> SELECT chunk_id, status, start_id, end_id
     FROM user_parallel_execute_chunks
     WHERE task_name='task01'
     ORDER BY chunk_id;
 
  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
         1 UNASSIGNED                    1    1000000
         2 UNASSIGNED              1000001    2000000
         3 UNASSIGNED              2000001    3000000
         4 UNASSIGNED              3000001    4000000
         5 UNASSIGNED              4000001    5000000
         6 UNASSIGNED              5000001    6000000
         7 UNASSIGNED              6000001    7000000
         8 UNASSIGNED              7000001    8000000
         9 UNASSIGNED              8000001    9000000
        10 UNASSIGNED              9000001   10000000
        11 UNASSIGNED             10000001   11000000
        12 UNASSIGNED             11000001   12000000
        13 UNASSIGNED             12000001   13000000
        14 UNASSIGNED             13000001   14000000
        15 UNASSIGNED             14000001   15000000
        16 UNASSIGNED             15000001   16000000
        17 UNASSIGNED             16000001   17000000
        18 UNASSIGNED             17000001   18000000
        19 UNASSIGNED             18000001   19000000
        20 UNASSIGNED             19000001   20000000
        21 UNASSIGNED             20000001   21000000
        22 UNASSIGNED             21000001   22000000
        23 UNASSIGNED             22000001   23000000
        24 UNASSIGNED             23000001   24000000
        25 UNASSIGNED             24000001   25000000
        26 UNASSIGNED             25000001   26000000
        27 UNASSIGNED             26000001   27000000
        28 UNASSIGNED             27000001   28000000
        29 UNASSIGNED             28000001   29000000
        30 UNASSIGNED             29000001   30000000
        31 UNASSIGNED             30000001   31000000
        32 UNASSIGNED             31000001   32000000
        33 UNASSIGNED             32000001   33000000
        34 UNASSIGNED             33000001   34000000
        35 UNASSIGNED             34000001   35000000
        36 UNASSIGNED             35000001   36000000
        37 UNASSIGNED             36000001   37000000
        38 UNASSIGNED             37000001   38000000
        39 UNASSIGNED             38000001   39000000
        40 UNASSIGNED             39000001   40000000
        41 UNASSIGNED             40000001   41000000
        42 UNASSIGNED             41000001   42000000
        43 UNASSIGNED             42000001   43000000
        44 UNASSIGNED             43000001   44000000
        45 UNASSIGNED             44000001   45000000
        46 UNASSIGNED             45000001   46000000
        47 UNASSIGNED             46000001   47000000
        48 UNASSIGNED             47000001   48000000
        49 UNASSIGNED             48000001   49000000
        50 UNASSIGNED             49000001   50000000
 
50 ROWS selected.

Now execute the task with code below. One important parameter is PARALLEL_LEVEL:

ALTER SYSTEM flush buffer_cache;
ALTER SYSTEM flush shared_pool;
SET serveroutput ON SIZE 999999
DECLARE
  start_time NUMBER;
  end_time NUMBER;
  sql_stmt VARCHAR2(32767);
BEGIN
  start_time:=dbms_utility.get_time;
  sql_stmt := 'update /* Yannick */ test01 set val=rpad(''D'',100,''D'') where id between :start_id and :end_id';
 
  dbms_parallel_execute.run_task(task_name => 'task01', sql_stmt => sql_stmt, language_flag => dbms_sql.native, parallel_level => 25);
 
  end_time:=dbms_utility.get_time;
  dbms_output.put_line('DBMS_PARALLEL_EXECUTE Update: ' || TO_CHAR((end_time - start_time)/100) || ' seconds');
END;
/

To know which jobs have been created for your task you need to get JOB_PREFIX column from USER_PARALLEL_EXECUTE_TASKS view and use this prefix with LIKE keyword to get job names list (not too easy to explain but below SQL is sel-explaining I hope):

SQL> col task_name FOR a10
SQL> col job_prefix FOR a15
SQL> SELECT task_name, status, job_prefix, parallel_level FROM user_parallel_execute_tasks;
 
TASK_NAME  STATUS              JOB_PREFIX      PARALLEL_LEVEL
---------- ------------------- --------------- --------------
task01     FINISHED            TASK$_442                   10
 
SQL> col job_name FOR a20
SQL> col run_duration FOR a15
SQL> SELECT job_name, status, run_duration
     FROM user_scheduler_job_run_details
     WHERE job_name LIKE (SELECT job_prefix || '%' FROM user_parallel_execute_tasks WHERE task_name='task01');
 
JOB_NAME             STATUS                         RUN_DURATION
-------------------- ------------------------------ ---------------
TASK$_728_5          SUCCEEDED                      +000 00:02:00
TASK$_728_7          SUCCEEDED                      +000 00:09:16
TASK$_728_8          SUCCEEDED                      +000 00:00:00
TASK$_728_9          SUCCEEDED                      +000 00:01:16
TASK$_728_10         SUCCEEDED                      +000 00:09:19
TASK$_728_2          SUCCEEDED                      +000 00:09:22
TASK$_728_3          SUCCEEDED                      +000 00:09:18
TASK$_728_6          SUCCEEDED                      +000 00:00:00
TASK$_728_4          SUCCEEDED                      +000 00:00:00
TASK$_728_1          SUCCEEDED                      +000 00:09:22
 
10 ROWS selected.

Even if the tasks are all in success you might have some chunks that have not been processed correctly:

SQL> SELECT chunk_id, status, start_id, end_id
     FROM user_parallel_execute_chunks
     WHERE task_name='task01'
     ORDER BY chunk_id;
 
  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       241 PROCESSED                     1    1000000
       242 PROCESSED               1000001    2000000
       243 PROCESSED               2000001    3000000
       244 PROCESSED               3000001    4000000
       245 PROCESSED               4000001    5000000
       246 PROCESSED               5000001    6000000
       247 PROCESSED               6000001    7000000
       248 PROCESSED               7000001    8000000
       249 PROCESSED               8000001    9000000
       250 PROCESSED               9000001   10000000
       251 PROCESSED              10000001   11000000
       252 PROCESSED              11000001   12000000
       253 PROCESSED              12000001   13000000
       254 PROCESSED              13000001   14000000
       255 PROCESSED              14000001   15000000
       256 PROCESSED              15000001   16000000
       257 PROCESSED              16000001   17000000
       258 PROCESSED              17000001   18000000
       259 PROCESSED              18000001   19000000
       260 PROCESSED              19000001   20000000
       261 PROCESSED              20000001   21000000
       262 PROCESSED_WITH_ERROR   21000001   22000000
       263 PROCESSED              22000001   23000000
       264 PROCESSED_WITH_ERROR   23000001   24000000
       265 PROCESSED              24000001   25000000
       266 PROCESSED              25000001   26000000
       267 PROCESSED              26000001   27000000
       268 PROCESSED_WITH_ERROR   27000001   28000000
       269 PROCESSED              28000001   29000000
       270 PROCESSED_WITH_ERROR   29000001   30000000
       271 PROCESSED_WITH_ERROR   30000001   31000000
       272 PROCESSED              31000001   32000000
       273 PROCESSED_WITH_ERROR   32000001   33000000
       274 PROCESSED_WITH_ERROR   33000001   34000000
       275 PROCESSED_WITH_ERROR   34000001   35000000
       276 PROCESSED_WITH_ERROR   35000001   36000000
       277 PROCESSED              36000001   37000000
       278 PROCESSED              37000001   38000000
       279 PROCESSED_WITH_ERROR   38000001   39000000
       280 PROCESSED              39000001   40000000
       281 PROCESSED              40000001   41000000
       282 PROCESSED              41000001   42000000
       283 PROCESSED              42000001   43000000
       284 PROCESSED              43000001   44000000
       285 PROCESSED              44000001   45000000
       286 PROCESSED              45000001   46000000
       287 PROCESSED              46000001   47000000
       288 PROCESSED              47000001   48000000
       289 PROCESSED              48000001   49000000
       290 PROCESSED              49000001   50000000
 
50 ROWS selected.

You can try to understand why with ERROR_CODE and ERROR_MESSAGE columns of USER_PARALLEL_EXECUTE_CHUNKS view:

SQL> col error_message FOR a40 word_wrapped
SQL> SELECT error_code, error_message
     FROM user_parallel_execute_chunks
     WHERE chunk_id=273;
 
ERROR_CODE ERROR_MESSAGE
---------- ----------------------------------------
    -30036 ORA-30036: unable TO extend SEGMENT BY 8
           IN undo TABLESPACE 'UNDOTBS1'

And here you see the mistake I have done, I have tried to proceed the 50 chunks with 25 task running in parallel. It means I would need to have undo to store the update (without committing) of 25 millions rows. It is really far from the previous methods I have used. And to be honest it was not faster that the bulk method. I have refined the approach by creating chunks of 100,000 rows (so 500 chunks) and processing 10 in parallel and got 640.39 seconds…

With chunks of 50,000 (so 1000 chucks) and 20 jobs in parallel I got 972.92 seconds… This just to show that finding the good balance between chunks size and parallelism is not so easy and might require multiple tries…

Another cool query to see progress of your task is:

SQL> SELECT status, COUNT(*)
     FROM user_parallel_execute_chunks
     WHERE task_name='task01'
     GROUP BY status
    ORDER BY status
 
STATUS                 COUNT(*)
-------------------- ----------
ASSIGNED                      6
PROCESSED                   165
UNASSIGNED                  329

Over 7 runs (500 chunks of 100,000 rows and 10 jobs in parallel) I have got an average of 610 seconds which is again a 40% improvement versus bulk SQL. Of course this is really more complex to implement and you must be able to divide and conquer to implement parallel processing with DBMS_PARALLEL_EXECUTE package. On top of it as each processed chunks are committed it also helps to decrease your undo footprint. I would have loved to be aware of this package sooner when I have developed few BI PL/SQL processes…

At the end to clean the task use:

SQL> EXEC dbms_parallel_execute.drop_task('task01');
 
PL/SQL PROCEDURE successfully completed.

Raw results

The raw results I have gotten with the three methods:

Classical Bulk SQL DBMS_PARALLEL_EXECUTE
1784.23 1106.4 640.39
2195.67 1026.9 517.1
2306.09 870.39 640.56
1997.96 927.7 637.14
2196.34 968.7 631.4
2428.13 993.36 569.85
2115.6 919.14 565.35

References

This entry was posted in Oracle and tagged , . Bookmark the permalink.

2 thoughts on “PL/SQL performance tuning with bulk SQL/binding and parallelism

  1. Pierre-yves COLLE says:

    A big thanks to Yannick for this wonderful very clear, detailed and easy to understand and use article.

    Although I’m not a rookie guy in SQL and ORACLE, having spent multiple years here programming applications,
    I learned a great new feature that I was absolutely not aware of ORACLE and thanks to his valuable example I was able to test it

    Not only the value added of this ORACLE feature is well documented but also the example allow you to feel immediately its interest and not to stay just theoretical as it is the case many times in many other articles being on ORACLE or on other technology.

    BIG BIG THANKS AGAIN

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>