DBMS_DATA_MINING package for Machine Learning inside the database


Preamble

Since the feature formerly known as Advanced Analytics, now called Machine Learning, Spatial and Graph, has been made freely available (explained here, here or here), it was time for me to give this highly hyped Machine Learning feature a try. The Machine Learning part is accessible through the DBMS_DATA_MINING PL/SQL package. No doubt that with the number of free Machine Learning tools out there it was not sustainable for Oracle to keep this option non-free…

Needless to say, it was really hard for me to get an idea of what to do and even harder to reach even a small objective. We have all read lots and lots of articles on how machine learning helps analyze medical images for tumor research, detect fraud or even recognize speech. Here, as we are in an Oracle database, the source of information would ideally be structured (versus unstructured, as for images), but again it was clearly not a piece of cake to organize my mind and find where to start.

Then I remembered a nice web site called Kaggle and I tried to find on it a popular dataset with a few associated tasks and, ideally, people who have kindly submitted and shared their work, so that I could compare whether I am able to reach a similar result. On Kaggle people mainly work with Python in what are called notebooks. The dataset I have chosen is the Water Quality one (https://www.kaggle.com/adityakadiwal/water-potability). The associated task is:

Predict if water is safe for Human consumption:
Create a model to determine if the sample tested from the water body is fit for human consumption or not.
This dataset may require you to treat missing value if any and check for data imbalance.

Luckily, my testing was done on a powerful bare-metal test server with 12 cores and 64GB of RAM running Red Hat Enterprise Linux Server release 7.8 (Maipo). My Oracle test database is a pluggable database (pdb1) running Oracle 19.12 (July 2021 Release Update).

Loading dataset

Download and transfer the csv file to your database server. When you transfer this dataset to your database server you might have to convert it with the dos2unix tool to remove the strange hidden characters at the end of each line (you can spend a day on a stupid thing like this). Then put the file in any folder you like (the Unix account used to run the database must be able to read it) and load it as an external table with code like below.

I start by creating an account for myself and a directory, and granting read/write privileges on this directory to my account:

SQL> create user yjaquier identified by "secure_password";

User created.

SQL> grant dba to yjaquier;

Grant succeeded.

SQL> create or replace directory directory01 as '/home/oracle/';

Directory created.

SQL> grant read,write on directory directory01 to yjaquier;

Grant succeeded.

Finally create the external table with:

SQL> connect yjaquier/"secure_password"@pdb1
Connected.

SQL>
create table water_potability_csv (
  ph number,
  hardness number,
  solids number,
  chloramines number,
  sulfate number,
  conductivity number,
  organic_carbon number,
  trihalomethanes number,
  turbidity number,
  potability number(1)
)
organization external
(
  type oracle_loader
  default directory directory01
  access parameters
  (
    records delimited by newline skip 1 logfile 'water_potability.log' badfile 'water_potability.bad' discardfile 'water_potability.dsc'
    fields terminated by ','
    missing field values are null
  )
  location ('water_potability.csv')
)
reject limit unlimited;

Table created.

You can check that it has been loaded correctly by directly selecting from the external table:

SQL> set lines 200
SQL> select count(*) from water_potability_csv;

  COUNT(*)
----------
      3276

SQL> select * from water_potability_csv fetch first 5 rows only;

        PH   HARDNESS     SOLIDS CHLORAMINES    SULFATE CONDUCTIVITY ORGANIC_CARBON TRIHALOMETHANES  TURBIDITY POTABILITY
---------- ---------- ---------- ----------- ---------- ------------ -------------- --------------- ---------- ----------
           204.890455  20791.319  7.30021187 368.516441   564.308654     10.3797831      86.9909705 2.96313538          0
3.71608008 129.422921 18630.0579  6.63524588              592.885359     15.1800131      56.3290763 4.50065627          0
8.09912419 224.236259 19909.5417   9.2758836              418.606213     16.8686369      66.4200925 3.05593375          0
8.31676588 214.373394 22018.4174  8.05933238 356.886136   363.266516     18.4365245      100.341674 4.62877054          0
9.09222346 181.101509 17978.9863  6.54659997 310.135738   398.410813     11.5582794      31.9979927 4.07507543          0

You can also check in the filesystem of the directory we created above that there is no water_potability.bad file, and check in water_potability.log that everything went well (confirmed by the number of loaded rows in my case).

For the case_id_column_name parameter of the dbms_data_mining.create_model procedure I realized that I needed to add a sequence-like id column to my dataset table:

SQL> create table water_potability (
  id NUMBER GENERATED ALWAYS AS IDENTITY,
  ph number,
  hardness number,
  solids number,
  chloramines number,
  sulfate number,
  conductivity number,
  organic_carbon number,
  trihalomethanes number,
  turbidity number,
  potability number(1)
);

SQL> insert into water_potability(ph, hardness, solids, chloramines, sulfate, conductivity, organic_carbon, trihalomethanes, turbidity, potability)
     select * from water_potability_csv;

3276 rows created.

SQL> commit;

Commit complete.

Remark:
If you need more advanced transformations, Oracle has implemented them in the DBMS_DATA_MINING_TRANSFORM package.
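
This is only an illustration (I rely on the automatic data preparation later instead): a minimal sketch of a min-max normalization with that package could look like the block below, where the wp_norm table and water_potability_norm view names are hypothetical:

SQL>
begin
  -- Definition table holding the shift/scale values per column (hypothetical name)
  dbms_data_mining_transform.create_norm_lin(norm_table_name => 'wp_norm');
  -- Populate it with min-max normalization coefficients, excluding the case id and the target
  dbms_data_mining_transform.insert_norm_lin_minmax(
    norm_table_name => 'wp_norm',
    data_table_name => 'water_potability',
    exclude_list    => dbms_data_mining_transform.column_list('id','potability'));
  -- Expose the normalized data through a view usable as CREATE_MODEL input (hypothetical name)
  dbms_data_mining_transform.xform_norm_lin(
    norm_table_name => 'wp_norm',
    data_table_name => 'water_potability',
    xform_view_name => 'water_potability_norm');
end;
/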

Dataset queries and charts

The multiple answers in the Kaggle thread help to see that we can get almost the same distribution results in SQL (without charts, obviously, when staying in SQL):

SQL> select decode(potability,0,'Not potable','Potable') as potability,
     round(ratio_to_report(count(*)) over ()*100) as percentage
     from water_potability group by potability;

POTABILITY  PERCENTAGE
----------- ----------
Potable             39
Not potable         61

To display a few charts you could use Python (Matplotlib) with the cx_Oracle connector, or the free Power BI Desktop that I recently installed to test MariaDB ColumnStore (https://blog.yannickjaquier.com/mysql/mariadb-columnstore-installation-and-testing-part-1.html). I connected to my database using ODBC and EZConnect and imported my water_potability table into Power BI Desktop…

Potability samples distribution:

dbms_data_mining01

Hardness distribution:

dbms_data_mining02

The dataset deliberately contains bad figures, and part of the task is to clean them, i.e. to replace the null values. One traditional approach is to replace those null values with the median value, which is not a complex task in Python with Pandas. I could also do this in SQL with queries like:

SQL> select median(ph) as median_ph from water_potability where potability=0 and ph is not null;

 MEDIAN_PH
----------
7.03545552
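
For completeness, if I wanted to do this cleaning manually, a minimal sketch (per column, keeping the per-class median) could be an UPDATE like the one below; I did not apply it since I rely on the automatic treatment activated later:

SQL> update water_potability a
     set ph = (select median(b.ph) from water_potability b
               where b.potability = a.potability)
     where a.ph is null;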

But Oracle, in the DBMS_DATA_MINING package, has its own automatic data cleaning algorithms that I’m planning to activate, so I’m skipping this task for now…

Model creation and testing

You need a settings table as described in the official documentation:

SQL> create table model_settings(setting_name varchar2(30), setting_value varchar2(30));

Table created.

One typical activity is to split the dataset into a training sample and a testing sample, so that you do not train your model on the data you will test it with. A common rule of thumb is an 80%/20% split: train on the 80% and test the accuracy of your model on the remaining 20%:

SQL> create table water_potability_training
     as select * from water_potability SAMPLE (80);

Table created.

SQL> create table water_potability_testing
     as select * from water_potability
     minus select * from water_potability_training;

Table created.

SQL> select count(*) from water_potability_training;

  COUNT(*)
----------
      2619

SQL> select count(*) from water_potability_testing;

  COUNT(*)
----------
       657

SQL> select count(*) from water_potability;

  COUNT(*)
----------
    3276

When playing with different models and their associated parameters, if you are too optimistic you can end up with a very long running time for the CREATE_MODEL procedure. This run was also serial (using one thread), so if you wish to use more of the power of your database server (see the conclusion for the pros and cons), you can change the attributes of the table to allow parallel operations and put it in memory, as I did with:

SQL> alter table water_potability_training parallel inmemory;

Table altered.

Remark:
I recall that the In-Memory paid Enterprise option is free if you use less than 16GB. To be sure you stay compliant you can even set the limit to 16GB with the inmemory_size parameter.
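
For reference, a minimal sketch of capping the In-Memory column store to stay within the free limit (inmemory_size is a static parameter, so an instance restart is needed):

SQL> alter system set inmemory_size=16G scope=spfile;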

Choose your model and set its parameters. Be very careful: all the insertions must be done inside a PL/SQL block or you will get an error message like ORA-06553: PLS-221: ‘ALGO_NAME’ is not a procedure or is undefined. I have chosen the Random Forest algorithm as this is the one that gave the best results for the people who submitted a task answer on Kaggle (it would have been difficult to decide which one to choose on my own). When using the Random Forest algorithm you can also tweak the underlying Decision Tree algorithm parameters:

SQL>
begin
  -- Clean the table before starting (TRUNCATE cannot be used in PL/SQL)
  delete from model_settings;
  -- Choose your model
  insert into model_settings values(dbms_data_mining.algo_name, dbms_data_mining.algo_random_forest);
  -- Automatic data preparation activation
  insert into model_settings values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
  -- Missing value will be replaced by mean value
  insert into model_settings values (dbms_data_mining.odms_missing_value_treatment, dbms_data_mining.odms_missing_value_mean_mode);
  -- Algorithm Settings: Random Forest
  insert into model_settings values (dbms_data_mining.rfor_mtry, 0);
  insert into model_settings values (dbms_data_mining.rfor_num_trees, 100);
  insert into model_settings values (dbms_data_mining.rfor_sampling_ratio, 1);
  insert into model_settings values (dbms_data_mining.tree_term_max_depth, 50);
  commit;
end;
/

PL/SQL procedure successfully completed.

SQL> select * from model_settings;

SETTING_NAME                   SETTING_VALUE
------------------------------ ------------------------------
ALGO_NAME                      ALGO_RANDOM_FOREST
PREP_AUTO                      ON
ODMS_MISSING_VALUE_TREATMENT   ODMS_MISSING_VALUE_MEAN_MODE
RFOR_MTRY                      0
RFOR_NUM_TREES                 100
RFOR_SAMPLING_RATIO            1

6 rows selected.

Finally create the model; this also trains it, so the execution time depends on the chosen algorithm and its parameters:

-- create the model using the specified settings 
begin
  dbms_data_mining.create_model(
    model_name          => 'water_potability_model',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'water_potability_training',
    case_id_column_name => 'id',
    target_column_name  => 'potability',
    settings_table_name => 'model_settings');
end;
/

PL/SQL procedure successfully completed.

If you plan to run multiple tests by playing with model parameters, you must drop the model before creating a new one:

SQL> exec dbms_data_mining.drop_model('water_potability_model');

PL/SQL procedure successfully completed.

Remark:
The procedure also creates plenty of DM$xxWATER_POTABILITY_MODEL tables.
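
You can also check the model and its effective settings through the mining dictionary views, for example:

SQL> select model_name, mining_function, algorithm, creation_date
     from user_mining_models;

SQL> select setting_name, setting_value
     from user_mining_model_settings
     where model_name = 'WATER_POTABILITY_MODEL';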

I finally used the query below to apply my model to the 20% testing sample. The best accuracy I have been able to get is 65%, with the Random Forest algorithm:

SQL>
select
  predicted,
  round(ratio_to_report(count(*)) over ()*100) as percentage
from (
  select
    case when potability=predicted_potability then 'Good' else 'Bad' end as predicted
  from (
    select 
      t.*,
      prediction (water_potability_model using *) predicted_potability
    from water_potability_testing t)
  )
group by predicted;

PRED PERCENTAGE
---- ----------
Good         65
Bad          35
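
For a finer view than the raw accuracy, a sketch of a confusion-matrix style query (same PREDICTION operator, only the grouping changes) could be:

SQL> select potability, predicted_potability, count(*) as nb
     from (
       select t.potability,
              prediction(water_potability_model using *) as predicted_potability
       from water_potability_testing t)
     group by potability, predicted_potability
     order by potability, predicted_potability;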

One huge difficulty I have found (at least for me) is which algorithm to choose. You might not have the time and/or energy/resources to test them all. Then, once you have chosen your algorithm and play with its parameters (the Random Forest algorithm for me), I have also experienced that better is the enemy of good enough: each time I tried to add more trees, or a bigger fraction of the training data to be randomly sampled, it gave a worse result… More or less every attempt ended up with worse accuracy but a drastic increase in CPU consumption.

Last but not least, I was honestly expecting a higher accuracy, as people on Kaggle are going up to 80%. Maybe I am doing something wrong or my whole exercise is wrong. Do not hesitate to comment if you see something stupid in my logic…

Conclusion

I don’t really know what to think about this Oracle database Machine Learning feature. On one hand Oracle has made something easy to use, and you obviously use it with a language you already know very well: SQL. Of course the de facto Machine Learning language is Python, so if you are in this domain Python is most probably already your best friend.

On the other hand you do machine learning at the cost of your Oracle database server, while doing it with Python and cx_Oracle is almost *free* (a separate server, even a virtual machine, with only free components).

I have taken a screenshot of my server while creating a model: it can look like this for more than 10 minutes, even with my simple trial dataset (if you keep the default algorithm parameters you will not hit this situation):

dbms_data_mining03

Then of course the feature has become free and, for me, this is clearly a must if Oracle expects people to use it, as the free competition is really generous (scikit-learn, TensorFlow, Spark ML, …). Before deciding to use it you have to balance the additional CPU consumption on your database server against having a dedicated server with Python and offloading your figures to it… I would say that if your dataset is huge and computing a model is fast, then this option is interesting. You can also see the result of your model applied to your figures live with a simple SQL statement…

DBMS_PREDICTIVE_ANALYTICS all in one bonus package

I have also tried the DBMS_PREDICTIVE_ANALYTICS package, which is easier to implement for a Machine Learning noob like me. If you have already used the DBMS_DATA_MINING package then this one is much simpler to implement, but you obviously have less control over it.

I start with the EXPLAIN procedure, which I initially did not even consider, but in the end it provides interesting information on which columns Oracle will use to make a prediction of your target column. As it does not require the added ID column, you could also run it on the WATER_POTABILITY_CSV external table if you like:

SQL> exec dbms_predictive_analytics.explain(data_table_name => 'water_potability_training', explain_column_name => 'potability', result_table_name => 'water_potability_explain');

PL/SQL procedure successfully completed.

SQL> set lines 200
SQL> col attribute_name for a15
SQL> col attribute_subname for a15
SQL> select * from water_potability_explain;

ATTRIBUTE_NAME  ATTRIBUTE_SUBNA EXPLANATORY_VALUE       RANK
--------------- --------------- ----------------- ----------
ID                                     .552324937          1
SULFATE                                .007351846          2
PH                                              0          3
TRIHALOMETHANES                                 0          3
TURBIDITY                                       0          3
CONDUCTIVITY                                    0          3
CHLORAMINES                                     0          3
ORGANIC_CARBON                                  0          3
HARDNESS                                        0          3
SOLIDS                                          0          3

10 rows selected.

It’s a bit disturbing but, if I understand it well, apparently only the SULFATE column (and the meaningless ID) is taken into account to make a prediction. This might explain the poor result I’ll get…

Then I ran an execution similar to the one with the DBMS_DATA_MINING package. Here there is no model to choose and I would not even need to split my table into training and testing data sets:

SQL> set serveroutput on size 999999
SQL>
DECLARE 
    v_accuracy NUMBER(10,9); 
BEGIN 
    DBMS_PREDICTIVE_ANALYTICS.PREDICT( 
        accuracy             => v_accuracy, 
        data_table_name      => 'water_potability_training', 
        case_id_column_name  => 'id', 
        target_column_name   => 'potability', 
        result_table_name    => 'water_potability_predict_result'); 
    DBMS_OUTPUT.PUT_LINE('Accuracy = ' || v_accuracy); 
END; 
/

Accuracy = .055282742

PL/SQL procedure successfully completed.

Remark:
The returned accuracy is clearly not good…

To check the predicted value versus the real one:

SQL> col probability for 9.999999999
SQL> select a.potability, b.prediction,b.probability
     from water_potability a, water_potability_predict_result b
     where a.id=b.id
     fetch first 10 rows only;

POTABILITY PREDICTION  PROBABILITY
---------- ---------- ------------
        0          0   .610151794
        0          0   .610151734
        0          1   .389848346
        0          1   .389848362
        0          1   .389848299
        0          1   .389848292
        0          1   .389848368
        0          1   .389848334
        0          1   .389848339
        0          0   .610151736

10 rows selected.  

Finally, the same query as for the DBMS_DATA_MINING package shows that the result is not really good:

SQL>
select
  predicted,
  round(ratio_to_report(count(*)) over ()*100) as percentage
from (
  select
    case when potability=prediction then 'Good' else 'Bad' end as predicted
  from (
    select
      a.potability,
      b.prediction
    from water_potability_training a, water_potability_predict_result b
    where a.id = b.id)
  )
group by predicted;

PRED PERCENTAGE
---- ----------
Good         52
Bad          48

If I filter out the predictions where the probability is low, the accuracy improves and we almost reach the previous result, but we have eliminated a few predictions:

SQL>
select
  predicted,
  round(ratio_to_report(count(*)) over ()*100) as percentage
from (
  select
    case when potability=prediction then 'Good' else 'Bad' end as predicted
  from (
    select
      a.potability,
      b.prediction
    from water_potability_training a, water_potability_predict_result b
    where a.id = b.id and probability > 0.5)
  )
group by predicted;

PRED PERCENTAGE
---- ----------
Bad          36
Good         64


PostgreSQL graphical monitoring tools comparison


Preamble

Monitoring tools are part of our journey to slowly but steadily increase our internal expertise on PostgreSQL. So far PostgreSQL is not part of our IT standards, but the fact is that a few applications have already started to use it.

In parallel to the PostgreSQL community edition you might have heard about EnterpriseDB, which proposes a commercial offer on top of PostgreSQL said to be compatible with Oracle, with the clear goal of reducing your Oracle fees by migrating to EnterpriseDB. So far this is not yet a goal for us and we just aim to stick to community PostgreSQL and increase our internal knowledge before going further…

So I have decided to start having a look and increase my knowledge of the enterprise features that are key for us:

This blog post will be about PostgreSQL monitoring tools, as this is the most appealing part and the part that helps you not to be blind in front of problems…

I plan to start this blog post small and enrich it if I test new monitoring tools in the future. The plan is also to compare them against the commercial offer of EnterpriseDB: Postgres Enterprise Manager. If trial keys are available I also plan to add other commercial products. I will mainly focus on on-premise products as our databases are mostly on-premise.

Looking at the free and open source products available out there, I tend to say that it will be difficult for the paid competition…

Preference so far:

  1. PGWatch
  2. Percona Monitoring and Management
  3. Postgres Enterprise Manager
  4. OmniDB
  5. pgAdmin

pgAdmin

pgAdmin is one of the most famous free and open source PostgreSQL administration and monitoring tools.

Step zero, if like me you are behind a corporate proxy, is to configure Python pip to reach the internet by creating an /etc/pip.conf file similar to the one below (I have also decided to use a dedicated Linux account, pgadmin, to run it):

[pgadmin@server ~]$ cat /etc/pip.conf
[global]
extra-index-url=https://www.piwheels.org/simple
proxy = http://account:password@proxy_server.domain.com:proxy_port/
trusted-host = pypi.python.org pypi.org www.piwheels.org  files.pythonhosted.org

Create and activate a Python virtual environment with the commands below (install Python 3 on your server using your Linux distribution repository):

[pgadmin@server ~]$ cd /www/pgadmin/
[pgadmin@server pgadmin]$ python3 -m venv pgadmin4
[pgadmin@server pgadmin]$ source pgadmin4/bin/activate

Most probably you will have to upgrade pip if you get the below warning message:

WARNING: You are using pip version 21.1.2; however, version 21.1.3 is available.
You should consider upgrading via the '/www/pgadmin/pgadmin4/bin/python3 -m pip install --upgrade pip' command.

Use:

(pgadmin4) [pgadmin@server ~]$ pip install --upgrade pip

Create the /var/log/pgadmin directory and give ownership of it to your pgAdmin Linux account.
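
As root, something as simple as this should do it (assuming the dedicated pgadmin Linux account created above):

[root@server ~]# mkdir -p /var/log/pgadmin
[root@server ~]# chown -R pgadmin: /var/log/pgadmin
# depending on your setup, /var/lib/pgadmin may also be required for pgAdmin data files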

Finally install pgAdmin 4 with (the current version at the time of writing this article is 5.4):

pip install pgadmin4

Execute it with:

(pgadmin4) [pgadmin@server pgadmin]$ nohup pgadmin4 > pgadmin4.out &
[1] 22676

To generate a bit of activity and understand how it works, I have initialized pgbench. I have decided to create a dedicated database for pgbench; this must be done upfront using:

create database pgbenchdb;

Then I created the pgbench data model using the command below (you must plan for 5GB of data storage):

pgbench --host=server2.domain.com --port=5433 --user=postgres --initialize --scale=100 pgbenchdb

If you mess up or want to delete the pgbench tables, use:

pgbench --host=server2.domain.com --port=5433 --user=postgres --initialize --init-steps=d pgbenchdb

Most probably you will fill the WAL directory, so use something like the commands below (never do this in production! But here I assume you are using a test server where you don’t care about recovery). First run in display mode to see what pg_archivecleanup would do:

[postgres@server ~]$ pg_archivecleanup -d -n /postgres/13/data/pg_wal/ 0000000300000000000000D0
pg_archivecleanup: keeping WAL file "/postgres/13/data/pg_wal//0000000300000000000000D0" and later
/postgres/13/data/pg_wal//0000000300000000000000B9
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000B9" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BA
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BA" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BB
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BB" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BC
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BC" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BD
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BD" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BE
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BE" would be removed
/postgres/13/data/pg_wal//0000000300000000000000BF
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000BF" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C0
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C0" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C1
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C1" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C2
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C2" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C3
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C3" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C4
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C4" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C5
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C5" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C6
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C6" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C7
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C7" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C8
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C8" would be removed
/postgres/13/data/pg_wal//0000000300000000000000C9
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000C9" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CA
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CA" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CB
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CB" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CC
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CC" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CD
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CD" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CE
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CE" would be removed
/postgres/13/data/pg_wal//0000000300000000000000CF
pg_archivecleanup: file "/postgres/13/data/pg_wal//0000000300000000000000CF" would be removed

Then the command to delete:

[postgres@server ~]$ pg_archivecleanup -d /postgres/13/data/pg_wal/ 0000000300000000000000D0
pg_archivecleanup: keeping WAL file "/postgres/13/data/pg_wal//0000000300000000000000D0" and later
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000B9"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BA"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BB"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BC"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BD"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BE"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000BF"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C0"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C1"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C2"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C3"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C4"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C5"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C6"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C7"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C8"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000C9"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CA"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CB"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CC"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CD"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CE"
pg_archivecleanup: removing file "/postgres/13/data/pg_wal//0000000300000000000000CF"

Finally issue a benchmark using:

pgbench --host=server2.domain.com --port=5433 --user=postgres --client=10 --jobs=2 --transactions=10000 pgbenchdb

We can see that pgAdmin is providing interesting graphs of your server or individual database performance:

pgadmin01

pgAdmin also provides a neat graphical query editor:

pgadmin02

Overall a nice monitoring tool, even if it is a bit complex to navigate through the menus and options. The chart part could be more exhaustive and you cannot add your own charts…

Postgres Enterprise Manager

This tool is not free and you need an EnterpriseDB subscription to use it. I have created a trial account, which lets you test the tool for 60 days. For this I have used two virtual machines running Oracle Linux 8: one for the Postgres Enterprise Manager (PEM) repository and one for the client PostgreSQL instance (deployed with a PEM agent).

Postgres Enterprise Manager Server

My PEM server will be my first virtual machine, called server1.domain.com (192.168.56.101). I started by creating the PostgreSQL 13 repository database in the /postgres/13/data directory and created a service with the below startup file, taken from the official documentation:

[root@server1 ~]# cat /etc/systemd/system/postgresql.service
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)

[Service]
Type=notify
User=postgres
ExecStart=/usr/pgsql-13/bin/postgres -D /postgres/13/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0

[Install]
WantedBy=multi-user.target

Reload systemd daemon with:

systemctl daemon-reload

Now you can use systemctl stop/start/status postgresql (the PEM server will do it too):

[root@server1 ~]# systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/etc/systemd/system/postgresql.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-07-05 12:43:56 CEST; 12min ago
     Docs: man:postgres(1)
 Main PID: 9781 (postgres)
    Tasks: 17 (limit: 49502)
   Memory: 144.9M
   CGroup: /system.slice/postgresql.service
           ├─ 9781 /usr/pgsql-13/bin/postgres -D /postgres/13/data
           ├─ 9782 postgres: logger
           ├─ 9784 postgres: checkpointer
           ├─ 9785 postgres: background writer
           ├─ 9786 postgres: walwriter
           ├─ 9787 postgres: autovacuum launcher
           ├─ 9788 postgres: stats collector
           ├─ 9789 postgres: logical replication launcher
           ├─ 9850 postgres: agent1 pem 127.0.0.1(60830) idle
           ├─ 9871 postgres: agent1 pem 127.0.0.1(60832) idle
           ├─ 9898 postgres: agent1 pem 127.0.0.1(60836) idle
           ├─ 9904 postgres: postgres postgres 127.0.0.1(60838) idle
           ├─ 9910 postgres: agent1 pem 127.0.0.1(60840) idle
           ├─ 9919 postgres: agent1 pem 127.0.0.1(60842) idle
           ├─10358 postgres: postgres pem 127.0.0.1(60944) idle
           ├─10359 postgres: postgres pem 127.0.0.1(60946) idle
           └─10360 postgres: postgres pem 127.0.0.1(60948) idle

Jul 05 12:43:56 server1.domain.com systemd[1]: Starting PostgreSQL database server...
Jul 05 12:43:56 server1.domain.com postgres[9781]: 2021-07-05 12:43:56.615 CEST [9781] LOG:  redirecting log output to logging collector process
Jul 05 12:43:56 server1.domain.com postgres[9781]: 2021-07-05 12:43:56.615 CEST [9781] HINT:  Future log output will appear in directory "log".
Jul 05 12:43:56 server1.domain.com systemd[1]: Started PostgreSQL database server.
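
Optionally, and not strictly required for this test, you can also enable the unit at boot:

[root@server1 ~]# systemctl enable postgresql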

To be able to connect locally (the PEM server accesses 127.0.0.1) and from the remote agent, I have changed the following in postgresql.conf:

listen_addresses = 'localhost,server1.domain.com'

And in pg_hba.conf:

host    all             postgres             0.0.0.0/0            trust

As instructed install the EDB repository with:

dnf -y install https://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm

And in /etc/yum.repos.d/edb.repo change the username and password with the provided EDB information. Note that this is not the account you use to connect to the EDB website, but the credentials provided in your profile page (I made that stupid mistake, so I am sharing). Click on the eye icon to read the password:

pem01

Install PEM server with:

dnf install edb-pem

The interactive configuration tool is /usr/edb/pem/bin/configure-pem-server.sh. If for any reason you want to restart the configuration from scratch, the (undocumented) configuration file is:

[root@server1 ~]# cat /usr/edb/pem/share/.install-config
PEM_INSTALLATION_TYPE=1
PG_INSTALL_PATH=/usr/pgsql-13
SUPERUSER=postgres
HOST=127.0.0.1
PORT=5432
AGENT_CERTIFICATE_PATH=/root/.pem/
PEM_PYTHON=python3
PEM_APP_HOST=
WEB_PEM_CONFIG=/usr/edb/pem/web/config_setup.py
CIDR_ADDR=0.0.0.0/0
DB_UNIT_FILE=postgresql
PEM_SERVER_SSL_PORT=8443

Once you have answered the few questions, the PEM server is configured. So far I have not understood how to stop/start/check the PEM server process, and each time I use the configure-pem-server.sh script to start it. Not very convenient…

Postgres Enterprise Manager Agent

My PEM agent server will be my second virtual machine, called server2.domain.com (192.168.56.102). As for the server part, you need to configure the EDB repository and insert your EDB account and password into the repository file. The only package you have to install is the PEM agent:

dnf install edb-pem-agent

I have also configured on this client server a PostgreSQL instance where I have installed the pgbench schema to generate some workload…

On this client node, register the agent with the PostgreSQL repository instance. This is why I had to configure the PostgreSQL repository instance to accept connections from remote clients:

/usr/edb/pem/agent/bin/pemworker --register-agent

Finally you control PEM Agent with systemd:

systemctl start pemagent

Then this is where it is not crystal clear to me how you are supposed to add PEM agents and monitored servers to your repository. To me, PEM agents should be added automatically, and it is unclear when you must add them manually:

pem02

You do not see the database list information, and you need to add the server a second time (I added it in a specific group I created upfront), not using the PEM Agent tab but the Connection one… Most probably I’m doing something wrong:

pem03

The tool also has a graphical query interface:

pem04

Overall the tool is quite complete, with plenty of dashboards, alerts and probes (checks) that you can also customize and create on your own. If you have a magical query you can put everything in place to create a dashboard or an alert based on it.

Remark:
One point that might not be immediately obvious (it was not to me) is that the Dashboard menu changes depending on what you select in the left menu (server, database, schema).

This is clearly one step beyond pgAdmin, but it also inherits the weak points of its ancestor. To be honest, overall, I have not been really impressed by the tool (also taking into account that it is not free). The tool is not bad, but I was clearly expecting a much more modern and easy to use UI. One nice added feature versus pgAdmin is the custom probes, alerts and charts!! I would need to go deeper as I might not have understood it well…

OmniDB

Download the server package for your operating system:

omnidb01

And for my Oracle Linux 8 virtual machine the installation was as simple as:

dnf install omnidb-server-3.0.3b_linux_x86_64.rpm

To connect remotely I had to change the listen address in the /root/.omnidb/omnidb-server/config.py configuration file. Start the OmniDB server web interface with:

[root@server1 ~]# omnidb-server
Running database migrations...
Operations to perform:
  Apply all migrations: OmniDB_app, admin, auth, contenttypes, sessions, social_django
Running migrations:
  No migrations to apply.
Starting OmniDB server...
Checking port availability...
Starting server OmniDB 3.0.3b at 192.168.56.101:8000.
Open OmniDB in your favorite browser
Press Ctrl+C to exit

In the web interface (default account is admin/admin) add a new PostgreSQL server with the form below (other database flavors are available):

omnidb02

The tool has a graphical query part (as more or less all of them do):

omnidb03

And a monitoring chart part:

omnidb04

Charts are neat (I really like them: really modern!) and the look and feel is really Web 2.0, but you cannot add your own custom charts, while PEM has this capability…

Percona Monitoring and Management

This multi-database monitoring tool is a free offer from Percona. It is based on the traditional server/agent model and built on Grafana. Having already played a bit with it, I can already say that the look and feel is pretty neat.

Server

For the server part of Percona Monitoring and Management (PMM) you have to download a provided Docker image. On my Oracle Linux 8 virtual box I have decided to use Podman (similar to Docker but daemonless, open source and a Linux-native tool). The nice thing about transitioning to Podman is that the commands are all the same…

Download the Docker PMM server image with the podman pull command. I had to configure my corporate proxy by setting the HTTPS_PROXY environment variable and also had to add my proxy certificate. I am not going into the details again as we have already seen this with Docker and it’s almost the same with Podman (link):

[root@server1 ~]# podman pull percona/pmm-server:2
✔ docker.io/percona/pmm-server:2
Trying to pull docker.io/percona/pmm-server:2...
  Get "https://registry-1.docker.io/v2/": x509: certificate signed by unknown authority
Error: Error initializing source docker://percona/pmm-server:2: error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": x509: certificate signed by unknown authority
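
This fails because neither the proxy nor the proxy certificate are known to the system yet; a hedged sketch of the required setup on Oracle Linux 8 (same placeholder proxy as in the pip.conf earlier, hypothetical certificate file name):

[root@server1 ~]# export HTTPS_PROXY=http://account:password@proxy_server.domain.com:proxy_port/
[root@server1 ~]# cp proxy_ca.crt /etc/pki/ca-trust/source/anchors/
[root@server1 ~]# update-ca-trust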

With proxy and certificates configured:

[root@server1 tmp]# podman pull percona/pmm-server:2
✔ docker.io/percona/pmm-server:2
Trying to pull docker.io/percona/pmm-server:2...
Getting image source signatures
Copying blob 178efec65a21 done
Copying blob 2d473b07cdd5 done
Copying config 82d29be43d done
Writing manifest to image destination
Storing signatures
82d29be43d66377922dcb3b1cabe8e2cb5716a3b9a76bab8791736e465ba50be

Once the image is pulled, create a persistent data volume with:

[root@server1 tmp]# podman create --volume /srv --name pmm-data percona/pmm-server:2 /bin/true
4dcc7c55603d01f4842ed524f5b7d983a67e16ff3d5a42dc84691d70c27eeba4

Run the container with:

[root@server1 tmp]# podman run --detach --restart always --publish 443:443 --volumes-from pmm-data --name pmm-server percona/pmm-server:2
5f5c4a6468f389951d1a44bfbf0f5492050c375d1bddcd2ab7fea25bcc791f45
[root@server1 tmp]# podman container list
CONTAINER ID  IMAGE                           COMMAND               CREATED         STATUS             PORTS                 NAMES
5f5c4a6468f3  docker.io/percona/pmm-server:2  /opt/entrypoint.s...  27 seconds ago  Up 25 seconds ago  0.0.0.0:443->443/tcp  pmm-server

Then you can access https://192.168.56.101 (in my case, as I access my virtual server from my desktop); the default login is admin/admin and you will immediately be prompted to change it.

Client

For the client I have chosen the rpm installation as it is by far the simplest to use. Download the rpm on your client server and install it with:

[root@server2 tmp]# dnf install pmm2-client-2.19.0-6.el8.x86_64.rpm
Last metadata expiration check: 0:45:32 ago on Mon 12 Jul 2021 11:05:40 AM CEST.
Dependencies resolved.
=========================================================================================================================================================================================================================================
 Package                                                  Architecture                                        Version                                                    Repository                                                 Size
=========================================================================================================================================================================================================================================
Installing:
 pmm2-client                                              x86_64                                              2.19.0-6.el8                                               @commandline                                               43 M

Transaction Summary
=========================================================================================================================================================================================================================================
Install  1 Package

Total size: 43 M
Installed size: 154 M
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                 1/1
  Running scriptlet: pmm2-client-2.19.0-6.el8.x86_64                                                                                                                                                                                 1/1
  Installing       : pmm2-client-2.19.0-6.el8.x86_64                                                                                                                                                                                 1/1
  Running scriptlet: pmm2-client-2.19.0-6.el8.x86_64                                                                                                                                                                                 1/1
  Verifying        : pmm2-client-2.19.0-6.el8.x86_64                                                                                                                                                                                 1/1

Installed:
  pmm2-client-2.19.0-6.el8.x86_64

Complete!
[root@server2 tmp]# pmm-admin --version
ProjectName: pmm-admin
Version: 2.19.0
PMMVersion: 2.19.0
Timestamp: 2021-06-30 11:31:50 (UTC)
FullCommit: 33d4f4a11ec6c46204d58e6ff6e08ad5742c8ae2

Register with the server repository with:

[root@server2 ~]# pmm-admin config --server-insecure-tls --server-url=https://admin:admin@192.168.56.101:443
Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.
Checking local pmm-agent status...
pmm-agent is running.

On your client PostgreSQL instance create a pmm account (choose a strong password, not like me):

postgres=# CREATE USER pmm WITH SUPERUSER ENCRYPTED PASSWORD 'pmm';
CREATE ROLE

And update the pg_hba.conf file to be able to connect with the pmm account by specifying a password:

[postgres@server2 data]$ grep local pg_hba.conf | grep -v "^#"
local   all             pmm                                     md5
local   replication     all                                     trust
local   all             all                                     trust

I have chosen pg_stat_statements as the monitoring extension, installed with:

[root@server2 ~]# dnf install -y postgresql13-contrib.x86_64
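
The extension also has to be loaded at the instance level; a minimal sketch of the required postgresql.conf entry (hence the restart in the next step):

shared_preload_libraries = 'pg_stat_statements'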

Restart your PostgreSQL instance, check you can connect with the pmm account and create the extension with:

[postgres@server2 data]$ psql postgres pmm -c "\conninfo"
Password for user pmm:
You are connected to database "postgres" as user "pmm" via socket in "/var/run/postgresql" at port "5432".
[postgres@server1 data]$ psql
psql (13.3)
Type "help" for help.

postgres=# CREATE EXTENSION pg_stat_statements SCHEMA public;
CREATE EXTENSION

To add my client PostgreSQL instance, with a service name suffixed by -postgresql, I have used:

[root@server2 ~]# pmm-admin add postgresql --username=pmm --password=pmm --server-url=https://admin:admin@192.168.56.101:443 --server-insecure-tls
PostgreSQL Service added.
Service ID  : /service_id/2516642c-3237-4ef3-810f-6c2ecb6ddd6c
Service name: server2.domain.com-postgresql
[root@server2 tmp]# pmm-admin inventory list services
Services list.

Service type           Service name         Address and Port  Service ID
PostgreSQL             server2.domain.com-postgresql 127.0.0.1:5432    /service_id/2516642c-3237-4ef3-810f-6c2ecb6ddd6c
PostgreSQL             pmm-server-postgresql 127.0.0.1:5432    /service_id/f7112f05-20e9-4933-8591-441fc93662f1

For a free product, the look and the displayed information are just awesome. Of course Grafana’s neat default look and feel helps, but Percona has added a big bunch of cool features:

pmm01
pmm02
pmm03

And obviously, as it is Grafana, there is no limit to the customization you can make…

PGWatch

I have obviously chosen the container installation and decided to use Podman, which comes by default with my Oracle Linux distribution. I expected the installation to be seamless but in the end I lost a couple of hours fighting with non-working containers. I have tried the pgwatch2 and pgwatch2-postgres containers but none of them worked and I had plenty of errors like:

  • ERROR 209 name ‘requests’ is not defined
  • influxdb.exceptions.InfluxDBClientError: database not found: pgwatch2

I decided to give it a last try by pulling the pgwatch2-nonroot image with:

[root@server1 ~]# podman  pull cybertec/pgwatch2-nonroot
✔ docker.io/cybertec/pgwatch2-nonroot:latest
Trying to pull docker.io/cybertec/pgwatch2-nonroot:latest...
Getting image source signatures
Copying blob 350caab5f3b5 skipped: already exists
Copying blob 49ac0bbe6c8e skipped: already exists
Copying blob 3386e6af03b0 skipped: already exists
Copying blob 1a0f3a523f04 skipped: already exists
Copying blob d1983a67e104 skipped: already exists
Copying blob 91056c4070cb skipped: already exists
Copying blob b23f24e6b1dd skipped: already exists
Copying blob 1ed2f1c72460 skipped: already exists
Copying blob effdfc7f950c skipped: already exists
Copying blob 9a055164fb69 skipped: already exists
Copying blob be763b7af1a3 skipped: already exists
Copying blob 70fa32c9c857 done
Copying blob 174f5722e61d done
Copying blob 8be6b6bc9759 done
Copying blob 7dea3ad5b533 done
Copying blob c7f6ad956dfc done
Copying blob 00e2d15bc136 done
Copying blob fe00b1e59788 done
Copying blob 40a688173fcd done
Copying config 196f099da1 done
Writing manifest to image destination
Storing signatures
196f099da17eb6bb328ed274a7435371969ec73e8c99599a47e8686f22c6f1cc

And run it with:

[root@server1 ~]# podman run -d --restart=unless-stopped --name pw2 -p 3000:3000 -p 8080:8080 -p 127.0.0.1:5432:5432 -e PW2_TESTDB=true cybertec/pgwatch2-nonroot:latest
4bd4150e3cb8991b3f9c4b24c2cc97973f5868bbf5dcbffa203e9e7c473fb465
[root@server1 ~]# podman container list -a
CONTAINER ID  IMAGE                               COMMAND               CREATED        STATUS            PORTS                                           NAMES
4bd4150e3cb8  docker.io/cybertec/pgwatch2:latest  /pgwatch2/docker-...  5 seconds ago  Up 3 seconds ago  0.0.0.0:3000->3000/tcp, 0.0.0.0:8080->8080/tcp  pw2

With the chosen options you can see the backend PostgreSQL instance of pgwatch2 in the web UI (port 3000):

pgwatch201

On the instance you plan to monitor create a monitoring account with:

CREATE ROLE pgwatch2 WITH LOGIN PASSWORD 'secret';
ALTER ROLE pgwatch2 CONNECTION LIMIT 3;
GRANT pg_monitor TO pgwatch2;
GRANT CONNECT ON DATABASE pgbenchdb TO pgwatch2;
GRANT USAGE ON SCHEMA public TO pgwatch2;
GRANT EXECUTE ON FUNCTION pg_stat_file(text) to pgwatch2;

postgres=# show shared_preload_libraries;
 shared_preload_libraries
--------------------------
 pg_stat_statements
(1 row)

postgres=# show track_io_timing;
 track_io_timing
-----------------
 on
(1 row)

If required, modify the pg_hba.conf file to allow remote connections using the pgwatch2 account…
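
As an example, a pg_hba.conf entry restricted to the pgwatch2 host used above could look like this (adjust the address to your own setup):

host    all             pgwatch2        192.168.56.101/32       md5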

Then add the database in the admin interface (port 8080) of pgwatch2 (the pg_stat_statements extension is loaded in this database):

pgwatch202

After a while you should see it in the dashboard:

pgwatch203
pgwatch204
pgwatch205
pgwatch206

Again, as it is based on Grafana, the web UI is really, really neat. Same as Percona’s product, customization is limitless. The product is agentless (for good or bad), which even simplifies installation. On top of this the product is free and open source: what more can you ask for?

To be continued…

List of potential interesting candidates:


PostgreSQL backup and restore tools comparison for PITR recovery


PostgreSQL backup and restore

When you are a DBA, the most important part of your job is, in my opinion, backup and restore. You can fail at any other part of your job, but if you are not able to restore and recover a database after a disaster then you will be the only one to blame (not to say fired).

postgresql_backup_restore

Obviously, being able to recover from a disaster scenario is your number one priority! Those disasters can be hardware related (power failure, datacenter issues, …) or software related (accidentally dropped table, corruption, deleted file, …). I also wrote restore because there are countless stories on the Internet of DBAs who have not been able to restore their precious backups, often for lack of testing.

Restore and recovery must be done within the agreed Recovery Time Objective (RTO) and within the agreed (most probably as small as possible) Recovery Point Objective (RPO). The recovery is ideally as close as possible to the time of the problem (just before an important table drop, for example), and this is called Point In Time Recovery (PITR).

In this blog post I plan to test a few famous PostgreSQL 13 backup tools and check whether they are usable in a production environment…

My test PostgreSQL instance is release 13.3 running on RedHat 7.8 (Maipo). I will add new tools in the future if I find interesting candidates…

My personal preference so far:

  1. PgBackRest
  2. Pg_basebackup
  3. Barman
  4. EDB Backup and Recovery Tool (BART)

Pg_basebackup

Pg_basebackup is the default tool coming directly with your PostgreSQL installation. We cannot expect miracles from it, but if it does a decent job, why would we need more?

I set the obvious parameters below to activate WAL archiving on my instance. As you can see I have decided to put my backups in /postgres/backup and archive WAL files in /postgres/backup/archivedir:

archive_mode = on
archive_command = 'test ! -f /postgres/backup/archivedir/%f && cp %p /postgres/backup/archivedir/%f'
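
Two related points worth keeping in mind (standard PostgreSQL behaviour): wal_level must be at least replica, which is the default in PostgreSQL 13, and a change of archive_mode only takes effect after an instance restart:

wal_level = replica             # default in PostgreSQL 13, sufficient for WAL archiving
# archive_mode changes require a restart of the instance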

I create my own database and my traditional test table. The idea is the same as usual: insert one row, perform a backup, insert a second row and try to recover up to the last transaction:

postgres=# create database yjaquierdb;
CREATE DATABASE
postgres=# \c yjaquierdb
You are now connected to database "yjaquierdb" as user "postgres".
yjaquierdb=# create table test(id int,descr varchar(50));
CREATE TABLE
yjaquierdb=# insert into test values(1,'One');
INSERT 0 1

I have only one (current) WAL file, obviously not yet archived:

[postgres@server data]$ ll pg_wal
total 16384
-rw------- 1 postgres postgres 16777216 Jul 20 12:17 000000010000000000000001
drwx------ 2 postgres postgres       96 Jul 20 12:16 archive_status
  
[root@server backup]# ll /postgres/backup/archivedir/

I perform a backup:

[postgres@server data]$ mkdir -p /postgres/backup/full_20jul2021
[postgres@server data]$ pg_basebackup --pgdata=/postgres/backup/full_20jul2021 --format=t --compress=9 --progress --verbose --host=localhost --port=5433 --username=postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/3000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_19427"
24989/24989 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/3000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

[postgres@server data]$ ll /postgres/backup/full_20jul2021
total 3124
-rw------- 1 postgres postgres  135690 Jul 16 17:38 backup_manifest
-rw------- 1 postgres postgres 3044276 Jul 16 17:38 base.tar.gz
-rw------- 1 postgres postgres   17643 Jul 16 17:38 pg_wal.tar.gz

We can see that 2 WAL files have been archived and that 000000010000000000000003 is the current WAL file:

[postgres@server data]$ ll /postgres/backup/archivedir/
total 65537
-rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
[postgres@server data]$ ll pg_wal
total 49154
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000003
drwx------ 2 postgres postgres     1024 Jul 20 12:19 archive_status

I insert a second row in my test table (we can see it does not switch the current WAL file, so if you lose this file, which is not included in the full backup, you would have data loss):

[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# insert into test values(2,'Two');
INSERT 0 1
yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

yjaquierdb=# \q
[postgres@server data]$ ll pg_wal
total 49154
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:21 000000010000000000000003
drwx------ 2 postgres postgres     1024 Jul 20 12:19 archive_status

To have this file archived I perform a WAL switch; we will see later that a more elegant option is available:

[postgres@server data]$ psql
psql (13.3)
Type "help" for help.

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/3000170
(1 row)

postgres=# \q
[postgres@server data]$ ll pg_wal
total 71618
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000004
drwx------ 2 postgres postgres     1024 Jul 20 12:22 archive_status
[postgres@server data]$ ll /postgres/backup/archivedir/
total 81921
-rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003

I kill my PostgreSQL instance and clean my PGDATA directory:

[postgres@server data]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'`
[postgres@server data]$ rm -rf *

Pg_basebackup restore is simply two gtar commands:

[postgres@server data]$ ll /postgres/backup/full_20jul2021/
total 4148
-rw------- 1 postgres postgres  178001 Jul 20 12:19 backup_manifest
-rw------- 1 postgres postgres 4050202 Jul 20 12:19 base.tar.gz
-rw------- 1 postgres postgres   17662 Jul 20 12:19 pg_wal.tar.gz
[postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021/base.tar.gz
.
.
[postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021/pg_wal.tar.gz --directory pg_wal
000000010000000000000002
archive_status/000000010000000000000002.done

I update restore_command in my postgresql.conf file:

restore_command = 'cp /postgres/backup/archivedir/%f %p'

To start the recovery I touch the recovery.signal file and start the instance. We can see that the latest inserted row is there:

[postgres@server data]$ touch recovery.signal
[postgres@server data]$ pg_start
waiting for server to start.... done
server started
[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

Of course if you don’t execute the pg_switch_wal() function, the WAL file 000000010000000000000003 contains the latest insert (the second row of my test table for example) and in case of a crash you would need to cross your fingers to recover this latest WAL file holding your latest database transactions. Same as other products, PostgreSQL offers, by default, streaming of WAL files with the pg_receivewal binary.

I execute pg_receivewal in the background:

[postgres@server data]$ nohup pg_receivewal --directory=/postgres/backup/archivedir/ --verbose --port=5433 --username=postgres &
[1] 4661

[postgres@server data]$ ll -rt pg_wal
total 49154
-rw------- 1 postgres postgres 16777216 Jul 20 12:29 000000020000000000000006
-rw------- 1 postgres postgres       41 Jul 20 12:29 00000002.history
-rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004
drwx------ 2 postgres postgres     1024 Jul 20 15:21 archive_status
-rw------- 1 postgres postgres 16777216 Jul 20 15:22 000000020000000000000005
[postgres@server data]$ ll /postgres/backup/archivedir/
total 105650
-rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004
-rw------- 1 postgres postgres 16777216 Jul 20 15:26 000000020000000000000005.partial
-rw------- 1 postgres postgres       41 Jul 20 12:29 00000002.history

I perform a second full instance backup:

[postgres@server data]$ pg_basebackup --pgdata=/postgres/backup/full_20jul2021_2 --format=t --compress=9 --progress --verbose --host=localhost --port=5433 --username=postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/6000028 on timeline 2
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_4923"
33131/33131 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/6000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
[postgres@server data]$ ll -rt pg_wal
total 49155
-rw------- 1 postgres postgres       41 Jul 20 12:29 00000002.history
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006
drwx------ 2 postgres postgres     1024 Jul 20 15:28 archive_status
-rw------- 1 postgres postgres      339 Jul 20 15:28 000000020000000000000006.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000007
[postgres@server data]$ ll /postgres/backup/archivedir/
total 171186
-rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000007.partial
-rw------- 1 postgres postgres       41 Jul 20 12:29 00000002.history

I insert a third row in my test table:

[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# insert into test values(3,'Three');
INSERT 0 1
yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
  3 | Three
(3 rows)

I kill my PostgreSQL instance and purge the PGDATA directory. I noticed that the pg_receivewal command survives the instance shutdown: to be safe I have decided to kill it as well!
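
A minimal sketch of this cleanup (pkill is my assumption for stopping the background pg_receivewal process, adapt it to your environment):

[postgres@server data]$ pkill -f pg_receivewal
[postgres@server data]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'`
[postgres@server data]$ rm -rf /postgres/13/data/*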

I restore my second full backup:

[postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021_2/base.tar.gz
.
.
.
[postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021_2/pg_wal.tar.gz --directory pg_wal
00000002.history
archive_status/00000002.history.done
000000020000000000000006
archive_status/000000020000000000000006.done

You have to set the restore_command parameter in postgresql.conf or you get the below error (I’m not sure why this is mandatory here):

restore_command = 'cp /postgres/backup/archivedir/%f %p'
2021-07-20 15:59:27.897 CEST [9188] FATAL:  must specify restore_command when standby mode is not enabled

You have to copy the latest partial WAL file and rename it so that PostgreSQL picks it up:

[postgres@server data]$ ll -rt pg_wal
total 16385
-rw------- 1 postgres postgres       41 Jul 20 15:28 00000002.history
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006
drwx------ 2 postgres postgres       96 Jul 20 15:37 archive_status
[postgres@server data]$ ll /postgres/backup/archivedir/
total 163842
-rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002
-rw------- 1 postgres postgres      339 Jul 20 12:19 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005
-rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006
-rw------- 1 postgres postgres 16777216 Jul 20 15:30 000000020000000000000007.partial
-rw------- 1 postgres postgres       41 Jul 20 12:29 00000002.history
[postgres@server data]$ cp /postgres/backup/archivedir/000000020000000000000007.partial pg_wal/000000020000000000000007

Touch the recovery.signal file, start the instance and you get the latest transactions back in your database:

[postgres@server data]$ touch recovery.signal
[postgres@server data]$ pg_start
waiting for server to start.... done
server started
[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
  3 | Three
(3 rows)

Overall pg_basebackup, combined or not with pg_receivewal, is a low level tool but as an old monkey I really enjoyed it. Obviously you need a deeper understanding of PostgreSQL to use these tools but if you have it there is no surprise: they do what they advertise. What is currently missing is incremental backup…

PgBackRest

On my Red Hat test server the installation of PgBackRest is as simple as (postgresql-libs is also required but I already had it):

yum install pgbackrest-2.34-1.rhel7.x86_64.rpm

I set the below parameters; the archive_command comes from the pgBackRest documentation:

archive_mode = on
archive_command = 'pgbackrest --stanza=localhost archive-push %p'

I have customized the /etc/pgbackrest/pgbackrest.conf file as:

[localhost]
pg1-path=/postgres/13/data
pg1-port=5433

[global]
repo1-path=/postgres/backup
repo1-retention-full=2

[global:archive-push]
compress-level=3

I create the stanza with:

[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info stanza-create
2021-07-20 16:32:28.091 P00   INFO: stanza-create command begin 2.34: --exec-id=13449-af55e788 --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --stanza=localhost
2021-07-20 16:32:28.695 P00   INFO: stanza-create for stanza 'localhost' on repo1
2021-07-20 16:32:28.706 P00   INFO: stanza-create command end: completed successfully (616ms)
[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info check
2021-07-20 16:34:08.856 P00   INFO: check command begin 2.34: --exec-id=13677-eadf8ac5 --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --stanza=localhost
2021-07-20 16:34:09.466 P00   INFO: check repo1 configuration (primary)
2021-07-20 16:34:09.668 P00   INFO: check repo1 archive for WAL (primary)
2021-07-20 16:34:09.769 P00   INFO: WAL segment 000000030000000000000009 successfully archived to '/postgres/backup/archive/localhost/13-1/0000000300000000/000000030000000000000009-fbaa8cde71f2165ceb75365b34f3b515fe8675c6.gz' on repo1
2021-07-20 16:34:09.769 P00   INFO: check command end: completed successfully (914ms)

I have my test table:

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
(1 row)

Perform a backup with:

[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info backup
2021-07-20 16:39:43.658 P00   INFO: backup command begin 2.34: --exec-id=14313-31cfa7cf --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --repo1-retention-full=2 --stanza=localhost
WARN: no prior backup exists, incr backup has been changed to full
2021-07-20 16:39:44.364 P00   INFO: execute non-exclusive pg_start_backup(): backup begins after the next regular checkpoint completes
2021-07-20 16:39:44.864 P00   INFO: backup start archive = 00000003000000000000000B, lsn = 0/B000028
.
.
[postgres@server data]$ pgbackrest info
stanza: localhost
    status: ok
    cipher: none

    db (current)
        wal archive min/max (13): 000000030000000000000007/00000003000000000000000B

        full backup: 20210720-163944F
            timestamp start/stop: 2021-07-20 16:39:44 / 2021-07-20 16:39:47
            wal start/stop: 00000003000000000000000B / 00000003000000000000000B
            database size: 31.8MB, database backup size: 31.8MB
            repo1: backup set size: 3.9MB, backup size: 3.9MB

I insert a second row in my test table and the objective will be to restore it with no data loss:

[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

PgBackRest is currently not capable of WAL streaming like you would do with Barman or, at a lower level, with pg_receivewal, so the only option not to lose anything is, for example, to perform an incremental backup:

[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info --type=incr backup
.
.
[postgres@server data]$ pgbackrest info
stanza: localhost
    status: ok
    cipher: none

    db (current)
        wal archive min/max (13): 00000003000000000000000C/000000030000000000000011

        full backup: 20210720-164748F
            timestamp start/stop: 2021-07-20 16:47:48 / 2021-07-20 16:47:51
            wal start/stop: 00000003000000000000000E / 00000003000000000000000E
            database size: 31.8MB, database backup size: 31.8MB
            repo1: backup set size: 3.9MB, backup size: 3.9MB

        incr backup: 20210720-164748F_20210720-170531I
            timestamp start/stop: 2021-07-20 17:05:31 / 2021-07-20 17:05:33
            wal start/stop: 000000030000000000000010 / 000000030000000000000010
            database size: 31.8MB, database backup size: 24.3KB
            repo1: backup set size: 3.9MB, backup size: 634B
            backup reference list: 20210720-164748F

I kill the instance and erase the PGDATA directory to simulate a disaster!
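
As a reminder, this simulated disaster reuses the same commands as in the pg_basebackup section:

[postgres@server data]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'`
[postgres@server data]$ rm -rf /postgres/13/data/*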

Restore your instance with something like the below command (PITR is possible with multiple options, one example follows):

[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info --target-timeline=latest restore
.
.
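
For example, a time-based PITR restore would be a variation like the sketch below (the timestamp is purely illustrative, not taken from my test):

[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info --type=time --target="2021-07-20 17:00:00" restore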

Simply start the instance (pgBackRest has created recovery.signal and set the required options for you) and check everything is there:

[postgres@server data]$ pg_start
waiting for server to start.... done
server started
[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

PgBackRest is the closest to what I know with Oracle RMAN. You can work locally or remotely. Except for WAL streaming, which is not there, you can do full, incremental or differential backups. Ease of use is there and I had no pain implementing it: my preferred backup tool so far!

EDB Backup and Recovery Tool

EDB Backup and Recovery Tool (BART) is the commercial product from EnterpriseDB. Once you have configured the EDB repository with your (trial) account and password, installation is as simple as:

yum -y install edb-bart 
[postgres@server ~]$ cp /usr/edb/bart/etc/bart.cfg.sample /postgres/backup/bart.cfg

Customize the BART configuration to match your environment:

[BART]
bart_host= postgres@127.0.0.1
backup_path = /postgres/backup
pg_basebackup_path = /usr/bin/pg_basebackup
logfile = /postgres/backup/bart.log
scanner_logfile = /postgres/backup/bart_scanner.log
thread_count = 5

[localhost]
host = 127.0.0.1
port = 5433
user = postgres
cluster_owner = postgres
description = "PostgreSQL 13 Community"
allow_incremental_backups = enabled

I start my test PostgreSQL instance with:

[postgres@server log]$ pg_ctl -l logfile start
waiting for server to start.... stopped waiting
pg_ctl: could not start server

I had a startup issue:

2021-07-06 12:22:23.090 CEST [18106] LOG:  invalid primary checkpoint record
2021-07-06 12:22:23.090 CEST [18106] PANIC:  could not locate a valid checkpoint record
2021-07-06 12:22:23.090 CEST [18104] LOG:  startup process (PID 18106) was terminated by signal 6: Aborted
2021-07-06 12:22:23.090 CEST [18104] LOG:  aborting startup due to startup process failure
2021-07-06 12:22:23.092 CEST [18104] LOG:  database system is shut down

Which I solved with:

[postgres@server log]$ pg_resetwal /postgres/13/data
Write-ahead log reset
[postgres@server log]$ pg_ctl -l logfile start
waiting for server to start.... done
server started

As requested you need to configure passwordless SSH access for connections of the form account@ip_address (this is because BART will, by default, configure archive_command to scp %p postgres@127.0.0.1:/postgres/backup/localhost/archived_wals/%f):

[postgres@server ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:vMu4U0BRmKPgFDigdQ570ehg6/7lzwFsYGY7VtQWEzo postgres@server
The key's randomart image is:
+---[RSA 2048]----+
|o.+.ooo*=o       |
|+.==.o*.o.       |
|.=.=BoEo         |
|  o=o=.o         |
| .  + +.S        |
|  .. o ...       |
| .    ..o        |
|  .  o.+ o       |
|   .. ++=        |
+----[SHA256]-----+
[postgres@server ~]$ cd .ssh
[postgres@server .ssh]$ ll
total 12
-rw------- 1 postgres postgres 1679 Jul  7 12:02 id_rsa
-rw-r----- 1 postgres postgres  400 Jul  7 12:02 id_rsa.pub
-rw-r--r-- 1 postgres postgres  716 Jul  7 12:01 known_hosts
[postgres@server .ssh]$ cat id_rsa.pub >> authorized_keys
[postgres@server .ssh]$ chmod 600 authorized_keys
[postgres@server .ssh]$ ssh postgres@127.0.0.1
Last failed login: Wed Jul  7 12:03:05 CEST 2021 from 127.0.0.1 on ssh:notty
Last login: Wed Jul  7 11:44:05 2021
[postgres@server ~]$

As I have already configured my PostgreSQL cluster for continuous WAL archiving with the below parameters, I can use the --no-configure option of the BART initialization. If you decide to keep your own configuration you must mimic the BART way of working, i.e. the target directory for WAL archiving must be located inside your BART root directory (/postgres/backup for me), under the configured server name (localhost for me) sub-directory:

checkpoint_timeout = 30s
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'test ! -f /postgres/backup/localhost/archived_wals/%f && cp %p /postgres/backup/localhost/archived_wals/%f'
#restore_command = 'cp /postgres/backup/localhost/archived_wals/%f %p'

Initialize BART with the below command (my BART root directory is /postgres/backup). If you want BART to override your configuration you have the --override option:

[postgres@server backup]$ bart init --server=localhost --no-configure

If you decide to let BART configure your PostgreSQL instance for you, check the postgresql.auto.conf file to see what has been changed. There is an interesting view called pg_file_settings to see from which file a setting is coming. To reset a setting written in postgresql.auto.conf use the ALTER SYSTEM SET parameter TO DEFAULT command (see the example after the query below):

postgres=# select sourcefile,count(*) from pg_file_settings where sourcefile is not null group by sourcefile;
               sourcefile               | count
----------------------------------------+-------
 /postgres/13/data/postgresql.conf      |    24
 /postgres/13/data/postgresql.auto.conf |     2
(2 rows)
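
For illustration, resetting a parameter written in postgresql.auto.conf would look like the sketch below (archive_command is just an example parameter name, not necessarily one BART changed on your system):

postgres=# alter system set archive_command to default;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)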

Perform a backup with:

[postgres@server backup]$ bart backup --server=localhost --gzip --compress-level=9 --backup-name=full_6jul2021 --with-pg_basebackup --thread-count=4
INFO:  DebugTarget - getVar(checkDiskSpace.bytesAvailable)
INFO:  creating full backup using pg_basebackup for server 'localhost'
INFO:  creating backup for server 'localhost'
INFO:  backup identifier: '1625567515978'
INFO:  backup completed successfully
INFO:
BART VERSION: 2.6.2
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1625567515978
BACKUP NAME: full_6jul2021
BACKUP PARENT: none
BACKUP LOCATION: /postgres/backup/localhost/1625567515978
BACKUP SIZE: 4.19 MB
BACKUP FORMAT: tar.gz
BACKUP TIMEZONE: Europe/Paris
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 0
TABLESPACE(s): 0
START WAL LOCATION: 00000003000000010000007F
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2021-07-06 12:31:55 CEST
STOP TIME: 2021-07-06 12:32:01 CEST
TOTAL DURATION: 6 sec(s)

List existing backups:

[postgres@server backup]$ bart show-backups --server=localhost
 SERVER NAME   BACKUP ID       BACKUP NAME     BACKUP PARENT   BACKUP TIME                BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS

 localhost     1625567515978   full_6jul2021   none            2021-07-06 12:32:01 CEST   4.19 MB       0.00 bytes    0           active

Delete backups:

[postgres@server backup]$ bart delete --server=localhost --backupid=1625567515978
INFO:  deleting backup '1625567515978' of server 'localhost'
INFO:  deleting backup '1625567515978'
INFO:  WALs of deleted backup(s) will belong to prior backup(if any), or will be marked unused
WARNING: not marking any WALs as unused WALs, the WAL file '/postgres/backup/localhost/archived_wals/00000003000000010000007F' is required, yet not available in archived_wals directory
INFO:  backup(s) deleted
[postgres@server backup]$ bart show-backups --server=localhost
 SERVER NAME   BACKUP ID   BACKUP NAME   BACKUP PARENT   BACKUP TIME   BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS

As usual I create a test table where I will insert rows after backup to see if I can do a PITR recovery and get the latest inserted rows:

yjaquierdb=# create table test(id int, descr varchar(50));
CREATE TABLE
yjaquierdb=# insert into test values(1,'One');
INSERT 0 1
yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
(1 row)

I create a full backup with:

[postgres@server backup]$ bart backup --server=localhost --gzip --compress-level=9 --backup-name=full_7jul2021 --with-pg_basebackup --thread-count=4
.
.
.
[postgres@server backup]$ bart show-backups --server=localhost
 SERVER NAME   BACKUP ID       BACKUP NAME     BACKUP PARENT   BACKUP TIME                BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS

 localhost     1625662500032   full_7jul2021   none            2021-07-07 14:55:05 CEST   4.23 MB       0.00 bytes    0           active

I insert a new row in my test table with:

yjaquierdb=# insert into test values(2,'Two');
INSERT 0 1
yjaquierdb=# select * from test;
id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

I kill my PostgreSQL instance with:

[postgres@server ~]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'`

In a disaster scenario you might not be able to copy the latest WAL files that have not been archived yet, so your recovery would be incomplete and you would have data loss. My test scenario is favorable and I can copy the latest unarchived WAL files with:

[postgres@server data]$ cp pg_wal/* /postgres/backup/localhost/archived_wals/
cp: omitting directory ‘pg_wal/archive_status’

Then before BART restore I crash everything with:

[postgres@server ~]$ rm -rf /postgres/13/data
[postgres@server ~]$ ll /postgres/13/data
total 0

BART restore:

[postgres@server backup]$ bart restore --server=localhost --backupid=1625662500032 --restore-path=/postgres/13/data --target-tli=latest
INFO:  restoring backup '1625662500032' of server 'localhost'
INFO:  base backup restored
INFO:  writing recovery settings to postgresql.auto.conf
INFO:  WAL file(s) will be streamed from the BART host
INFO:  archiving is disabled
INFO:  permissions set on $PGDATA
INFO:  restore completed successfully

Remark:
The --target-tli option (or any other --target-xx parameter) is highly important because if you omit it then PostgreSQL will perform a recovery only until the first consistent state of the database and so WILL NOT perform a PITR recovery, and you desperately won’t see the latest inserted rows in your test table. The --target-tli BART option is equivalent to the PostgreSQL recovery_target_timeline parameter.

Everything is restored and the recovery.signal file has been automatically created for you:

[postgres@server data]$ ll /postgres/13/data/
total 62
-rw------- 1 postgres postgres   227 Jul  7 14:55 backup_label
-rw------- 1 postgres postgres   227 Jul  6 18:00 backup_label.old
drwx------ 6 postgres postgres    96 Jul  7 14:55 base
-rw------- 1 postgres postgres    30 Jul  7 12:33 current_logfiles
drwx------ 2 postgres postgres  2048 Jul  7 14:57 global
drwx------ 2 postgres postgres  1024 Jul  7 12:06 log
-rw------- 1 postgres postgres  1249 Jul  7 12:33 logfile
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_commit_ts
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_dynshmem
-rw------- 1 postgres postgres  4831 Jun 17 11:54 pg_hba.conf
-rw------- 1 postgres postgres  1636 Jun 14 16:42 pg_ident.conf
drwx------ 4 postgres postgres    96 Jul  7 14:55 pg_logical
drwx------ 4 postgres postgres    96 Jun 17 17:21 pg_multixact
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_notify
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_replslot
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_serial
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_snapshots
drwx------ 2 postgres postgres    96 Jul  7 12:33 pg_stat
drwx------ 2 postgres postgres    96 Jul  7 14:55 pg_stat_tmp
drwx------ 2 postgres postgres    96 Jul  6 18:07 pg_subtrans
drwxr-x--- 2 postgres postgres    96 Jul  7 14:57 pg_tblspc
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_twophase
-rw------- 1 postgres postgres     3 Jun 14 16:42 PG_VERSION
drwx------ 3 postgres postgres  1024 Jul  7 14:57 pg_wal
drwx------ 2 postgres postgres    96 Jun 17 17:21 pg_xact
-rw-r----- 1 postgres postgres    92 Jul  7 14:57 postgresql.auto.conf
-rw------- 1 postgres postgres 28256 Jul  7 14:57 postgresql.conf
-rw-r----- 1 postgres postgres     1 Jul  7 14:57 recovery.signal
-rw-r----- 1 postgres postgres  4648 Jun 17 16:40 server.crt
-rw------- 1 postgres postgres  1675 Jun 17 16:40 server.key
-rw-r----- 1 postgres postgres  3610 Jun 17 16:38 server.req

And the postgresql.auto.conf contains the required recovery commands:

[postgres@server data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
restore_command = 'cp /postgres/backup/localhost/archived_wals/%f %p'
recovery_target_timeline = latest

Once you have started your instance you can query your test table and see that the latest insertion is there:

[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

yjaquierdb=#

Please note that you need to edit the postgresql.conf file and restart your instance to correct this strange behavior of BART:

The BART RESTORE operation stops WAL archiving by adding an archive_mode = off parameter at the very end of the postgresql.conf file. This last parameter in the file overrides any other previous setting of the same parameter in the file. Delete the last setting and restart the database server to start WAL archiving.

# Add settings for extensions here
archive_mode = off
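
A hedged way to remove that trailing line and restart, assuming the line is exactly as shown above:

[postgres@server data]$ sed -i '/^archive_mode = off$/d' /postgres/13/data/postgresql.conf
[postgres@server data]$ pg_ctl restart -l logfile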

Barman

With the Extra Packages for Enterprise Linux (EPEL) YUM repository configured on your server, installation is as simple as:

[root@server ~]# yum install barman-2.12-1.el7.noarch.rpm
.
.
[root@server ~]# su - barman
-bash-4.2$ id
uid=991(barman) gid=496(barman) groups=496(barman)
-bash-4.2$ barman -v
2.12

Barman by 2ndQuadrant (www.2ndQuadrant.com)

To test the full feature set (barman switch-wal) I have decided to create a PostgreSQL superuser on my test instance:

[postgres@server data]$ createuser --superuser --pwprompt --port=5433 barman
Enter password for new role:
Enter it again:
[postgres@server data]$ createuser -P --replication --port=5433 streaming_barman
Enter password for new role:
Enter it again:

Check everything is fine with:

postgres=# \du
                                       List of roles
 Role name        |                         Attributes                         | Member of
------------------+------------------------------------------------------------+-----------
 barman           | Superuser, Create role, Create DB                          | {}
 postgres         | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 streaming_barman | Replication                                                | {}

In the /etc/barman.conf main configuration file I have just changed the below parameter (make sure this directory is readable and writable by the barman Linux account):

barman_home = /postgres/backup/barman

You need to create a configuration for the PostgreSQL server you plan to back up. I have decided to call it localhost as I am using the same server for the Barman server and my PostgreSQL instance. Overall, same as with Oracle RMAN, I am not a big fan of this central backup server and I would rather have one management per PostgreSQL server… So I create it from the streaming template; check the official documentation to determine if you prefer rsync/ssh or streaming:

[root@server ~]# cd /etc/barman.d/
[root@server barman.d]# ll
total 12
-rw-r--r-- 1 root root  947 Nov  4  2020 passive-server.conf-template
-rw-r--r-- 1 root root 1565 Nov  4  2020 ssh-server.conf-template
-rw-r--r-- 1 root root 1492 Nov  4  2020 streaming-server.conf-template
[root@server barman.d]# cp streaming-server.conf-template localhost.conf

[root@server barman.d]# cat localhost.conf | grep -v "^;"

[localhost]
description =  "My PostgreSQL 13 test instance"

conninfo = host=localhost user=barman dbname=postgres port=5433

streaming_conninfo = host=localhost user=streaming_barman dbname=postgres port=5433

backup_method = postgres

streaming_archiver = on
slot_name = barman
create_slot = auto

archiver = on

Remark:
The archiver option comes from a problem I had (shown below) and the automatic slot creation (create_slot = auto) is there to ease my life:

[barman@server ~]$ barman check localhost
Server localhost:
        empty incoming directory: FAILED ('/postgres/backup/barman/localhost/incoming' must be empty when archiver=off)

If you decide to manage the slot by yourself, the below commands could be useful:

[barman@server ~]$ barman receive-wal --drop-slot localhost
Dropping physical replication slot 'barman' on server 'localhost'
Replication slot 'barman' dropped

[barman@server ~]$ barman receive-wal --create-slot localhost
Creating physical replication slot 'barman' on server 'localhost'
Replication slot 'barman' created
[barman@server ~]$ barman receive-wal --reset localhost
Resetting receive-wal directory status
Removing status file /postgres/backup/barman/localhost/streaming/00000001000000000000000C.partial
Creating status file /postgres/backup/barman/localhost/streaming/00000001000000000000000E.partial

Configure passwordless connections for the barman and streaming_barman roles to your test PostgreSQL instance with a .pgpass file:

[barman@server ~]$ echo "localhost:5433:postgres:barman:barman" > ~/.pgpass
[barman@server ~]$ echo "localhost:5433:postgres:streaming_barman:barman" >> ~/.pgpass
[barman@server ~]$ chmod 600 ~/.pgpass
[barman@server ~]$ psql --host=localhost --port=5433 --dbname=postgres --username=barman
psql (9.2.24, server 13.3)
WARNING: psql version 9.2, server version 13.0.
         Some psql features might not work.
Type "help" for help.

postgres=# \q

Check it works without prompting for password:

[barman@server ~]$ psql -U streaming_barman -h localhost --port=5433  -c "IDENTIFY_SYSTEM" replication=1
      systemid       | timeline |  xlogpos  | dbname
---------------------+----------+-----------+--------
  6982909595436191101 |        1 | 0/161E7E8 |
(1 row)

Configure authorized_keys for barman and postgres accounts… From barman you must be able to connect to your host passwordless and vice versa from postgres account…
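
A minimal sketch, assuming ssh-copy-id is available and the accounts have passwords set (otherwise append the public keys to authorized_keys manually, as done earlier for the postgres account):

[barman@server ~]$ ssh-keygen -t rsa
[barman@server ~]$ ssh-copy-id postgres@localhost
[postgres@server ~]$ ssh-copy-id barman@localhost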

Configure the PostgreSQL instance you plan to back up with Barman with the below options:

archive_mode = on
archive_command = 'barman-wal-archive localhost localhost %p'

To get the barman-wal-archive command you must install the Barman CLI package with:

yum install barman-cli

Remark:
The barman-wal-archive command looks a little weird; this is because I’m using the same host for Barman and the PostgreSQL instance. The syntax is: barman-wal-archive [-h] [-V] [-U USER] [-c CONFIG] [-t] BARMAN_HOST SERVER_NAME WAL_PATH. Hence the double localhost…

Activate streaming of WAL files; the barman account must have access to the pg_receivewal binary:

[barman@server .ssh]$ barman receive-wal localhost
Starting receive-wal for server localhost
ERROR: ArchiverFailure:pg_receivexlog not present in $PATH
[barman@server ~]$ export PATH=$PATH:/usr/pgsql-13/bin/
[barman@server ~]$ nohup barman receive-wal localhost &
[1] 22955
[barman@server ~]$ Starting receive-wal for server localhost
localhost: pg_receivewal: starting log streaming at 0/1000000 (timeline 1)
localhost: pg_receivewal: finished segment at 0/2000000 (timeline 1)

If your barman account is a superuser on your target PostgreSQL instance you can even remotely switch the WAL file with:

[barman@server ~]$ barman switch-wal --force --archive localhost
The WAL file 000000010000000000000006 has been closed on server 'localhost'
Waiting for the WAL file 000000010000000000000006 from server 'localhost' (max: 30 seconds)
localhost: pg_receivewal: finished segment at 0/7000000 (timeline 1)
Processing xlog segments from streaming for localhost
        000000010000000000000006

Perform a final check with:

[barman@server ~]$ barman check localhost
Server localhost:
        PostgreSQL: OK
        superuser or standard user with backup privileges: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        systemid coherence: OK (no system Id stored on disk)
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK
[barman@server ~]$ barman status localhost
Server localhost:
        Description: My PostgreSQL 13 test instance
        Active: True
        Disabled: False
        PostgreSQL version: 13.3
        Cluster state: in production
        pgespresso extension: Not available
        Current data size: 32.0 MiB
        PostgreSQL Data directory: /postgres/13/data
        Current WAL segment: 000000010000000000000006
        PostgreSQL 'archive_command' setting: barman-wal-archive localhost localhost %p
        Last archived WAL: 000000010000000000000005, at Fri Jul 16 11:05:16 2021
        Failures of WAL archiver: 0
        Server WAL archiving rate: 11.54/hour
        Passive node: False
        Retention policies: not enforced
        No. of available backups: 1
        First available backup: 20210716T110510
        Last available backup: 20210716T110510
        Minimum redundancy requirements: satisfied (1/0)

As usual I create a yjaquierdb database with a test table inside that contains only one row:

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
(1 row)

Time to perform our first backup:

[barman@server localhost]$ barman backup localhost --wait
Starting backup using postgres method for server localhost in /postgres/backup/barman/localhost/base/20210716T125713
Backup start at LSN: 0/4012760 (000000010000000000000004, 00012760)
Starting backup copy via pg_basebackup for 20210716T125713
Copy done (time: 4 seconds)
Finalising the backup.
This is the first backup for server localhost
WAL segments preceding the current backup have been found:
        000000010000000000000001 from server localhost has been removed
        000000010000000000000002 from server localhost has been removed
        000000010000000000000003 from server localhost has been removed
Backup size: 31.9 MiB
Backup end at LSN: 0/6000000 (000000010000000000000005, 00000000)
Backup completed (start time: 2021-07-16 12:57:13.538788, elapsed time: 4 seconds)
Waiting for the WAL file 000000010000000000000005 from server 'localhost'
Processing xlog segments from streaming for localhost
        000000010000000000000004
Processing xlog segments from file archival for localhost
        000000010000000000000004
        000000010000000000000005
        000000010000000000000005.00000028.backup

Confirm backup creation with:

[barman@server localhost]$ barman list-backup localhost
localhost 20210716T125713 - Fri Jul 16 12:57:18 2021 - Size: 47.9 MiB - WAL Size: 0 B
[barman@server localhost]$ barman show-backup localhost 20210716T125713
Backup 20210716T125713:
  Server Name            : localhost
  System Id              : 6985474264206038247
  Status                 : DONE
  PostgreSQL Version     : 130003
  PGDATA directory       : /postgres/13/data

  Base backup information:
    Disk usage           : 31.9 MiB (47.9 MiB with WALs)
    Incremental size     : 31.9 MiB (-0.00%)
    Timeline             : 1
    Begin WAL            : 000000010000000000000005
    End WAL              : 000000010000000000000005
    WAL number           : 1
    Begin time           : 2021-07-16 12:57:16+02:00
    End time             : 2021-07-16 12:57:18.404710+02:00
    Copy time            : 4 seconds
    Estimated throughput : 6.6 MiB/s
    Begin Offset         : 40
    End Offset           : 0
    Begin LSN           : 0/5000028
    End LSN             : 0/6000000

  WAL information:
    No of files          : 0
    Disk usage           : 0 B
    Last available       : 000000010000000000000005

  Catalog information:
    Retention Policy     : not enforced
    Previous Backup      : - (this is the oldest base backup)
    Next Backup          : - (this is the latest base backup)

I insert a new row in my test table, then I crash and entirely delete the PostgreSQL instance; the objective will be to perform a PITR recovery and get the latest inserted row back in the table:

yjaquierdb=# insert into test values(2,'Two');
INSERT 0 1
yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

The latest modifications go into the streaming directory of your Barman backup (barman receive-wal command):

[barman@server localhost]$ pwd
/postgres/backup/barman/localhost
[barman@server localhost]$ ll streaming
total 32768
-rw------- 1 barman barman 16777216 Jul 16 13:02 000000010000000000000006.partial
[barman@server localhost]$ ll wals/0000000100000000/
total 16385
-rw-r----- 1 barman barman 16777216 Jul 16 12:57 000000010000000000000005
-rw-r----- 1 barman barman      339 Jul 16 12:57 000000010000000000000005.00000028.backup

Confirmation in PostgreSQL instance:

[postgres@server data]$ ll -rt pg_wal/
total 49162
-rw------- 1 postgres postgres 16777216 Jul 16 12:57 000000010000000000000007
-rw------- 1 postgres postgres 16777216 Jul 16 12:57 000000010000000000000008
-rw------- 1 postgres postgres      339 Jul 16 12:57 000000010000000000000005.00000028.backup
drwx------ 2 postgres postgres     1024 Jul 16 13:02 archive_status
-rw------- 1 postgres postgres 16777216 Jul 16 13:02 000000010000000000000006

I banged my head against the wall for a long time before finding why the partial streamed WAL files are not included by the recover command; this is a current (huge) limitation of Barman:

IMPORTANT: A current limitation of Barman is that the recover command is not yet able to transparently manage .partial files. In such situations, users will need to manually copy the latest partial file from the server’s streaming_wals_directory of their Barman installation to the destination for recovery, making sure that the .partial suffix is removed. Restoring a server using the last partial file, reduces data loss, by bringing down recovery point objective to values around 0, or exactly 0 in case of synchronous replication.

Even if you move it to the archive directory Barman does not take it into account:

[barman@server localhost]$ barman get-wal --partial localhost 000000010000000000000006 --output-directory wals/0000000100000000/
Sending WAL '000000010000000000000006' for server 'localhost' into 'wals/0000000100000000/000000010000000000000006' file
[barman@server localhost]$ ll wals/0000000100000000/
total 16385
-rw-r----- 1 barman barman 16777216 Jul 16 12:57 000000010000000000000005
-rw-r----- 1 barman barman      339 Jul 16 12:57 000000010000000000000005.00000028.backup
-rw-r----- 1 barman barman        0 Jul 16 15:24 000000010000000000000006

I have killed the PostgreSQL instance and removed (rm -rf *) the PGDATA directory; now it is time to restore and recover.

Remark:
If you use the traditional “barman recover localhost 20210716T125713 /postgres/13/data” command then the target directory will be written by the barman account (and you might get errors when barman tries to write to it). Once recovered, the final directory is owned by the barman account, so you would have to change ownership, with the root account, to the postgres account before starting the instance. One nice trick is to use the --remote-ssh-command “ssh postgres@localhost” option of barman recover so that it is the postgres account that writes everything (and remember we configured passwordless access from barman).

To map the PostgreSQL recovery_target_timeline = 'latest' setting I first tried with:

[barman@server localhost]$ barman recover --remote-ssh-command "ssh postgres@localhost" --target-tli 'latest' localhost 20210716T110510 /postgres/13/data
usage: barman recover [-h] [--target-tli TARGET_TLI]
                      [--target-time TARGET_TIME] [--target-xid TARGET_XID]
                      [--target-lsn TARGET_LSN] [--target-name TARGET_NAME]
                      [--target-immediate] [--exclusive]
                      [--tablespace NAME:LOCATION]
                      [--remote-ssh-command SSH_COMMAND] [--bwlimit KBPS]
                      [--retry-times RETRY_TIMES] [--retry-sleep RETRY_SLEEP]
                      [--no-retry] [--jobs NJOBS] [--get-wal] [--no-get-wal]
                      [--network-compression] [--no-network-compression]
                      [--target-action TARGET_ACTION] [--standby-mode]
                      server_name backup_id destination_directory
barman recover: error: argument --target-tli: 'latest' is not a valid positive integer

So I finally decided to use the below command. I don’t know by heart any recovery time or LSN number and I just want to recover with the smallest possible RPO:

[barman@server localhost]$ barman recover --remote-ssh-command "ssh postgres@localhost" --target-tli 99999999 localhost 20210716T125713 /postgres/13/data
Starting remote restore for server localhost using backup 20210716T125713
Destination directory: /postgres/13/data
Remote command: ssh postgres@localhost
Doing PITR. Recovery target timeline: 'True'
Using safe horizon time for smart rsync copy: 2021-07-16 12:57:16+02:00
Copying the base backup.
Copying required WAL segments.
Generating recovery configuration
Identify dangerous settings in destination directory.

IMPORTANT
These settings have been modified to prevent data losses

postgresql.conf line 237: archive_command = false

Recovery completed (start time: 2021-07-16 15:25:03.033741, elapsed time: 4 seconds)

Your PostgreSQL server has been successfully prepared for recovery!

Copy the streamed .partial file to the $PGDATA/barman_wal/ directory, removing the .partial extension…
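
In my case that gives something like the sketch below (the file name comes from the earlier streaming directory listing; barman_wal is the directory referenced by the generated restore_command; copying from the barman account via scp is my assumption to avoid file ownership issues):

[barman@server ~]$ scp /postgres/backup/barman/localhost/streaming/000000010000000000000006.partial postgres@localhost:/postgres/13/data/barman_wal/000000010000000000000006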

Change recovery_target_timeline from 99999999 to ‘latest’ in postgresql.auto.conf file:

[postgres@server data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
restore_command = 'cp barman_wal/%f %p'
recovery_end_command = 'rm -fr barman_wal'
recovery_target_timeline = latest

Just start the PostgreSQL instance and all should be there (you must bounce it once more to reconfigure the archive_command parameter that has been overwritten), and restart the barman receive-wal command as shown after the check below:

[postgres@server data]$ psql --dbname=yjaquierdb
psql (13.3)
Type "help" for help.

yjaquierdb=# select * from test;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)
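
Restarting the WAL streaming uses the same commands as earlier:

[barman@server ~]$ export PATH=$PATH:/usr/pgsql-13/bin/
[barman@server ~]$ nohup barman receive-wal localhost &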

To be continued…

  1. pg_rman
  2. WAL-E

References

The post PostgreSQL backup and restore tools comparison for PITR recovery appeared first on IT World.

]]>
https://blog.yannickjaquier.com/postgresql/postgresql-backup-and-restore-tools-comparison-for-pitr-recovery.html/feed 1
GoldenGate for Big Data and Kafka Handlers hands-on – part 2 https://blog.yannickjaquier.com/oracle/goldengate-for-big-data-and-kafka-handlers-hands-on-part-2.html https://blog.yannickjaquier.com/oracle/goldengate-for-big-data-and-kafka-handlers-hands-on-part-2.html#respond Mon, 28 Jun 2021 07:58:16 +0000 https://blog.yannickjaquier.com/?p=5131 Preamble The GoldenGate for Big Data integration with Kafka is possible through three different Kafka Handlers also called connectors: Kafka Generic Handler (Pub/Sub) Kafka Connect Handler Kafka REST Proxy Handler Only the two first are available under the Opensource Apache Licensed version so we will review only those two. Oracle has written few articles on […]

The post GoldenGate for Big Data and Kafka Handlers hands-on – part 2 appeared first on IT World.

]]>

Table of contents

Preamble

The GoldenGate for Big Data integration with Kafka is possible through three different Kafka Handlers also called connectors:

  • Kafka Generic Handler (Pub/Sub)
  • Kafka Connect Handler
  • Kafka REST Proxy Handler

Only the first two are available with the open source Apache-licensed version so we will review only those two. Oracle has written a few articles on the differences (see references section) but these short sentences sum it up well:

Kafka Handler

Can send raw bytes messages in four formats: JSON, Avro, XML, delimited text

Kafka Connect Handler

Generates in-memory Kafka Connect schemas and messages. Passes the messages to Kafka Connect converter to convert to bytes to send to Kafka.
There are currently only two converters: JSON and Avro. Only Confluent currently has Avro. But using the Kafka Connect interface allows the user to integrate with the open source Kafka Connect connectors.

This picture from Oracle Corporation (see references section for the complete article) summarizes it well:

[Figure: kafka04]

I initially thought the Kafka Connect Handler was provided as a plugin by Confluent (https://www.confluent.io/product/connectors/) but it is included by default in plain Apache Kafka:

The Kafka Connect framework is also included in the Apache versions as well as Confluent version.

So it is possible to run the OGG Kafka Connect Handler with Apache Kafka. And it is possible to run open source Kafka Connect connectors with Apache Kafka.

One thing Confluent Kafka has that Apache Kafka does not is the Avro schema registry and the Avro Converter.

Oracle test case creation

My simple test case, created in my pdb1 pluggable database, will be as follows:

SQL> create user appuser identified by secure_password;

User created.

SQL> grant connect, resource to appuser;

Grant succeeded.

SQL> alter user appuser quota unlimited on users;

User altered.

SQL> connect appuser/secure_password@pdb1
Connected.

SQL> CREATE TABLE test01(id NUMBER, descr VARCHAR2(50), CONSTRAINT TEST01_PK PRIMARY KEY (id) ENABLE);

Table created.

SQL> desc test01
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 DESCR                                              VARCHAR2(50)

GoldenGate extract configuration

In this chapter I create an extract (capture) process to extract figures from my appuser.test01 test table:

GGSCI (server01) 1> add credentialstore

Credential store created.
 
GGSCI (server01) 2> alter credentialstore add user c##ggadmin@orcl alias c##ggadmin
Password:

Credential store altered.

GGSCI (server01) 3> info credentialstore

Reading from credential store:

Default domain: OracleGoldenGate

  Alias: ggadmin
  Userid: c##ggadmin@orcl

Use ‘alter credentialstore delete user’ to remove an alias…

GGSCI (server01) 11> dblogin useridalias c##ggadmin
Successfully logged into database CDB$ROOT.

GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 10> add trandata pdb1.appuser.test01

2021-01-22 10:55:21  INFO    OGG-15131  Logging of supplemental redo log data is already enabled for table PDB1.APPUSER.TEST01.

2021-01-22 10:55:21  INFO    OGG-15135  TRANDATA for instantiation CSN has been added on table PDB1.APPUSER.TEST01.

2021-01-22 10:55:21  INFO    OGG-10471  ***** Oracle Goldengate support information on table APPUSER.TEST01 *****
Oracle Goldengate support native capture on table APPUSER.TEST01.
Oracle Goldengate marked following column as key columns on table APPUSER.TEST01: ID.

Configure the extract process (note that the group name must be at most 8 characters, otherwise you get “ERROR: Invalid group name (must be at most 8 characters).”):

GGSCI (server01) 10> dblogin useridalias c##ggadmin
Successfully logged into database CDB$ROOT.

GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 11> edit params ext01



GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 12> view params ext01

extract ext01
useridalias c##ggadmin
ddl include mapped
exttrail ./dirdat/ex
sourcecatalog pdb1
table appuser.test01

Add and register the extract and add the EXTTRAIL (the trail name must be 2 characters or less!):

GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 19> add extract ext01, integrated tranlog, begin now
EXTRACT (Integrated) added.

GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 20> register extract ext01 database container (pdb1)

2021-01-22 11:06:30  INFO    OGG-02003  Extract EXT01 successfully registered with database at SCN 7103554.


GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 21> add exttrail ./dirdat/ex, extract ext01
EXTTRAIL added.

Finally start it with (you can also use ‘view report ext01’ to get more detailed information):

GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 22> start ext01

Sending START request to MANAGER ...
EXTRACT EXT01 starting


GGSCI (server01 as c##ggadmin@orcl/CDB$ROOT) 23> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT01       00:00:00      19:13:08

Test it works by inserting a row in your test table:

SQL> insert into test01 values(10,'Ten');

1 row created.

SQL> commit;

Commit complete.

And check that your trail files get created in the chosen directory:

[oracle@server01 oggcore_1]$ ll dirdat
total 2
-rw-r----- 1 oracle dba 1294 Jan 22 11:24 ex000000000

GoldenGate for Big Data and Kafka Handler configuration

One cool directory to look at is the AdapterExamples directory located inside your GoldenGate for Big Data installation, in my case the AdapterExamples/big-data/kafka* sub-directories:

[oracle@server01 big-data]$ pwd
/u01/app/oracle/product/19.1.0/oggbigdata_1/AdapterExamples/big-data
[oracle@server01 big-data]$ ll -d kafka*
drwxr-x--- 2 oracle dba 96 Sep 25  2019 kafka
drwxr-x--- 2 oracle dba 96 Sep 25  2019 kafka_connect
drwxr-x--- 2 oracle dba 96 Sep 25  2019 kafka_REST_proxy
[oracle@server01 big-data]$ ll kafka
total 4
-rw-r----- 1 oracle dba  261 Sep  3  2019 custom_kafka_producer.properties
-rw-r----- 1 oracle dba 1082 Sep 25  2019 kafka.props
-rw-r----- 1 oracle dba  332 Sep  3  2019 rkafka.prm

So in the kafka directory we have three files that you can copy to the dirprm directory of your GoldenGate for Big Data installation (a possible copy command is sketched below). Now you must customize them to match your configuration.
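
A hedged example of that copy, mirroring the command used later for the Kafka Connect files (relative path assumed from the AdapterExamples/big-data directory shown above):

[oracle@server01 big-data]$ cp kafka/* ../../dirprm/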

In custom_kafka_producer.properties I have just changed bootstrap.servers variable to match my Kafka server:

bootstrap.servers=localhost:9092
acks=1
reconnect.backoff.ms=1000

value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
# 100KB per partition
batch.size=16384
linger.ms=0

In kafka.props I have changed gg.classpath to point to the libs directory of a Kafka installation (this directory must be owned or readable by the oracle account). It means that if Kafka is installed on another server (the normal configuration) you must copy the libs to your GoldenGate for Big Data server; see the sketch after this properties file. The chosen example payload format is avro_op (the more verbose Avro operation format). It can be one of: xml, delimitedtext, json, json_row, avro_row, avro_op:

gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
#The following resolves the topic name using the short table name
gg.handler.kafkahandler.topicMappingTemplate=${tableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkahandler.keyMappingTemplate=${primaryKeys}
gg.handler.kafkahandler.format=avro_op
gg.handler.kafkahandler.SchemaTopicName=mySchemaTopic
gg.handler.kafkahandler.BlockingSend =false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=op
gg.handler.kafkahandler.MetaHeaderTemplate=${alltokens}


goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Sample gg.classpath for Apache Kafka
gg.classpath=dirprm/:/u01/kafka_2.13-2.7.0/libs/*
#Sample gg.classpath for HDP
#gg.classpath=/etc/kafka/conf:/usr/hdp/current/kafka-broker/libs/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
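
If Kafka really runs on a remote host, a possible sketch of the library copy is below (kafkahost is a hypothetical host name; the target path matches the gg.classpath above):

[oracle@server01 ~]$ mkdir -p /u01/kafka_2.13-2.7.0/libs
[oracle@server01 ~]$ scp kafka@kafkahost:/u01/kafka_2.13-2.7.0/libs/* /u01/kafka_2.13-2.7.0/libs/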

In rkafka.prm I have changed MAP | TARGET parameter:

REPLICAT rkafka
-- Trail file for this example is located in "AdapterExamples/trail" directory
-- Command to add REPLICAT
-- add replicat rkafka, exttrail AdapterExamples/trail/tr
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP pdb1.appuser.*, TARGET appuser.*;

As explained in the rkafka.prm file, I add the replicat (apply) process and the trail directory (it must be the one of your classic GoldenGate installation):

GGSCI (server01) 1> add replicat rkafka, exttrail /u01/app/oracle/product/19.1.0/oggcore_1/dirdat/ex
REPLICAT added.


GGSCI (server01) 2> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    STOPPED     RKAFKA      00:00:00      00:00:03


GGSCI (server01) 3> start rkafka

Sending START request to MANAGER ...
REPLICAT RKAFKA starting


GGSCI (server01) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RKAFKA      00:00:00      00:12:08

As a test I insert a row in my test table:

SQL> insert into test01 values(1,'One');

1 row created.

SQL> commit;

Commit complete.

I can read the event on the topic that has my table name (the first event is the test I have done when I configured GoldenGate):

[kafka@server01 kafka_2.13-2.7.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
TEST01
__consumer_offsets
mySchemaTopic
quickstart-events
[kafka@server01 kafka_2.13-2.7.0]$ bin/kafka-console-consumer.sh --topic TEST01 --from-beginning --bootstrap-server localhost:9092
APPUSER.TEST01I42021-01-22 11:32:20.00000042021-01-22T15:20:51.081000(00000000000000001729ID$@Ten
APPUSER.TEST01I42021-01-22 15:24:55.00000042021-01-22T15:25:00.665000(00000000000000001872ID▒?One

You can clean the installation with:

stop rkafka
delete replicat rkafka

GoldenGate for Big Data and Kafka Connect Handler configuration

Same as in the previous chapter, I copy the demo configuration files with:

[oracle@server01 kafka_connect]$ pwd
/u01/app/oracle/product/19.1.0/oggbigdata_1/AdapterExamples/big-data/kafka_connect
[oracle@server01 kafka_connect]$ ll
total 4
-rw-r----- 1 oracle dba  592 Sep  3  2019 kafkaconnect.properties
-rw-r----- 1 oracle dba  337 Sep  3  2019 kc.prm
-rw-r----- 1 oracle dba 1733 Sep 25  2019 kc.props
[oracle@server01 kafka_connect]$ cp * ../../../dirprm/

As I’m using the same server for all components the kafkaconnect.properties file is already correct. I have just added the converter.type entries to work around a bug. We see that here the example is configured to use JSON for the payload:

bootstrap.servers=localhost:9092
acks=1

#JSON Converter Settings
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

#Avro Converter Settings
#key.converter=io.confluent.connect.avro.AvroConverter
#value.converter=io.confluent.connect.avro.AvroConverter
#key.converter.schema.registry.url=http://localhost:8081
#value.converter.schema.registry.url=http://localhost:8081


#Adjust for performance
buffer.memory=33554432
batch.size=16384
linger.ms=0

converter.type=key
converter.type=value
converter.type=header

In the kc.props file I only change the gg.classpath parameter, with the exact same comment as in the Kafka Handler configuration:

gg.handlerlist=kafkaconnect

#The handler properties
gg.handler.kafkaconnect.type=kafkaconnect
gg.handler.kafkaconnect.kafkaProducerConfigFile=kafkaconnect.properties
gg.handler.kafkaconnect.mode=op
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkaconnect.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkaconnect.keyMappingTemplate=${primaryKeys}
gg.handler.kafkahandler.MetaHeaderTemplate=${alltokens}

#The formatter properties
gg.handler.kafkaconnect.messageFormatting=row
gg.handler.kafkaconnect.insertOpKey=I
gg.handler.kafkaconnect.updateOpKey=U
gg.handler.kafkaconnect.deleteOpKey=D
gg.handler.kafkaconnect.truncateOpKey=T
gg.handler.kafkaconnect.treatAllColumnsAsStrings=false
gg.handler.kafkaconnect.iso8601Format=false
gg.handler.kafkaconnect.pkUpdateHandling=abend
gg.handler.kafkaconnect.includeTableName=true
gg.handler.kafkaconnect.includeOpType=true
gg.handler.kafkaconnect.includeOpTimestamp=true
gg.handler.kafkaconnect.includeCurrentTimestamp=true
gg.handler.kafkaconnect.includePosition=true
gg.handler.kafkaconnect.includePrimaryKeys=false
gg.handler.kafkaconnect.includeTokens=false

goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Apache Kafka Classpath
gg.classpath=/u01/kafka_2.13-2.7.0/libs/*
#Confluent IO classpath
#gg.classpath={Confluent install dir}/share/java/kafka-serde-tools/*:{Confluent install dir}/share/java/kafka/*:{Confluent install dir}/share/java/confluent-common/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

In the kc.prm file I only change the MAP | TARGET configuration as follows:

REPLICAT kc
-- Trail file for this example is located in "AdapterExamples/trail" directory
-- Command to add REPLICAT
-- add replicat conf, exttrail AdapterExamples/trail/tr NODBCHECKPOINT
TARGETDB LIBFILE libggjava.so SET property=dirprm/kc.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
MAP pdb1.appuser.*, TARGET appuser.*;

Add the Kafka Connect Handler replicat with:

GGSCI (server01) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING


GGSCI (server01) 2> add replicat kc, exttrail /u01/app/oracle/product/19.1.0/oggcore_1/dirdat/ex
REPLICAT added.


GGSCI (server01) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    STOPPED     KC          00:00:00      00:00:05


GGSCI (server01) 3> start kc

Sending START request to MANAGER ...
REPLICAT KC starting


GGSCI (server01) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     KC          00:00:00      00:00:04

Add a new row in the test table (and commit):

SQL> connect appuser/secure_password@pdb1
Connected.
SQL> select * from test01;

        ID DESCR
---------- --------------------------------------------------
        10 Ten
         1 One

SQL> insert into test01 values(2,'Two');

1 row created.

SQL> commit;

Commit complete.

Reading the new topic you should see new lines coming:

[kafka@server01 kafka_2.13-2.7.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
APPUSER.TEST01
TEST01
__consumer_offsets
mySchemaTopic
quickstart-events
[kafka@server01 kafka_2.13-2.7.0]$ bin/kafka-console-consumer.sh --topic APPUSER.TEST01 --from-beginning --bootstrap-server localhost:9092
{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"field":"table"},{"type":"string","optional":true,"field":"op_type"},{"type":"string","optional":true,"field":"op_ts"},{"type":"string","optional":true,"field":"current_ts"},{"type":"string","optional":true,"field":"pos"},{"type":"double","optional":true,"field":"ID"},{"type":"string","optional":true,"field":"DESCR"}],"optional":false,"name":"APPUSER.TEST01"},"payload":{"table":"APPUSER.TEST01","op_type":"I","op_ts":"2021-01-22 11:32:20.000000","current_ts":"2021-01-22 17:36:00.285000","pos":"00000000000000001729","ID":10.0,"DESCR":"Ten"}}
{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"field":"table"},{"type":"string","optional":true,"field":"op_type"},{"type":"string","optional":true,"field":"op_ts"},{"type":"string","optional":true,"field":"current_ts"},{"type":"string","optional":true,"field":"pos"},{"type":"double","optional":true,"field":"ID"},{"type":"string","optional":true,"field":"DESCR"}],"optional":false,"name":"APPUSER.TEST01"},"payload":{"table":"APPUSER.TEST01","op_type":"I","op_ts":"2021-01-22 15:24:55.000000","current_ts":"2021-01-22 17:36:00.727000","pos":"00000000000000001872","ID":1.0,"DESCR":"One"}}
{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"field":"table"},{"type":"string","optional":true,"field":"op_type"},{"type":"string","optional":true,"field":"op_ts"},{"type":"string","optional":true,"field":"current_ts"},{"type":"string","optional":true,"field":"pos"},{"type":"double","optional":true,"field":"ID"},{"type":"string","optional":true,"field":"DESCR"}],"optional":false,"name":"APPUSER.TEST01"},"payload":{"table":"APPUSER.TEST01","op_type":"I","op_ts":"2021-01-22 17:38:23.000000","current_ts":"2021-01-22 17:38:27.800000","pos":"00000000000000002013","ID":2.0,"DESCR":"Two"}}

References

The post GoldenGate for Big Data and Kafka Handlers hands-on – part 2 appeared first on IT World.

GoldenGate for Big Data and Kafka Handlers hands-on – part 1 https://blog.yannickjaquier.com/oracle/goldengate-for-big-data-and-kafka-handlers-hands-on-part-1.html Thu, 27 May 2021 08:41:40 +0000

Table of contents

Preamble

With the rise of our Cloud migration, our hybrid way of working and our SAP HANA migration, one of my colleagues asked me how to transfer on-premise Oracle database information to the Cloud. His high-level idea is to have those databases feed a Kafka installation we are trying to implement, to duplicate events from on-premise to the cloud and open the door to more heterogeneous scenarios. The direct answer to this problem is the GoldenGate Kafka Handlers!

To answer his questions and clear my mind I decided to set up a simple GoldenGate installation and a dummy Kafka installation, as well as to configure the different Kafka Handlers. I initially thought only GoldenGate for Big Data was required, but I have understood that GoldenGate for Big Data requires a traditional GoldenGate installation and reads the trail files produced by this legacy GoldenGate installation.

You cannot extract (capture) Oracle data with GoldenGate for Big Data (OGG-01115 Function dbLogin not implemented). GoldenGate for Big Data reads trail files extracted with GoldenGate (for Oracle database).

Even if on paper I have no issue with this, from a license standpoint it is a completely different story. The legacy GoldenGate for Oracle Database public license price is $17,500 for two x86 cores and the GoldenGate for Big Data public license price is $20,000 for two x86 cores (on top of this you pay 22% of maintenance each year).

This blog post is split in two parts. The first part covers binaries installation and basic component configuration. The second part covers a simple test case configuration as well as trying to make it work…

Proof Of Concept components version:

  • Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 – Version 19.3.0.0.0 (RU)
  • Oracle GoldenGate 19.1.0.0.4 for Oracle on Linux x86-64, 530.5 MB (V983658-01.zip)
  • Oracle GoldenGate for Big Data 19.1.0.0.1 on Linux x86-64, 88.7 MB (V983760-01.zip)
  • Scala 2.13 – kafka_2.13-2.7.0.tgz (65 MB)
  • OpenJDK 1.8.0 (chosen 8 even if 11 was available…)

The exact OpenJDK version is shown below (I have chosen OpenJDK just to try it, following the new Oracle JDK licensing model):

[oracle@server01 ~]$ java -version
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)

My test server is a dual-socket Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz physical server with 6 cores per socket (12 cores total, 24 threads) and 64GB of RAM. I have installed all components on this single, quite powerful server.

19c pluggable database configuration

The source database I plan to use is a pluggable database called PDB1:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO

I have a TNS entry called pdb1 for it:

[oracle@server01 ~]$ tnsping pdb1

TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 21-JAN-2021 11:52:02

Copyright (c) 1997, 2019, Oracle.  All rights reserved.

Used parameter files:
/u01/app/oracle/product/19.0.0/dbhome_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1531)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = pdb1)))
OK (0 msec)

I can connect to my root container with an sql alias set to ‘rlwrap sqlplus / as sysdba’.

The first thing to do is to change the log mode of my instance:

SQL> SELECT log_mode,supplemental_log_data_min, force_logging FROM v$database;

LOG_MODE     SUPPLEME FORCE_LOGGING
------------ -------- ---------------------------------------
NOARCHIVELOG NO       NO

SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /u01/app/oracle/product/19.0.0
                                                 /fast_recovery_area
db_recovery_file_dest_size           big integer 1G

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1073738400 bytes
Fixed Size                  9142944 bytes
Variable Size             528482304 bytes
Database Buffers          528482304 bytes
Redo Buffers                7630848 bytes
Database mounted.
SQL> ALTER DATABASE archivelog;

Database altered.

SQL> alter database open;

Database altered.

SQL> ALTER SYSTEM SET enable_goldengate_replication=TRUE scope=both;

System altered.

SQL> SELECT log_mode,supplemental_log_data_min, force_logging FROM v$database;

LOG_MODE     SUPPLEME FORCE_LOGGING
------------ -------- ---------------------------------------
ARCHIVELOG   NO       NO

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
SQL> alter pluggable database pdb1 open;

Pluggable database altered.

SQL> alter pluggable database pdb1 save state;

Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO

I could have added supplemental logging and force logging at container level with the commands below, but I decided to try to do it at pluggable database level:

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Database altered.

SQL> ALTER DATABASE FORCE LOGGING;

Database altered.

My first try to change the logging mode and supplemental logging at pluggable database level was a complete failure:

SQL> alter pluggable database pdb1 enable force logging;
alter pluggable database pdb1 enable force logging
*
ERROR at line 1:
ORA-65046: operation not allowed from outside a pluggable database


SQL> alter pluggable database pdb1 add supplemental log data;
alter pluggable database pdb1 add supplemental log data
*
ERROR at line 1:
ORA-65046: operation not allowed from outside a pluggable database


SQL> alter session set container=pdb1;

Session altered.

SQL> alter pluggable database pdb1 enable force logging;
alter pluggable database pdb1 enable force logging
*
ERROR at line 1:
ORA-65045: pluggable database not in a restricted mode

SQL> alter pluggable database pdb1 add supplemental log data;
alter pluggable database pdb1 add supplemental log data
*
ERROR at line 1:
ORA-31541: Supplemental logging is not enabled in CDB$ROOT.


And to put a pluggable database in restricted mode you have to stop it first:

SQL> alter pluggable database pdb1 open restricted;
alter pluggable database pdb1 open restricted
*
ERROR at line 1:
ORA-65019: pluggable database PDB1 already open

Activate minimal supplemental logging at container level:

SQL> alter session set container=cdb$root;

Session altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE YES
SQL> ALTER DATABASE add SUPPLEMENTAL LOG DATA;

Database altered.

SQL> select * from cdb_supplemental_logging;

MIN PRI UNI FOR ALL PRO SUB     CON_ID
--- --- --- --- --- --- --- ----------
YES NO  NO  NO  NO  NO  NO           1

Once minimal supplemental logging has been activated at container level, all PDBs have it immediately (but there is no harm in issuing the command again):

SQL> alter session set container=pdb1;

Session altered.

SQL> select * from cdb_supplemental_logging;

MIN PRI UNI FOR ALL PRO SUB     CON_ID
--- --- --- --- --- --- --- ----------
YES NO  NO  NO  NO  NO  NO           3

SQL> set lines 200
SQL> col pdb_name for a10
SQL> select pdb_name, logging, force_logging from cdb_pdbs;

PDB_NAME   LOGGING   FORCE_LOGGING
---------- --------- ---------------------------------------
PDB1       LOGGING   YES
PDB$SEED   LOGGING   NO

SQL> alter session set container=pdb1;

Session altered.

SQL> select * from cdb_supplemental_logging;

MIN PRI UNI FOR ALL PRO SUB     CON_ID
--- --- --- --- --- --- --- ----------
YES NO  NO  NO  NO  NO  NO           3

SQL> alter pluggable database pdb1 add supplemental log data;

Pluggable database altered.

SQL> alter pluggable database pdb1 enable force logging;

Pluggable database altered.

SQL> select pdb_name, logging, force_logging from cdb_pdbs;

PDB_NAME   LOGGING   FORCE_LOGGING
---------- --------- ---------------------------------------
PDB1       LOGGING   YES

SQL> SELECT log_mode,supplemental_log_data_min, force_logging FROM v$database;

LOG_MODE     SUPPLEME FORCE_LOGGING
------------ -------- ---------------------------------------
ARCHIVELOG   YES      NO

Complete the configuration by switching the log file and putting the pluggable database back in non-restricted mode:

SQL> alter session set container=cdb$root;

Session altered.

SQL> ALTER SYSTEM SWITCH LOGFILE;

System altered.

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE YES
SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> alter pluggable database pdb1 open read write;

Pluggable database altered.

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO

Create the global ggadmin GoldenGate administrative user in your container database as specified in the documentation. What the documentation does not make clear is that this global user must be able to connect to all containers of your multitenant database:

SQL> CREATE USER c##ggadmin IDENTIFIED BY secure_password;

User created.

SQL> GRANT CREATE SESSION, CONNECT, RESOURCE, ALTER SYSTEM, SELECT ANY DICTIONARY, UNLIMITED TABLESPACE TO c##ggadmin CONTAINER=all;

Grant succeeded.

SQL> EXEC DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE(grantee=>'c##ggadmin', privilege_type=>'CAPTURE', grant_optional_privileges=>'*', container=>'ALL');

PL/SQL procedure successfully completed.

GoldenGate 19.1 installation

Installation is pretty straightforward and I have already done it with GoldenGate 12c (https://blog.yannickjaquier.com/oracle/goldengate-12c-tutorial.html). Just locate the runInstaller file in the folder where you have unzipped the downloaded file. Choose your database version:

kafka01

Choose the target installation directory:

kafka02

Then the installation is already over, with the GoldenGate manager already configured (on port 7809) and running:

kafka03

You can immediately test it with:

[oracle@server01 ~]$ /u01/app/oracle/product/19.1.0/oggcore_1/ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 19.1.0.0.4 OGGCORE_19.1.0.0.0_PLATFORMS_191017.1054_FBO
Linux, x64, 64bit (optimized), Oracle 19c on Oct 17 2019 21:16:29
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.



GGSCI (server01) 1> info mgr

Manager is running (IP port TCP:server01.7809, Process ID 13243).

You can add the directory to your PATH or, to avoid mixing it up with GoldenGate for Big Data, create an alias so you are 100% sure of which ggsci you are launching (this is also the opportunity to combine it with rlwrap). So I created a ggsci_gg alias in my profile for this traditional GoldenGate installation:

alias ggsci_gg='rlwrap /u01/app/oracle/product/19.1.0/oggcore_1/ggsci'

GoldenGate 19.1 for Big Data Installation

Install OpenJDK 1.8 with:

[root@server01 ~]# yum install java-1.8.0-openjdk.x86_64

Add to your profile:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.275.b01-0.el7_9.x86_64/jre
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$JAVA_HOME/lib/amd64/server:$LD_LIBRARY_PATH

Install GoldenGate for Big Data with a simple unzip/untar:

[oracle@server01 ~]$ mkdir -p /u01/app/oracle/product/19.1.0/oggbigdata_1
[oracle@server01 19.1.0]$ cd /u01/app/oracle/product/19.1.0/oggbigdata_1
[oracle@server01 oggbigdata_1]$ cp /u01/V983760-01.zip .
[oracle@server01 oggbigdata_1]$ unzip V983760-01.zip
Archive:  V983760-01.zip
  inflating: OGGBD-19.1.0.0-README.txt
  inflating: OGG_BigData_19.1.0.0.1_Release_Notes.pdf
  inflating: OGG_BigData_Linux_x64_19.1.0.0.1.tar
[oracle@server01 oggbigdata_1]$ tar xvf OGG_BigData_Linux_x64_19.1.0.0.1.tar
.
.
[oracle@server01 oggbigdata_1]$ rm OGG_BigData_Linux_x64_19.1.0.0.1.tar V983760-01.zip

I also added this alias in my profile:

alias ggsci_bd='rlwrap /u01/app/oracle/product/19.1.0/oggbigdata_1/ggsci'

Create GoldenGate for Big Data subdirectory and configure Manager process:

[oracle@server01 ~]$ ggsci_bd

Oracle GoldenGate for Big Data
Version 19.1.0.0.1 (Build 003)

Oracle GoldenGate Command Interpreter
Version 19.1.0.0.2 OGGCORE_OGGADP.19.1.0.0.2_PLATFORMS_190916.0039
Linux, x64, 64bit (optimized), Generic on Sep 16 2019 02:12:32
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.


GGSCI (server01) 1> create subdirs

Creating subdirectories under current directory /u01/app/oracle/product/19.1.0/oggbigdata_1

Parameter file                 /u01/app/oracle/product/19.1.0/oggbigdata_1/dirprm: created.
Report file                    /u01/app/oracle/product/19.1.0/oggbigdata_1/dirrpt: created.
Checkpoint file                /u01/app/oracle/product/19.1.0/oggbigdata_1/dirchk: created.
Process status files           /u01/app/oracle/product/19.1.0/oggbigdata_1/dirpcs: created.
SQL script files               /u01/app/oracle/product/19.1.0/oggbigdata_1/dirsql: created.
Database definitions files     /u01/app/oracle/product/19.1.0/oggbigdata_1/dirdef: created.
Extract data files             /u01/app/oracle/product/19.1.0/oggbigdata_1/dirdat: created.
Temporary files                /u01/app/oracle/product/19.1.0/oggbigdata_1/dirtmp: created.
Credential store files         /u01/app/oracle/product/19.1.0/oggbigdata_1/dircrd: created.
Masterkey wallet files         /u01/app/oracle/product/19.1.0/oggbigdata_1/dirwlt: created.
Dump files                     /u01/app/oracle/product/19.1.0/oggbigdata_1/dirdmp: created.


GGSCI (server01) 2> edit params mgr

Insert ‘port 7801’ in the manager parameter file. Then start the manager with:

GGSCI (server01) 1> view params mgr

port 7801


GGSCI (server01) 3> start mgr
Manager started.


GGSCI (server01) 4> info mgr

Manager is running (IP port TCP:server01.7801, Process ID 11186).

Kafka configuration

For Kafka I’m just following the official quick start documentation. I have just created a dedicated kafka account (what an original name) to run the Kafka processes. I have also used the nohup command so as not to lock too many shells. As the size is small I have installed Kafka in the home directory of my kafka account (never ever do this in production):

[kafka@server01 ~]$ pwd
/home/kafka
[kafka@server01 ~]$ tar -xzf /tmp/kafka_2.13-2.7.0.tgz
[kafka@server01 ~]$ ll
total 0
drwxr-x--- 6 kafka users 89 Dec 16 15:03 kafka_2.13-2.7.0
[kafka@server01 ~]$ cd kafka_2.13-2.7.0/
[kafka@server01 ~]$ nohup /home/kafka/kafka_2.13-2.7.0/bin/zookeeper-server-start.sh config/zookeeper.properties > zookeeper.log &
[kafka@server01 ~]$ nohup /home/kafka/kafka_2.13-2.7.0/bin/kafka-server-start.sh config/server.properties > broker_service.log &

Then I have done the topic creation and dummy event creation and, obviously, all went well… A minimal Python sketch of these two steps is shown below.
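
For reference, here is a hedged Python equivalent of the quick start topic creation and test event production. It assumes the kafka-python package, which is not part of this installation; the official quick start uses the kafka-topics.sh and kafka-console-producer.sh scripts instead:

from kafka import KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

# Create the quickstart-events topic (same name as in the official quick start)
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="quickstart-events", num_partitions=1, replication_factor=1)])

# Produce a couple of dummy events
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for message in [b"This is my first event", b"This is my second event"]:
    producer.send("quickstart-events", message)
producer.flush()

producer.close()
admin.close()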

References

The post GoldenGate for Big Data and Kafka Handlers hands-on – part 1 appeared first on IT World.

Network encryption hands-on with Java, Python and SQL*Plus https://blog.yannickjaquier.com/oracle/network-encryption-hands-on-java-python-sqlplus.html Mon, 26 Apr 2021 07:29:24 +0000

Table of contents

Preamble

A few years after my blog post on Oracle network encryption, we have finally decided to implement it whenever possible, at least on our Sarbanes-Oxley (SOX) perimeter. For those databases, connecting with network encryption will not be optional, so it simplifies the database-side configuration: reject any connection that is not secure.

This blog post is about the amount of burden you might face when moving to network encryption to encrypt the communication between your databases and your clients (users and/or applications).

Testing has been done using:

  • A 19c (19.10) pluggable database (pdb1) running on a RedHat 7.8 physical server.
  • A 19c (19.3) 64 bits Windows client installed on my Windows 10 laptop.
  • OpenJDK version “1.8.0_282”. The Windows binaries have been found on the Red Hat web site as the Microsoft builds start at Java 11.
  • Python 3.7.9 on Windows 64 bits and cx-Oracle 8.1.0.

Database server configuration for network encryption

Upfront nothing is configured and the connection to your database server can be unsecured. It is always a good idea to test everything from a simple Oracle client, even if in real life your application will not use SQL*Plus (but Java or something else instead):

PS C:\> tnsping //server01.domain.com:1531/pdb1

TNS Ping Utility for 64-bit Windows: Version 19.0.0.0.0 - Production on 15-APR-2021 10:38:01

Copyright (c) 1997, 2019, Oracle.  All rights reserved.

Used parameter files:
C:\app\client\product\19.0.0\client_1\network\admin\sqlnet.ora

Used EZCONNECT adapter to resolve the alias
Attempting to contact (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=pdb1))(ADDRESS=(PROTOCOL=tcp)(HOST=10.75.43.64)(PORT=1531)))
OK (110 msec)
PS C:\> sqlplus yjaquier@//server01.domain.com:1531/pdb1

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Apr 15 10:38:12 2021
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0

SQL> select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat);

NETWORK_SERVICE_BANNER
--------------------------------------------------------------------------------
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Encryption service for Linux: Version 19.0.0.0.0 - Production
Crypto-checksumming service for Linux: Version 19.0.0.0.0 - Production

Remark:
The query comes from the Administering Oracle Database Classic Cloud Service official Oracle documentation. Here no encryption or crypto-checksumming is activated; the displayed text simply says that everything is ready to be used if required…

It is now time to play with sqlnet.ora parameters to activate network encryption:

  • SQLNET.ENCRYPTION_SERVER
  • SQLNET.ENCRYPTION_CLIENT

Possible values of both parameters are:

  • accepted to enable the security service if required or requested by the other side.
  • rejected to disable the security service, even if required by the other side.
  • requested to enable the security service if the other side allows it.
  • required to enable the security service and disallow the connection if the other side is not enabled for the security service.

For my requirement the only acceptable value for SQLNET.ENCRYPTION_SERVER is required… On a side note, except for testing purposes, I am wondering about the added value of the rejected setting. Why would you intentionally reject a secure connection when it is possible?!

So in sqlnet.ora of my database server I set:

SQLNET.ENCRYPTION_SERVER=required

Then from my SQL*Plus client if I set nothing I get:

SQL> select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat);

NETWORK_SERVICE_BANNER
--------------------------------------------------------------------------------
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Encryption service for Linux: Version 19.0.0.0.0 - Production
AES256 Encryption service adapter for Linux: Version 19.0.0.0.0 - Production
Crypto-checksumming service for Linux: Version 19.0.0.0.0 - Production

This is because the SQLNET.ENCRYPTION_CLIENT default value is accepted. We also see that the encryption algorithm is AES256; this is because SQLNET.ENCRYPTION_TYPES_CLIENT and SQLNET.ENCRYPTION_TYPES_SERVER contain by default all encryption algorithms, i.e.:

  • 3des112 for triple DES with a two-key (112-bit) option
  • 3des168 for triple DES with a three-key (168-bit) option
  • aes128 for AES (128-bit key size)
  • aes192 for AES (192-bit key size)
  • aes256 for AES (256-bit key size)
  • des for standard DES (56-bit key size)
  • des40 for DES (40-bit key size)
  • rc4_40 for RSA RC4 (40-bit key size)
  • rc4_56 for RSA RC4 (56-bit key size)
  • rc4_128 for RSA RC4 (128-bit key size)
  • rc4_256 for RSA RC4 (256-bit key size)

If I explicitly set SQLNET.ENCRYPTION_CLIENT=rejected I get:

ERROR:
ORA-12660: Encryption or crypto-checksumming parameters incompatible

If you want your clients to connect to your database server using a chosen algorithm you can set, in your database server’s sqlnet.ora file:

SQLNET.ENCRYPTION_TYPES_SERVER=3des168

And you get when connecting:

SQL> select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat);

NETWORK_SERVICE_BANNER
--------------------------------------------------------------------------------
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Encryption service for Linux: Version 19.0.0.0.0 - Production
3DES168 Encryption service adapter for Linux: Version 19.0.0.0.0 - Production
Crypto-checksumming service for Linux: Version 19.0.0.0.0 - Production

Activating network encryption with Java

As written above, your application will surely not connect to your database server using SQL*Plus. SQL*Plus is quite handy when you need to test that network encryption is working, but most probably your application is using Java. So how does it work in Java? Let’s try it…

As in plenty of blog posts I have written on this web site, I will be using Eclipse, which is free and is a nice Java editor with syntax completion to help you.

First download the JDBC driver that suits your environment. My database and client are 19c, so I have taken the 19c JDBC driver, and as I’m still using OpenJDK 8 (one day I will have to upgrade myself!!) I have finally chosen ojdbc8.jar, which is certified with JDK 8.

Choose the JDBC driver that matches the exact version of your Oracle client or you will get the below error message when using the JDBC OCI driver:

Exception in thread "main" java.lang.Error: Incompatible version of libocijdbc[Jdbc:1910000, Jdbc-OCI:193000
	at oracle.jdbc.driver.T2CConnection$1.run(T2CConnection.java:4309)
	at java.security.AccessController.doPrivileged(Native Method)
	at oracle.jdbc.driver.T2CConnection.loadNativeLibrary(T2CConnection.java:4302)
	at oracle.jdbc.driver.T2CConnection.logon(T2CConnection.java:487)
	at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:807)
	at oracle.jdbc.driver.T2CDriverExtension.getConnection(T2CDriverExtension.java:66)
	at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:770)
	at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:572)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at network_encryption.network_encryption.main(network_encryption.java:30)

Add the JDBC jar file (ojdbc8.jar) to your Eclipse project with “Add External JARs”:

network_encryption01

From a client perspective, Oracle JDBC is made of two different drivers:

  • Thin driver: The JDBC Thin driver is a pure Java, Type IV driver that can be used in applications
  • Oracle Call Interface (OCI) driver: It is used on the client-side with an Oracle client installation. It can be used only with applications.

Oracle’s statement is pretty clear:

In general, unless you need OCI-specific features, such as support for non-TCP/IP networks, use the JDBC Thin driver.

The Oracle documentation provides this clear table, and I well remember having used the JDBC OCI driver when testing Transparent Application Failover (TAF):

network_encryption02

Oracle JDBC Thin Driver

The small Java code I have written is:

  /**
  * 
  */
 /**
  * @author Yannick Jaquier
  *
  */
 package network_encryption;
 
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.util.Properties;
 import oracle.jdbc.OracleConnection;
 //import oracle.jdbc.pool.OracleDataSource;
 
 public class network_encryption {
   public static void main(String[] args) throws Exception {
     Connection connection1 = null;
     String query1 = "select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat)";
     String connect_string = "//server01.domain.com:1531/pdb1";
     ResultSet resultset1 = null;
     Properties props = new Properties();
     //OracleDataSource ods = new OracleDataSource();
     OracleConnection oracleconnection1 = null;
    
     try {
       props.setProperty("user","yjaquier");
       props.setProperty("password","secure_password");
       props.setProperty(OracleConnection.CONNECTION_PROPERTY_THIN_NET_ENCRYPTION_LEVEL, "ACCEPTED");
       props.setProperty(OracleConnection.CONNECTION_PROPERTY_THIN_NET_ENCRYPTION_TYPES, "3des168");
       connection1 = DriverManager.getConnection("jdbc:oracle:thin:@" + connect_string, props);
       oracleconnection1 = (OracleConnection)connection1;
     }
     catch (SQLException e) {
       System.out.println("Connection Failed! Check output console");
       e.printStackTrace();
       System.exit(1);
     }
     System.out.println("Connected to Oracle database...");
    
     if (oracleconnection1!=null) {
       try {
         resultset1 = oracleconnection1.createStatement().executeQuery(query1);
         while (resultset1.next()) {
           System.out.println("Banner: "+resultset1.getString(1));
         }
         System.out.println("Used Encryption Algorithm: "+oracleconnection1.getEncryptionAlgorithmName());
       }
       catch (SQLException e) {
         System.out.println("Query has failed...");
         e.printStackTrace();
         System.exit(1);
       }
     }
     resultset1.close();
     connection1.close(); 
   }
 }

The console output is clear:

network_encryption03

If, for example, I set OracleConnection.CONNECTION_PROPERTY_THIN_NET_ENCRYPTION_LEVEL to REJECTED I get the expected feedback below (ORA-12660):

network_encryption04

Oracle JDBC OCI Driver

The Java code for the JDBC OCI driver is almost the same, except that you have far fewer available connection properties (no CONNECTION_PROPERTY_THIN_NET_ENCRYPTION_TYPES) and functions (no getEncryptionAlgorithmName). So the idea is to link your application code with an instant client (or thick client if you like) and set the oracle.net.tns_admin system property to be able to play with your local sqlnet.ora.

The Java code is almost the same:

  /**
  * 
  */
 /**
  * @author Yannick Jaquier
  *
  */
 package network_encryption;
 
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.util.Properties;
 import oracle.jdbc.OracleConnection;
 //import oracle.jdbc.pool.OracleDataSource;
 
 public class network_encryption {
   public static void main(String[] args) throws Exception {
     Connection connection1 = null;
     String query1 = "select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat)";
     String connect_string = "//server01.domain.com:1531/pdb1";
     ResultSet resultset1 = null;
     Properties props = new Properties();
     //OracleDataSource ods = new OracleDataSource();
     OracleConnection oracleconnection1 = null;
    
     try {
       props.setProperty("user","yjaquier");
       props.setProperty("password","secure_password");
       System.setProperty("oracle.net.tns_admin","C:\\app\\client\\product\\19.0.0\\client_1\\network\\admin");
       connection1 = DriverManager.getConnection("jdbc:oracle:oci:@" + connect_string, props);
       oracleconnection1 = (OracleConnection)connection1;
     }
     catch (SQLException e) {
       System.out.println("Connection Failed! Check output console");
       e.printStackTrace();
       System.exit(1);
     }
     System.out.println("Connected to Oracle database...");
    
     if (oracleconnection1!=null) {
       try {
         resultset1 = oracleconnection1.createStatement().executeQuery(query1);
         while (resultset1.next()) {
           System.out.println("Banner: "+resultset1.getString(1));
         }
       }
       catch (SQLException e) {
         System.out.println("Query has failed...");
         e.printStackTrace();
         System.exit(1);
       }
     }
     resultset1.close();
     connection1.close(); 
   }
 }

If in my sqlnet.ora I set:

SQLNET.ENCRYPTION_CLIENT=accepted
SQLNET.ENCRYPTION_TYPES_CLIENT=(3des168)

I get:

network_encryption05

But if I set:

SQLNET.ENCRYPTION_CLIENT=rejected
SQLNET.ENCRYPTION_TYPES_CLIENT=(3des168)

I get:

network_encryption06

Activating network encryption with Python

The de-facto package to connect to an Oracle database in Python is cx_Oracle! I am not detailing how to configure it in a Python virtual environment as the Internet is already full of tutorials on this…

The cx_Oracle Python package relies on the local client installation, so you end up using the same sqlnet.ora file that we have seen with the SQL*Plus client.

The small Python code (network_encryption.py) I have written is:

import cx_Oracle
import config

connection = None
query1 = "select network_service_banner from v$session_connect_info where sid in (select distinct sid from v$mystat)"
try:
  connection = cx_Oracle.connect(
    config.username,
    config.password,
    config.dsn,
    encoding=config.encoding)

  # show the version of the Oracle Database
  print(connection.version)

  # Fetch and display rows of banner query
  with connection.cursor() as cursor:
    cursor.execute(query1)
    rows = cursor.fetchall()
    if rows:
      for row in rows:
        print(row)

except cx_Oracle.Error as error:
  print(error)
finally:
  # release the connection
  if connection:
    connection.close()

You also need to put the below config.py file in the same directory:

username = 'yjaquier'
password = 'secure_password'
dsn = 'server01.domain.com:1531/pdb1'
encoding = 'UTF-8'

If in my sqlnet.ora file I set SQLNET.ENCRYPTION_CLIENT=rejected I (obviously) get:

PS C:\Yannick\Python> python .\network_encryption.py
ORA-12660: Encryption or crypto-checksumming parameters incompatible

If I set nothing in the sqlnet.ora file I get:

PS C:\Yannick\Python> python .\network_encryption.py
19.10.0.0.0
('TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production',)
('Encryption service for Linux: Version 19.0.0.0.0 - Production',)
('AES256 Encryption service adapter for Linux: Version 19.0.0.0.0 - Production',)
('Crypto-checksumming service for Linux: Version 19.0.0.0.0 - Production',)

I can force the encryption algorithm with SQLNET.ENCRYPTION_TYPES_CLIENT=(3des168) and get:

PS C:\Yannick\Python> python .\network_encryption.py
19.10.0.0.0
('TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production',)
('Encryption service for Linux: Version 19.0.0.0.0 - Production',)
('3DES168 Encryption service adapter for Linux: Version 19.0.0.0.0 - Production',)
('Crypto-checksumming service for Linux: Version 19.0.0.0.0 - Production',)

Conclusion

All in all, as the default value for SQLNET.ENCRYPTION_CLIENT is accepted, if you configure your database server to only accept encrypted connections then it should be transparent from the application side. At least it is for Java, Python and traditional SQL scripts…

If you really don’t want to touch your application code but still want to choose your preferred encryption algorithm (in case the default one, AES256, does not suit you), you can even limit the available encryption algorithms from the database server side with SQLNET.ENCRYPTION_TYPES_SERVER.

References

The post Network encryption hands-on with Java, Python and SQL*Plus appeared first on IT World.
