OGG-03517 when moving to Unicode from any other character set

Preamble

After the hands on post on GoldenGate 12c my next testing is replication from an US7ASCII Oracle source database to an Unicode (AL32UTF8) Oracle destination database. The example on which I’m working is a real production environment where the source database has been created (in 7 or 8 release at that time) in US7ASCII:

SQL> SELECT VALUE FROM nls_database_parameters WHERE PARAMETER='NLS_CHARACTERSET';
 
VALUE
----------------------------------------------------------------
US7ASCII

We recently upgraded this database to 12cR1 (12.1.0.2) and we are now studying to replicate few tables with GoldenGate 12c to an Unicode 6.1 12cR1 (12.1.0.2) database:

SQL> SELECT VALUE FROM nls_database_parameters WHERE PARAMETER='NLS_CHARACTERSET';
 
VALUE
--------------------------------------------------------------------------------
AL32UTF8

I initially thought this would run with no issue as AL32UTF8 supersede any other character set and in particular the smallest i.e. US7ASCII… After I have had my replicat process failing for OGG-03517 error message I discovered it would not be so easy:

ERROR   OGG-03517  Conversion from character set us7ascii of source column DESCR to character set UTF-8 of target column DESCR failed because the source column contains a character that is not available in the target character set.

I have recreated with VirtualBox a similar environment with two virtual machines running Oracle Linux Server release 7.2 and Oracle database enterprise edition 12cR1 (12.1.0.2) and GoldenGate 12c (12.2.0.1.0).

OGG-03517 with character set conversion

On source system I start with situation I had in my previous post:

SQL> SELECT * FROM app1.test1 ORDER BY val;
 
       VAL DESCR
---------- --------------------------------------------------
         1 spring
         2 summer

If I try to insert a new line with descr column equals to summer in French (été), on source system not even a warning. Of course this is well known but when you have understood this should not be allowed you might expect at least a warning. The é character is NOT included in US7ASCII so you are not supposed to use it:

SQL> INSERT INTO app1.test1 VALUES(3,'été');
 
1 ROW created.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> SELECT * FROM app1.test1 ORDER BY val;
 
       VAL DESCR
---------- --------------------------------------------------
         1 spring
         2 summer
         3 été

Notice the display that is good even if character is not supported…

This insert in source system is crashing immediately the replicat GoldenGate 12c process:

GGSCI (server2.domain.com) 1> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING
REPLICAT    ABENDED     APPLY1      00:00:00      01:16:09
 
 
GGSCI (server2.domain.com) 2> view report APPLY1
.
.
.
2016-03-24 16:23:47  ERROR   OGG-03517  Conversion from character set us7ascii of source column DESCR to character set UTF-8 of target column DESCR failed because the source column contains a character that is not available in the target character set.
.
.
.

The error message is strange to me as I’m wondering what can be in US7ASCII and not in AL32UTF8… I opened a call at Oracle support for this misleading error message and they confirmed this is going to be resolved:
Bug 20374675 : Goldengate 12.1.2.0.0 replicat abends when converting US7ASCII to AL32UTF8

In My Oracle Support (MOS) note Best Practices: Oracle GoldenGate Globalization for National Language Character Sets in v11.2.1 and beyond, UTF, multi-byte (Doc ID 1499497.1) you can download two documents, one is the same as MOS note Oracle GoldenGate – A Guide to Globalization aspects when working with OGG (Doc ID 1500608.1) that is a bit obsolete if working with GoldenGate 12c.

In first note for Oracle-to-Oracle replication they suggest to set in replicat parameter NLS_LANG to source database character i.e. in my case:

SETENV (NLS_LANG=AMERICAN_AMERICA.US7ASCII)

So added it before dblogin:

GGSCI (server2.domain.com) 20> view params apply1
 
replicat apply1
SETENV (NLS_LANG=AMERICAN_AMERICA.US7ASCII)
useridalias ggadmin
assumetargetdefs
discardfile ./dirrpt/apply1.dsc, purge, megabytes 10
ddl include mapped
ddloptions report
map app1.test1, target app1.test1;

But this not changed anything and replicat process was still failing for exact same error message.

In MOS note OGG v12 replicat abend- OGG-03517 Conversion from character set (Doc ID 1946650.1) they suggest an option that is even not documented which I would translate by:

map app1.test1, target app1.test1, COLCHARSET( PASSTHRU, descr);

I have written not documented because in official 12.2.0.1 GoldenGate documentation the option is not listed in TABLE | MAP keywords… The undocumented option is working well but you have to apply it each time a character is not supported. Means you might have multiple unexpected failures before being able to list all of them…

The above MOS note is mentioning another interesting (even if not recommended with PASSTHRU) parameter: SOURCECHARSET.

I have tried using SOURCECHARSET with OVERRIDE to force the conversion to Unicode, so now my replicat process looks like:

GGSCI (server2.domain.com) 20> view params apply1
 
replicat apply1
useridalias ggadmin
assumetargetdefs
discardfile ./dirrpt/apply1.dsc, purge, megabytes 10
SOURCECHARSET OVERRIDE AL32UTF8
ddl include mapped
ddloptions report
map app1.test1, target app1.test1;

And in target system my test table looks like:

SQL> SELECT * FROM app1.test1 ORDER BY val;
 
       VAL DESCR
---------- --------------------------------------------------
         1 spring
         2 summer
         3 ete

As we can see it’s not perfect but satisfactory…

To go a little bit further I have added street in German (Straße). The Eszett or sharp S can be obtained on keyboard where it does not appear with Alt+0223 or Alt+225.

On source system:

SQL> INSERT INTO app1.test1 VALUES(4,'Straße');
 
1 ROW created.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> SELECT * FROM app1.test1 ORDER BY val;
 
       VAL DESCR
---------- --------------------------------------------------
         1 spring
         2 summer
         3 été
         4 Straße

Again nothing looks wrong on source system, even if the character is not supported by US7ASCII character set.

On target system

SQL> SELECT * FROM app1.test1 ORDER BY val;
 
       VAL DESCR
---------- --------------------------------------------------
         1 spring
         2 summer
         3 ete
         4 Stra?e

This is the limit of the automatic conversion, if you have garbage in it will obviously produce garbage out…

Oracle Database Migration Assistant for Unicode

To go confirm the problem (even if I know it already) I have decided to download and configure Oracle Database Migration Assistant for Unicode (DMU). Available DMU release at the time of this writing is 2.1. The program is an executable you can launch on Windows or Unix. For convenience I have unzip and executed it on my windows desktop.

I have created a connection on my source US7ASCII database with SYS account to avoid privileges issues. At first connection DMU complain for SYS.DBMS_DUMA_INTERNAL pacakge that is missing:

dmu01
dmu01

Easily solved with (as SYS):

SQL> @?/rdbms/admin/prvtdumi.plb
 
LIBRARY created.
 
 
PACKAGE created.
 
No errors.
 
PACKAGE BODY created.
 
No errors.

Once required package is created DMU ask for repository creation:

dmu02
dmu02

Choose target ideal character set, UTF8 stick to Unicode 3.0 while AL32UTF8 can go up to Unicode 6.1 in 12cR1. I choose the character set of my target GoldenGate database so AL32UTF8:

dmu03
dmu03

Target tablespace to be used, SYSAUX sounds a good idea, then DMU repository is created:

dmu04
dmu04

On the migration status tab in main window proceed to step 2 that is scanning the database, click on the scan all character data in the database to check for conversion issues hyperlink:

dmu05
dmu05

Tune as per your need the scanning process:

dmu06
dmu06

As my database is small I have decided to scan it entirely:

dmu07
dmu07

Kept default options for objects:

dmu08
dmu08

You can close Scan Progress tab and in Migration Status tab click on View the scan report hyperlink. Tables with issues appear with a warning sign. Obviously my test1 table in app1 schema is the one listed:

dmu09
dmu09

Right click on table and choose Cleansing Editor, the two rows with character issues appear clearly:

dmu10
dmu10

Up to users to modify each and every single character that has issue. On a big database with many invalid character this process can be really time consuming…

References

About Post Author

This entry was posted in Oracle and tagged . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>