Re: Oracle to postgres migration via ora2pg (blob data)

From: Gilles Darold <gilles(at)darold(dot)net>
To: Amol Tarte <amoltarte(at)gmail(dot)com>, Niels Jespersen <NJN(at)dst(dot)dk>
Cc: Daulat Ram <Daulat(dot)Ram(at)exponential(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Oracle to postgres migration via ora2pg (blob data)
Date: 2019-07-31 21:05:28
Message-ID: a0094d92-5bf3-45ec-ddd9-81688def0433@darold.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Le 31/07/2019 à 18:02, Amol Tarte a écrit :
> FullConvert does this job much faster than ora2pg
>
>
> With Warm Regards,
> Amol Tarte,
> Project Lead,
> Rajdeep InfoTechno Pvt. Ltd.
> Visit us at http://it.rajdeepgroup.com
>
> On Wed 31 Jul, 2019, 5:16 PM Niels Jespersen, <NJN(at)dst(dot)dk
> <mailto:NJN(at)dst(dot)dk>> wrote:
>
> I would look at the source table in Oracle first. It looks a lot
> like audit data. Perhaps all content is not needed in Postgres. If
> it is, then the table and lobs may benefit from being reorganised
> in oracle.
>
>  
>
> Alter table CLIENT_DB_AUDIT_LOG move;
>
> Alter table CLIENT_DB_AUDIT_LOG  move lob
> (SYS_LOB0000095961C00008$$);
>
> -- Three more of these.
>
>  
>
> The syntax is from my the back of my head. You may need to look
> the details up.
>
>  
>
> Niels
>
>  
>
>  
>
> *Fra:* Daulat Ram <Daulat(dot)Ram(at)exponential(dot)com
> <mailto:Daulat(dot)Ram(at)exponential(dot)com>>
> *Sendt:* 31. juli 2019 13:32
> *Til:* pgsql-general(at)lists(dot)postgresql(dot)org
> <mailto:pgsql-general(at)lists(dot)postgresql(dot)org>;
> pgsql-performance(at)lists(dot)postgresql(dot)org
> <mailto:pgsql-performance(at)lists(dot)postgresql(dot)org>
> *Emne:* Oracle to postgres migration via ora2pg (blob data)
>
>  
>
> Hello team,
>
>  
>
> We have to migrate a schema from oracle to postgres but there is
> one table that is having following large lob segments.  This table
> is taking time to export. What parameters we have to set in
> ora2pg.conf to speed up the data export by ora2pg.
>
>  
>
> Table:    CLIENT_DB_AUDIT_LOG
>
>  
>
> LOBSEGMENT           SYS_LOB0000095961C00008$$          80.26
>
> LOBSEGMENT           SYS_LOB0000095961C00007$$          79.96
>
> LOBSEGMENT           SYS_LOB0000094338C00008$$           8.84
>
> LOBSEGMENT           SYS_LOB0000084338C00007$$           8.71
>
> LOBSEGMENT           SYS_LOB0000085961C00009$$           5.32
>
>  
>
> VM Details are:
>
>  
>
> RAM  8GB
>
> VCPUs 2 VCPU
>
> Disk 40GB
>
>  
>
> Thanks,
>

Hi,

Before using impressive commercial products you can try some additional
configuration with Ora2pg.

The only solution to improve data migration performances is to use
parallelism. The problem with BLOB is that they have to be converted
into hex to be inserted into a bytea. The internal function in Ora2Pg
that responsible of this job is _escape_lob(). The other problem is that
Oracle is very slow to send the BLOB to Ora2Pg, I don't know if
commercial products are better at this point but I have challenged
Ora2Pg with Kettle some years ago without do much differences. So what
you should do first is to set the following in your ora2pg.conf:

NO_LOB_LOCATOR    1
LONGREADLEN        100000000

This will force Oracle to send the full content of the BLOB in a single
pass otherwise it will use small chunks. The only drawback is that you
have to set LONGREADLEN to the highest BLOB size in your table to not
throw and LONGTRUNC error.

That also mean that for each ROW returned DBD::Oracle (Perl driver for
Oracle) will allocate at least LONGREADLEN in memory and we want to
extract as much rows as possible. You have understood that your 8GB of
memory will limit the quantity of rows that can be exported at the same
time.

The other point is that Oracle is slow so you have to parallelize data
export. Use -J 4 at command line to create 4 simultaneous process to
data export. Parallelization on Oracle side is only possible if you have
a numeric column that can be used to split the data using modulo 4 in
this case. This is a basic implementation but it is enough in most cases.

Converting BLOB to Bytea consume lot of cpu cycle too so it is a good
practice to parallelize this work too. Use -j 2 or -j 3 for this work.
The number of parallelisation process should be tested because there is
a limit where you will not win anything.

If you have, let's say 32GB of memory and 12 cpu you could try a command
like :

    ora2pg -c ora2pg.conf -J 4 -j 3 -t CLIENT_DB_AUDIT_LOG -L 500

If you have less resources don't forget that -J and -j must be
multiplied to have the number of process that Ora2Pg will parallelize.
The -L option (DATA_LIMIT) is used to reduce the number of row extracted
at a time. Here with a value of 500 it will process 50 (DATA_LIMIT/10)
rows with BLOB at a time. If the table do not have any BLOB it will use
500 row at a time. For most tables this parameter should be set to 10000
up to 250000. If you have lot of memory the value can be higher. If you
think it is too low you can set BLOB_LIMIT in ora2pg.conf to set it at a
higher value.

However Ora2Pg will show you the data migration speed so you can adjust
all these parameters to see if you have some performances gains. If you
want to know exactly at which speed Oracle is able to send the data add
--oracle_speed to the ora2pg command. Ora2Pg will only extract data from
Oracle, there will be no bytea transformation or data writing, just the
full Oracle speed. You can do some test with the value of the -J option
to see what is the best value. On the other side you can use
--ora2pg_speed option to see at which speed Ora2Pg is able to convert
the data, nothing will be written too. Use it to know if you have some
win with the value of the -j option. Don't forget to do some additional
test with the BLOB_LIMIT value to see if there some more improvement. If
someone can prove me that they have better performances at Oracle data
extraction side I will be pleased to look at this code.

I hope this will help.
 

Regards,

--
Gilles Darold
http://www.darold.net/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Julie Nishimura 2019-07-31 22:21:00 adding more space to the existing server
Previous Message Peter J. Holzer 2019-07-31 20:25:58 Re: How do I create a Backup Operator account ?

Browse pgsql-performance by date

  From Date Subject
Next Message Shital A 2019-08-01 03:10:53 PSQL performance - TPS
Previous Message Amol Tarte 2019-07-31 16:02:10 Re: Oracle to postgres migration via ora2pg (blob data)