Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper
Date: 2021-09-20 01:52:00
Message-ID: c9db3365-afe4-38c8-f1c9-64ebe3cd4a87@gmail.com
Lists: pgsql-general


On 9/19/21 06:28, Niels Jespersen wrote:
>
> Hello all
>
> We are often using the oracle_fdw to transfer data between Postgres
> (version 11+) and Oracle (version 18+). It works great.
>
> However, I have a task at hand that requires inserting a few billion
> rows into an Oracle table from a Postgres query.
>
> insert into t_ora (a,b,c)
>
> select a,b,c from t_pg;
>
> This is driven from a plpgsql stored procedure, if that matters.
>
> I want to optimize the running time of this. But I am unsure which
> possibilities, if any, there actually are.
>
> Reducing the number of network round trips is usually a good way to
> increase throughput. But how do I do that?
>
> If I could make the Oracle insert a direct load, that would usually also
> increase throughput. But is that possible here? There are no
> constraints defined on the destination tables.
>
> Regards Niels Jespersen
>

The problem with oracle_fdw is that the SQL is parsed on the Postgres
side, not on the Oracle side. If it were parsed on the Oracle side, you
could use the /*+ APPEND */ hint, which is essentially a direct-path
insert. You will have to write a script in one of the scripting
languages that uses the array insert interface available with the
Oracle Instant Client. Even the Oracle ODBC driver uses array inserts,
as shown in the following article:

https://dbwhisperer.wordpress.com/2020/11/21/pyodbc-fast_executemany-and-oracle-rdbms/
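For illustration only, a minimal sketch of an array insert with
cx_Oracle; the connection details, table names and sample batch are
placeholders, not anything from this thread:

import cx_Oracle

# Sample batch; in practice this would be a chunk of rows read from Postgres.
rows = [(1, "a", 10.0), (2, "b", 20.0)]

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# One executemany() call ships the whole batch in a single round trip
# instead of one round trip per row.
cur.executemany("INSERT INTO t_ora (a, b, c) VALUES (:1, :2, :3)", rows)
conn.commit()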

Unfortunately, the Postgres side of the equation is not particularly
good with array fetches and does not do particularly well at cutting
down the number of network round trips:

https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany
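On the Postgres read side, a server-side (named) cursor with large
fetch batches is the usual way to keep the number of round trips down.
A hypothetical psycopg2 illustration (connection string and batch size
are placeholders):

import psycopg2

pg = psycopg2.connect("dbname=mydb user=myuser")

# A named cursor is a server-side cursor: rows are streamed in chunks
# rather than the whole result set being materialized on the client.
cur = pg.cursor(name="t_pg_stream")
cur.execute("SELECT a, b, c FROM t_pg")

total = 0
while True:
    batch = cur.fetchmany(50000)   # one FETCH FORWARD 50000 per round trip
    if not batch:
        break
    total += len(batch)            # here the batch would go to Oracle's executemany()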

I would use a script on the Postgres side and take advantage of the
superior options provided by SQL*Net.  You will need some fancy
programming to avoid waiting on each operation. I would actually write
two scripts: one reading the data from Postgres and converting it to
CSV, piping it into a second script that inserts the data into Oracle
(a rough sketch follows below). That way the two scripts work in
parallel, at least partially. Situations like this are the reason why a
DBA needs to know how to script. So, this is where you start:

https://python.swaroopch.com/
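And here is the rough sketch of that pipeline mentioned above.
Everything in it (connection strings, credentials, batch size) is a
placeholder, and the CSV values may need explicit type conversion
before the insert:

# --- pg_to_csv.py: stream the Postgres query out as CSV ---
import sys
import psycopg2

pg = psycopg2.connect("dbname=mydb user=myuser")
cur = pg.cursor()
# COPY ... TO STDOUT streams the whole result set as CSV without
# building it up in client memory.
cur.copy_expert("COPY (SELECT a, b, c FROM t_pg) TO STDOUT WITH (FORMAT csv)",
                sys.stdout)

# --- csv_to_ora.py: read CSV from stdin and array-insert into Oracle ---
import csv
import sys
import cx_Oracle

BATCH_SIZE = 50000
ora = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = ora.cursor()
sql = "INSERT INTO t_ora (a, b, c) VALUES (:1, :2, :3)"

batch = []
for row in csv.reader(sys.stdin):
    batch.append(row)                 # convert column types here if needed
    if len(batch) >= BATCH_SIZE:
        cur.executemany(sql, batch)   # one round trip per batch
        batch.clear()
if batch:
    cur.executemany(sql, batch)
ora.commit()

Run them as "python3 pg_to_csv.py | python3 csv_to_ora.py" so the
producer and the consumer run concurrently; that is the partial
parallelism mentioned above.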

Regards

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
