Re: Moving delta data faster

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: yudhi s <learnerdatabase99(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Moving delta data faster
Date: 2024-04-04 15:34:49
Message-ID: a1abe41c-94d8-4292-899f-ea2f256d76ae@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 4/3/24 22:24, yudhi s wrote:
>
> On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
>
> On 4/3/24 20:54, yudhi s wrote:
> > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>>> wrote:
> >
> >    > Thank you Adrian.
> >
> > And one thing i forgot to mention this target postgresql database
> would
> > be on AWS RDS whereas the source Oracle databases is on premise.
> I think
> > we don't have the FDW extension currently in place but we can get
> that.
> > I am just not able to understand clearly  though, but do you mean
> export
> > the data from source using CSV and do truncate and import on
> target. And
> > as these data will be moved through the network won't that cause
> slowness?
> >
> > The source database here is Oracle database. Correct me if wrong, it
> > looks like foreign data wrapper is like a DB link. Or do you mean
> > writing a query on the target database (which can be UPSERT or
> MERGE)
> > but will be joining the table from the source database through the
> > DBlink/DDW? But my question was whether we should use UPSERT or
> MERGE
> > for comparing and loading the delta records to the target postgresql
> > database. Want to understand which is more performant , as I see
> in the
> > past Merge having performance issues in the past, but not very sure
> > about that.
>
> My motivation was to get some basic information about your setup and
> what you are trying to achieve.
>
> If I understand correctly you have:
>
> 1) An Oracle database with tables that you want to copy the complete
> data from to a Postgres database. For this sort of thing
> COPY(https://www.postgresql.org/docs/current/sql-copy.html
> <https://www.postgresql.org/docs/current/sql-copy.html>) on the
> Postgres end using CSV data generated from the source is probably the
> quickest bulk load method.
>
> 2) After the initial load you want to do follow up INSERT/UPDATEs based
> on a delta of the source tables relative to the initial load. This is
> still a bit of mystery to me. How are determining the delta: a) On the
> source end entirely or b) Target relative to source? Also what is the
> anticipated size of the delta per transfer?
>
> Additional information needed:
>
> 1) Network distance between source and target?
>
> 2) Network capacity?
>
> 3) Expected load on both source and target servers from other
> operations?
>
>
> Thank you. Actually I was trying to understand how to cater the delta
> load after the one time load is done . The delta change in records is
> planned to be found based on the primary keys on the tables. If it found
> the key it will update the records if it does not find the keys it will
> insert the rows.
>
> Basically the select query from the source database will fetch the data
> with a certain time interval(based on the latest update timestamp or
> create timestamp if they are available or else full dump) and put it on
> S3 and then from the S3 it will be picked and gets merged to the target
> postgres database. As upsert and merge both were looking similar , so
> was wondering what we should use here for loading the delta records?

S3 is not a database. You will need to be more specific about '... then
from the S3 it will be picked and gets merged to the target postgres
database.'

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carl L 2024-04-04 18:03:56 Multiple COPY statements for one table vs one for ~half a billion records
Previous Message Adrian Klaver 2024-04-04 15:15:42 Re: Failure of postgres_fdw because of TimeZone setting