Re: via psycopg2 or pg2pg? Move rows from one database to other

From: Thomas Güttler <guettliml(at)thomas-guettler(dot)de>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: via psycopg2 or pg2pg? Move rows from one database to other
Date: 2017-02-27 11:47:03
Message-ID: 749c9060-79e5-7c39-f25a-400decf5aa43@thomas-guettler.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you for explaining the steps of your algorithm.

Just one question: How to do the actual transfer of data?

I see two solutions:

1, Read the data into a script (via psycopg2 (we love python))
and dump it into a second connection.

2, connect postgres to postgres and transfer the data without a database
adapter like psycopg2.

Regards,
Thomas

Am 23.02.2017 um 17:40 schrieb Francisco Olarte:
> Thomas:
>
> On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler
> <guettliml(at)thomas-guettler(dot)de> wrote:
>> Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
>>> only written on a successful transfer. To improve the chances of
>>> successful transfer more smaller transfer batches
>>> rather then larger transfers.
>
>> I really need a solid solution.
>> You said "... improve the chances of successful transfer ...". This makes me
>> nervous.
>
> I think what Adrian say is you improve the individual transfer time,
> if it fails you retry.
>
>> Delays are no problems, but data loss or duplication is.
>
> Remember you can never guarantee 'exactly once' without very complex
> solutions, I think you can do "at least once" or "at most once". That
> means lose or duplicate.
>
> That being said, IF you have some kind of global, unchanging ( at the
> central site ) unique key, you could try the following, using a
> holding table in each satellite and assuming you have 'on conflict do
> nothing'.
>
> 1.- Move rows from main to holding table in the satelite, in a single
> transaction. This is to let you work with an unmovable set ( as your
> process is the only one touching the holding tables ). If there is
> some data in holding it is no problem, they are from a previous
> crashed transfer.
>
> 2.- Insert every thing from the holding table in main, using on
> conflict do nothing.
>
> 3.- When everything is commited in main, truncate the satellite holding table.
>
> If satellite crashes in 1 it will roll back, you have not touched main.
>
> If you crash in 2 you will find 1 partially full in the next round,
> and main will be rolled back ( it's important to not commit until
> everything is done in 2, i.e., if you have read problems in the
> satellite do no go to 3, just crash and rollback everything ). You can
> either do a loop with the current set or append more data, your
> choice, does not matter, as you have to reinsert. The on conflict do
> nothing in 2 will take care of potential duplicates.
>
> If you crash in 3 you will transfer the lot again, but the do-nothing
> in 2 will eliminate it and 3 will eventually purge it.
>
> You can optimize on that, but basically you just repeat this until
> everything goes fine. I do these ( just with two DBs, not 100 ) and it
> works.
>
> It does a lot of duplicate work, but only on problems, it normally runs smooth.
>
> If you do not have "on conflict do nothing" ( I do not remember the
> versions ) you can use an extra step. Instead of inserting in main in
> 2 do 2.a - Copy holding to main ( truncating before hand if copy
> present ) and 2.b insert news from the copy, either by using and
> anti-join with main or by deleting ( in the same transaction ) the
> dupes before inserting.
>
> Francisco Olarte.
>
>

--
Thomas Guettler http://www.thomas-guettler.de/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message dbyzaa@163.com 2017-02-27 14:13:08 hight cpu %sy usage
Previous Message Geoff Winkless 2017-02-27 11:10:25 Re: ERROR: functions in index expression must be marked IMMUTABLE