From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication existing data copy |
Date: | 2017-02-22 17:13:11 |
Message-ID: | b0dbcb2a1066d6728cbf62e391e7edf4@xs4all.nl |
Lists: | pgsql-hackers |
On 2017-02-22 14:48, Erik Rijkers wrote:
> On 2017-02-22 13:03, Petr Jelinek wrote:
>
>> 0001-Skip-unnecessary-snapshot-builds.patch
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v5.patch
>
> It works well now, or at least my particular test case seems now
> solved.
Cried victory too early, I'm afraid.
The logical replication is now certainly much more stable but there are
still errors, just less often.
The rare 'hang'-error that I mentioned a few emails back I have not yet
encountered; I am beginning to trust that that is indeed solved.
But there is still sometimes incorrect replication. The symptoms are
the ones I mentioned earlier:
- incorrect number of rows in one of (mostly) pgbench_accounts or
pgbench_history.
The numbers are always off by a very small amount, say fewer than 20
rows, often by only 1 row.
- incorrect content in one of pgbench_accounts or pgbench_history
(detected via md5). Also mostly the two tables named above.
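
For illustration only (this is not code taken from pgbench_derail2.sh), the two checks above, row count and md5 of the content, could be sketched roughly as follows. The function names and the in-memory row lists are assumptions made for the example; a real run would fetch the rows from the primary and the replica.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, md5_hex) for an iterable of row tuples.

    Rows are sorted by their repr so the digest does not depend on
    the order in which the rows were scanned.
    """
    digest = hashlib.md5()
    count = 0
    for row in sorted(rows, key=repr):
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
        count += 1
    return count, digest.hexdigest()


def compare_tables(primary, replica):
    """Compare two dicts mapping table name -> list of row tuples.

    Returns a list of (table, kind, detail) tuples, where kind is
    "row count" or "content". An empty list means no difference.
    """
    problems = []
    for table in sorted(primary):
        p_count, p_md5 = table_fingerprint(primary[table])
        r_count, r_md5 = table_fingerprint(replica.get(table, []))
        if p_count != r_count:
            problems.append((table, "row count", f"{p_count} vs {r_count}"))
        elif p_md5 != r_md5:
            problems.append((table, "content", f"{p_md5} vs {r_md5}"))
    return problems
```

The content check is only reported when the row counts agree, since a differing count already implies differing content.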
I also sometimes see primary key violations on the replica. That should
not be possible if I have understood the intent of logical replication
correctly.
( ERROR: duplicate key value violates unique constraint
"pgbench_tellers_pkey" )
Mostly on *_tellers; also seen on *_branches.
Understandably, the errors become more frequent with higher client
counts: a 25x repeat with 1 client yielded only 1 failed run whereas a
25x repeat with 16 clients gave 16 failures.
I attach once more the current incarnation of my test-bash pgbench
runner, pgbench_derail2.sh.
Easiest to run it yourself, I guess.
I also attach the output (of pgbench_derail2.sh) of those two 25x
repeats:
d2_scale__1_client__1_25x.txt
d2_scale__1_client_16_25x.txt
I worry a bit about the correctness of that test program
(pgbench_derail2.sh). I especially wonder if it should look around
better at startup (e.g., at stuff left over from previous iterations).
If you see any incorrect/dumb things there, or a better way to monitor
(aka pre-flight checks), please let me know.
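
One possible shape for such a pre-flight check, as a rough sketch and not part of pgbench_derail2.sh: list any subscriptions, publications, and replication slots that already exist before a run starts. `run_query` is a hypothetical callable (for example a thin wrapper around psql) that takes SQL text and returns a list of names; the check would be run against both the primary and the replica.

```python
# Catalog lookups for objects a previous iteration may have left behind.
LEFTOVER_QUERIES = {
    "subscriptions": "SELECT subname FROM pg_subscription",
    "publications": "SELECT pubname FROM pg_publication",
    "replication slots": "SELECT slot_name FROM pg_replication_slots",
}


def preflight(run_query):
    """Return a dict of leftover object names, keyed by object kind.

    run_query is a hypothetical callable: SQL text in, list of names out.
    An empty dict means nothing was found and the run can start clean.
    """
    leftovers = {}
    for kind, sql in LEFTOVER_QUERIES.items():
        names = list(run_query(sql))
        if names:
            leftovers[kind] = names
    return leftovers
```

A test runner could refuse to start, or drop the listed objects first, whenever the returned dict is non-empty.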
But the current state is certainly a big step forward -- I guess it's
just your bad luck that I had the afternoon off ;)
thanks,
Erik Rijkers
Attachment | Content-Type | Size |
---|---|---|
pgbench_derail2.sh | text/x-shellscript | 7.1 KB |
d2_scale__1_client__1_25x.txt | text/plain | 42.3 KB |
d2_scale__1_client_16_25x.txt | text/plain | 82.1 KB |