From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: logical replication - still unstable after all these months |
Date: | 2017-05-29 01:33:51 |
Message-ID: | CAMkU=1zsThCJV03SvdUtYGapsm+yA_GkVBgm_e+xpb2FEcoEtQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> On 28/05/17 19:01, Mark Kirkwood wrote:
>
>> So running in cloud land now...so far no errors - will update.
>>
> The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3
> failed test runs (all scale 25 and 8 pgbench clients). Given the way the
> test decides on failure (gets tired of waiting for the table md5's to
> match) - it begs the question 'What if it had waited a bit longer'? However
> from what I can see in all cases:
>
> - the rowcounts were the same in master and replica
> - the md5 of pgbench_accounts was different
>
All four tables should differ if the test is still waiting for a
transaction, as all of a transaction's changes to those tables happen
atomically in that single transaction.
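To illustrate that reasoning, here is a minimal sketch (not the test framework's actual code) that compares per-table digests and flags a partial mismatch. It assumes the rows of each table have already been fetched into Python lists; the helper names and the row-serialisation scheme are hypothetical. The default pgbench transaction updates pgbench_accounts, pgbench_tellers and pgbench_branches and inserts into pgbench_history, so a merely delayed transaction would normally leave all four tables different, not just one.

```python
import hashlib

# The four tables touched by the default (TPC-B-like) pgbench transaction.
PGBENCH_TABLES = (
    "pgbench_accounts",
    "pgbench_branches",
    "pgbench_tellers",
    "pgbench_history",
)


def table_md5(rows):
    """Order-independent md5 over a table's rows (hypothetical serialisation)."""
    digest = hashlib.md5()
    # Sort the textual form so rows containing NULLs (None) do not break sorting.
    for text in sorted(repr(row) for row in rows):
        digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def mismatched_tables(publisher, subscriber):
    """Return the tables whose digests differ between the two nodes."""
    return [
        name
        for name in PGBENCH_TABLES
        if table_md5(publisher[name]) != table_md5(subscriber[name])
    ]


def classify(mismatched):
    """Heuristic only: a zero delta or offsetting changes can hide a difference."""
    if not mismatched:
        return "in sync"
    if len(mismatched) == len(PGBENCH_TABLES):
        return "all tables differ: consistent with a transaction still in flight"
    return "partial mismatch: atomicity suspect"


if __name__ == "__main__":
    publisher = {
        "pgbench_accounts": [(1, 1, 300)],
        "pgbench_branches": [(1, 300)],
        "pgbench_tellers": [(1, 1, 300)],
        "pgbench_history": [(1, 1, 1, 300)],
    }
    subscriber = dict(publisher, pgbench_accounts=[(1, 1, 0)])
    print(classify(mismatched_tables(publisher, subscriber)))
```

A result where only pgbench_accounts differs, as in the quoted failures, falls into the "partial mismatch" case.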
I also got a failure, after 87 iterations of a similar test case. It
waited for hours, as my script requires manual intervention to stop
waiting. On the subscriber, one account still had a zero balance. The
history table on the subscriber agreed with both history and accounts on
the publisher, and according to those the account should not have been
zero, so transaction atomicity definitely got broken.
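The kind of cross-check described above can be sketched as follows. This is an illustration under stated assumptions, not the script I actually ran: it assumes every account started at a zero balance (as after pgbench -i), that pgbench_history was never truncated, and that the (aid, delta) pairs and balances have already been read from one node. The function name and data shapes are hypothetical.

```python
def balance_violations(accounts, history):
    """Find accounts whose balance disagrees with the sum of their history deltas.

    accounts: dict mapping aid -> abalance, as read from pgbench_accounts.
    history:  iterable of (aid, delta) pairs, as read from pgbench_history.
    Returns a dict mapping aid -> (expected_balance, actual_balance).
    """
    # Replay the history: each row records the delta applied to one account.
    expected = {}
    for aid, delta in history:
        expected[aid] = expected.get(aid, 0) + delta

    violations = {}
    for aid in sorted(set(accounts) | set(expected)):
        want = expected.get(aid, 0)
        have = accounts.get(aid, 0)
        if want != have:
            violations[aid] = (want, have)
    return violations


if __name__ == "__main__":
    # Subscriber state resembling the failure: history has the row,
    # but the matching account update is missing.
    subscriber_accounts = {1: 0, 2: 150}
    subscriber_history = [(1, 300), (2, 100), (2, 50)]
    print(balance_violations(subscriber_accounts, subscriber_history))
```

An empty result means accounts and history agree on that node; any entry means the history insert and the account update from the same transaction did not both arrive.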
I altered the script to also save the tellers and branches tables and
repeated the runs, but so far it hasn't failed again in over 800 iterations
using the altered script.
>
> ...so does seem possible that there is some bug being tickled here.
> Unfortunately the test framework blasts away the failed tables and
> subscription and continues on...I'm going to amend it to stop on failure so
> I can have a closer look at what happened.
>
What would you want to look at? Would saving the WAL from the master be
helpful?
Cheers,
Jeff