From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Continuing instability in insert-conflict-specconflict test |
Date: | 2020-08-24 21:21:27 |
Message-ID: | 20200824212127.jbf6zulrvmeyvcnu@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> >
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards. It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.
> > AFAICS that is nothing but wishful thinking. Why is it that only one of
> > those places has failed so far?
>
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.
ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks. If I understand the wrong output
correctly, what happens is that "controller_print_speculative_locks" is
executed, even though s1 hasn't yet acquired the next lock. Note how the
+s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE: acquiring advisory lock on 2
is *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>"
To be clear, there'd still be an issue about whether the NOTICE is
printed before/after the "s2_upsert: <... completed>", but it looks to
me the issue is bigger than that. It's easy enough to add another wait
in s2_upsert, but that doesn't help if the entire scheduling just
continues regardless of there not really being a waiter.
Am I missing something here?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2020-08-24 22:34:58 | LWLockAcquire and LockBuffer mode argument |
Previous Message | Andres Freund | 2020-08-24 20:42:35 | Re: Continuing instability in insert-conflict-specconflict test |