From: | Jerry Sievers <gsievers19(at)comcast(dot)net> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeremy Finzel <finzelj(at)gmail(dot)com> |
Subject: | Re: Deadlock in multiple CIC. |
Date: | 2017-12-27 17:42:02 |
Message-ID: | 87mv24ax05.fsf@jsievers.enova.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Tue, Dec 26, 2017 at 8:31 AM, Alvaro Herrera <
> alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> Jeff Janes wrote:
> > c3d09b3bd23f5f6 fixed it so concurrent CIC would not deadlock
> (or at least
> > not as reliably as before) by dropping its own snapshot before
> waiting for
> > all the other ones to go away.
> >
> > With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX
> CONCURRENTLY on
> > different tables in the same database started deadlocking
> against each
> > other again quite reliably.
> >
> > I think the solution is simply to drop the catalog snapshot as
> well, as in
> > the attached.
>
> Thanks for the analysis -- it sounds reasonable to me. However,
> I'm
> wondering why you used the *Conditionally() version instead of
> plain
> InvalidateCatalogSnapshot().
>
>
> My thinking was that if there was for some reason another snapshot
> hanging around, that dropping the catalog snapshot unconditionally
> would be a correctness bug, while doing it conditionally would just
> fail to avoid a theoretically avoidable deadlock. So it seemed
> safer.
>
>
> I think they must have the same effect in
> practice (the assumption being that you can't run CIC in a
> transaction
> that may have other snapshots) but the theory seems simpler when
> calling
> the other routine: just do away with the snapshot always, period.
>
>
> That is probably true. But I never even knew that catalog snapshots
> existed until yesterday, so didn't want to make make assumptions
> about what else might exist, to avoid introducing new bugs similar to
> the one that 8aa3e47510b969354ea02a fixed.
>
>
>
> This is back-patchable to 9.4, first branch which has MVCC
> catalog
> scans. It's strange that this has gone undetected for so long.
>
>
> Since the purpose of CIC is to build an index with minimal impact on
> other users, I think wanting to use it in concurrent cases might be
> rather rare. In a maintenance window, I wouldn't want to use CIC
> because it is slower and I'd rather just hold the stronger lock and
> do it fast, and as a hot-fix outside a maintenance window I usually
> wouldn't want to hog the CPU with concurrent builds when I could do
Hmmm, given that most/all large sites lately are probably running on hw
with dozens or perhaps hundreds of CPUs/threads, I can see DBAs not
being too concerned about "hogging".
> them sequentially instead. Also, since deadlocks are "expected"
> errors rather than "should never happen" errors, and since the docs
> don't promise that you can do parallel CIC without deadlocks, many
> people would probably shrug it off (which I initially did) rather
> than report it as a bug. I was looking into it as an enhancement
> rather than a bug until I discovered that it was already enhanced and
Agree such an edge case not a high priority to support for the above
reasons but good to assuming no breakage in some other regard :-)
> then undone.
>
> Cheers,
>
> Jeff
>
>
>
>
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net
p: 312.241.7800
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2017-12-27 17:52:55 | Re: [HACKERS] taking stdbool.h into use |
Previous Message | Antonio Belloni | 2017-12-27 17:40:43 | Contributing some code |