From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bogus logic in RelationBuildPartitionDesc
Date: 2019-12-24 04:48:23
Message-ID: CA+TgmoZSig0uAmbWafO=fgQTb+W6hbQbX+bxkkPD7m8jnOy7UQ@mail.gmail.com
Lists: pgsql-hackers
On Sat, Dec 21, 2019 at 10:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I just had to retrieve my jaw from the floor after reading this
> bit in RelationBuildPartitionDesc:
>
> * The system cache may be out of date; if so, we may find no pg_class
> * tuple or an old one where relpartbound is NULL. In that case, try
> * the table directly. We can't just AcceptInvalidationMessages() and
> * retry the system cache lookup because it's possible that a
> * concurrent ATTACH PARTITION operation has removed itself from the
> * ProcArray but not yet added invalidation messages to the shared queue;
> * InvalidateSystemCaches() would work, but seems excessive.
>
> As far as I can see, this argument is wrong on every count, and if it
> were right, the code it is defending would still be wrong.
>
> In the first place, it's claiming, based on no evidence, that our whole
> approach to syscaches is wrong. If this code needs to deal with obsolete
> syscache entries then so do probably thousands of other places.
No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock,
unlike most other DDL.
> In the second place, the argument that AcceptInvalidationMessages wouldn't
> work is BS, because the way that that *actually* works is that transaction
> commit updates clog and sends inval messages before it releases locks.
> So if you've acquired enough of a lock to be sure that the data you want
> to read is stable, then you do not need to worry about whether you've
> received any relevant inval messages. You have --- and you don't even
> need to call AcceptInvalidationMessages for yourself, because lock
> acquisition already did, see e.g. LockRelationOid.
No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock, which
does not conflict with the AccessShareLock required to build a cache
entry.
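
To make the lock-level point concrete, here is an illustrative check
(not code from the tree, just an assertion one could compile into a
backend build) showing that the two lock modes in question don't
conflict, so acquiring the reader's lock neither waits for a concurrent
ATTACH nor gives any ordering guarantee against it:

    #include "postgres.h"
    #include "storage/lock.h"

    /*
     * Illustrative only: ShareUpdateExclusiveLock (taken by ATTACH
     * PARTITION on the parent) does not conflict with AccessShareLock
     * (held by a reader building a relcache/partdesc entry).
     */
    static void
    illustrate_attach_vs_reader(void)
    {
        Assert(!DoLockModesConflict(ShareUpdateExclusiveLock,
                                    AccessShareLock));
    }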
> In the third place, if this imaginary risk that the syscache was out of
> date were real, the code would be completely failing to deal with it,
> because all it is testing is whether it found a null relpartbound value.
> That wouldn't handle the case where a non-null relpartbound is obsolete,
> which is what you'd expect after ATTACH PARTITION.
No, that's what you'd expect after DETACH PARTITION, but that takes
AccessExclusiveLock, so the problem doesn't occur.
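
Sticking with the same illustrative check as above (again, not code
from the tree), the DETACH case is the mirror image, which is why the
non-NULL-to-NULL transition can't be observed while we hold
AccessShareLock:

    /* Illustrative only: DETACH PARTITION's lock excludes readers. */
    Assert(DoLockModesConflict(AccessExclusiveLock, AccessShareLock));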
> Furthermore, if all of the above can be rebutted, then what's the argument
> that reading pg_class directly will produce a better answer? The only way
> that any of this could be useful is if you're trying to read data that is
> changing under you because you didn't take an adequate lock. In that case
> there's no guarantee that what you will read from pg_class is up-to-date
> either.
The argument is that the only possible concurrent change is the
addition of a partition, and therefore the only thing that can happen
is for relpartbound to go from NULL to non-NULL. It can't go from
non-NULL to NULL, nor from one non-NULL value to another.
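
Here is a compressed sketch of the fallback that reasoning justifies.
It is modeled loosely on the partdesc.c code Tom quoted, not a verbatim
excerpt, and the function name is mine: if the (possibly stale)
syscache copy shows a NULL relpartbound, re-read the pg_class row
directly, because the only change our AccessShareLock allows
concurrently is an ATTACH filling the bound in:

    #include "postgres.h"

    #include "access/genam.h"
    #include "access/htup_details.h"
    #include "access/table.h"
    #include "catalog/indexing.h"
    #include "catalog/pg_class.h"
    #include "utils/datum.h"
    #include "utils/fmgroids.h"
    #include "utils/rel.h"
    #include "utils/syscache.h"

    /*
     * Sketch: fetch relpartbound for a partition, falling back to a
     * direct pg_class scan when the syscache copy is NULL and therefore
     * possibly stale with respect to a concurrent ATTACH PARTITION.
     */
    static Datum
    get_relpartbound_with_fallback(Oid partoid, bool *isnull)
    {
        HeapTuple   tuple;
        Datum       bound;

        tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(partoid));
        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for relation %u", partoid);

        bound = SysCacheGetAttr(RELOID, tuple,
                                Anum_pg_class_relpartbound, isnull);
        if (!*isnull)
            bound = datumCopy(bound, false, -1); /* pg_node_tree varlena */
        ReleaseSysCache(tuple);

        if (*isnull)
        {
            /*
             * The cached tuple may predate a concurrent ATTACH PARTITION,
             * which took only ShareUpdateExclusiveLock, so look at
             * pg_class directly to pick up a newly set bound, if any.
             */
            Relation    pg_class = table_open(RelationRelationId,
                                              AccessShareLock);
            ScanKeyData key[1];
            SysScanDesc scan;
            HeapTuple   newtup;

            ScanKeyInit(&key[0],
                        Anum_pg_class_oid,
                        BTEqualStrategyNumber, F_OIDEQ,
                        ObjectIdGetDatum(partoid));
            scan = systable_beginscan(pg_class, ClassOidIndexId, true,
                                      NULL, 1, key);
            newtup = systable_getnext(scan);
            if (HeapTupleIsValid(newtup))
            {
                bound = heap_getattr(newtup, Anum_pg_class_relpartbound,
                                     RelationGetDescr(pg_class), isnull);
                if (!*isnull)
                    bound = datumCopy(bound, false, -1);
            }
            systable_endscan(scan);
            table_close(pg_class, AccessShareLock);
        }

        return bound;
    }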
> Unsurprisingly, the code coverage report shows that this code path is
> never taken. I think we could dike out partdesc.c lines 113-151
> altogether, and make the code just above there look more like every
> other syscache access in the backend.
It turns out that I tested this, and if you do that, it is possible to
produce failures. It's very hard to do so in the context of a
regression test because the failures are low-probability, but they do
happen. I believe some of the testing details are in the original
thread, which is probably linked from the commit message that added
those lines.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company