From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Declarative partitioning - another take |
Date: | 2016-08-26 08:03:34 |
Message-ID: | a002d027-66cb-a221-4069-c1ebcc8dfd0e@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2016/08/18 5:23, Robert Haas wrote:
> On Wed, Aug 17, 2016 at 2:21 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I am slightly tempted to eliminate the pg_partition catalog and associated
>> syscache altogether and add a column to pg_class as Robert suggested.
>> That way, all relid_is_partition() calls will be replaced by
>> rel->rd_partbound != NULL check. But one potential problem with that
>> approach is that now whenever a parent relation is opened, all the
>> partition relations must be opened to get the partbound value (to form the
>> PartitionDesc to be stored in parent relation's rd_partdesc). Whereas
>> currently, we just look up the pg_partition catalog (or the associated
>> cache) for every partition and that gets us the partbound.
>
> Well, you could just look up the pg_class row without opening the
> relation, too. There is a system cache on pg_class.oid, after all. I
Yes, I somehow didn't think of that.
> think the issue is whether it's safe to read either one of those
> things without a lock on the child relation. If altering the
> partitioning information for a relation requires holding only
> AccessExclusiveLock on that relation, and no lock on the parent, then
> you really can't read the information for any child relation without
> taking at least AccessShareLock. Otherwise, it might change under
> you, and that would be bad.
I'd imagine this won't be a problem because we take an AccessExclusiveLock
on the parent when adding/removing a partition.
> I'm inclined to think that changing the partitioning information for a
> child is going to require AccessExclusiveLock on both the child and
> the parent. That seems unfortunate from a concurrency point of view,
> but we may be stuck with it: suppose you require only
> ShareUpdateExclusiveLock on the parent. Well, then a concurrent read
> transaction might see the partition boundaries change when it does a
> relcache rebuild, which would cause it to suddenly start expecting the
> data to be in a different plan in mid-transaction, perhaps even in
> mid-scan. Maybe that's survivable with really careful coding, but it
> seems like it's probably a bad thing. For example, it would mean that
> the executor would be unable to rely on the partitioning information
> in the relcache remaining stable underneath it. Moreover, the
> relcache is always going to be scanned with the most recent possible
> MVCC snapshot, but the transaction snapshot may be older, so such a
> system creates all sorts of nasty possibilities for there to be skew
> between the snapshot being used to via the data and the snapshot being
> used to read the metadata that says where the data is.
We do take a lock on the parent because we would be changing its partition
descriptor (relcache). I changed MergeAttributes() such that an
AccessExclusiveLock instead of ShareUpdateExclusiveLock is taken if the
parent is a partitioned table.
> This may need some more thought, but if we go with that approach of
> requiring an AccessExclusiveLock on both parent and child, then it
> seems to me that maybe we should consider the partitioning information
> to be a property of the parent rather than the child. Just take all
> the partitioning information for all children and put it in one big
> node tree and store it in the pg_class or pg_partition_root entry for
> the parent as one big ol' varlena. Now you can open the parent and
> get all of the partitioning information for all of the children
> without needing any lock on any child, and that's *really* good,
> because it means that some day we might be able to do partition
> elimination before locking any of the children! That would be
> excellent.
If we need an AccessExclusiveLock on parent to add/remove a partition
(IOW, changing that child table's partitioning information), then do we
need to lock the individual partitions when reading partition's
information? I mean to ask why the simple syscache look-ups to get each
partition's bound wouldn't do.
Thanks,
Amit
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2016-08-26 08:15:45 | Re: Declarative partitioning - another take |
Previous Message | Venkata B Nagothi | 2016-08-26 07:12:41 | Re: patch proposal |