Re: Table partition with primary key in 11.3

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "Alex V(dot)" <in_flight(at)pclovers(dot)ru>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, tgl <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Table partition with primary key in 11.3
Date: 2019-06-07 19:56:18
Message-ID: CAH2-WznBvRV+mfSCFuzjrSdjVWLWC7SYAuxxMJV1oDOi8Uk90g@mail.gmail.com
Lists: pgsql-general

On Fri, Jun 7, 2019 at 12:43 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Well, "quickly" might mean within a week. If it takes that long to
> fully remove a monthly partition to make that partition ID available to
> some future month's partition, that seems acceptable. Blocking
> DROP/DETACH for one hour is certainly not acceptable.

I agree that synchronous clean-up of global indexes wouldn't make
sense there, and might not be very compelling in practice.

It occurs to me that we could add a code path to nbtree page splits
that considers removing tuples that point to dropped partitions, in
order to avert a page split. This would be a bit like the
LP_DEAD/kill_prior_tuple thing. Technically the space used by index
tuples that point to dropped partitions wouldn't become reclaimable
immediately, but it might not matter with this optimization.
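
To make the idea concrete, here is a toy model of it in Python. It is
only a sketch of the control flow, not nbtree code: IndexTuple,
LeafPage, the byte sizes, and the dropped_partitions set are all
hypothetical names invented for illustration. When an insert finds the
leaf page full, tuples that point to dropped partitions are discarded
first, and a split is reported only if that still leaves too little
room.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTuple:
    key: int
    partition_id: int  # hypothetical: partition that the heap TID belongs to
    size: int = 16     # bytes this tuple occupies on the page (made-up figure)


@dataclass
class LeafPage:
    capacity: int                      # usable bytes on the page (made-up figure)
    tuples: list = field(default_factory=list)

    def used(self) -> int:
        return sum(t.size for t in self.tuples)

    def fits(self, tup: IndexTuple) -> bool:
        return self.used() + tup.size <= self.capacity

    def add(self, tup: IndexTuple) -> None:
        self.tuples.append(tup)
        # Keep the page ordered by key, with partition as a tiebreaker.
        self.tuples.sort(key=lambda t: (t.key, t.partition_id))


def insert_tuple(page: LeafPage, tup: IndexTuple, dropped_partitions: set) -> str:
    """Insert tup, trying opportunistic cleanup before asking for a split."""
    if page.fits(tup):
        page.add(tup)
        return "inserted"

    # Page is full: discard tuples that point to dropped partitions.
    page.tuples = [t for t in page.tuples
                   if t.partition_id not in dropped_partitions]

    if page.fits(tup):
        page.add(tup)
        return "inserted_after_cleanup"

    # Cleanup did not free enough space; the caller has to split the page.
    return "split_needed"
```

In this model nothing is reclaimed at DROP/DETACH time. Space comes
back lazily, and only on pages that are about to split anyway.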

> If this scheme means that you can keep the partition identifiers stored
> in the index to, for instance, 10 bits (allowing for 1024 partitions to
> exist at any one time, including those in the process of being cleaned
> up) instead of having to expand to (say) 24 because that covers a couple
> of years of operation before having to recreate the index, it seems
> worthwhile.

I think that we should have no inherent limit on the number of
partitions available at once, on general principle. Limiting the
number of partitions is a design that probably has a lot of sharp
edges.

The nbtree heap TID column and partition number column should probably
be a single variable-width column (not two separate columns) that is
often no wider than 6 bytes, but can be wider when there are many
partitions and/or very large partitions. That will be challenging, but
it seems like the right place to solve the problem. I think that I
could make that happen. Maybe this same representation could be used
for all nbtree indexes, not just global nbtree indexes.
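
As a rough illustration of what such a combined column could look
like, here is one possible encoding in Python. This is an assumption
made for the sake of example, not a proposed on-disk format: the
partition number and the heap block number are each stored as a
LEB128-style varint, followed by a fixed 2-byte offset number. The
function names are hypothetical. A plain heap TID today is 6 bytes
(4-byte block number plus 2-byte offset). Under this sketch, a
partition number below 128 combined with a block number below 2^21
also comes to 6 bytes, and the width grows only when either number
gets large.

```python
def _put_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def _get_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_global_tid(partition: int, block: int, offset: int) -> bytes:
    """Pack (partition number, heap block, offset) into one byte string."""
    if not 0 <= block < 1 << 32:
        raise ValueError("block number must fit in 32 bits")
    if not 0 <= offset < 1 << 16:
        raise ValueError("offset number must fit in 16 bits")
    return _put_varint(partition) + _put_varint(block) + offset.to_bytes(2, "big")


def decode_global_tid(buf: bytes):
    """Inverse of encode_global_tid."""
    partition, pos = _get_varint(buf, 0)
    block, pos = _get_varint(buf, pos)
    offset = int.from_bytes(buf[pos:pos + 2], "big")
    return partition, block, offset
```

Note that this particular encoding does not sort correctly under a
bytewise comparison, so a comparator would have to decode both values
first. A real design would have to settle that, along with alignment
and compatibility with existing indexes.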

--
Peter Geoghegan
