Re: Table partition with primary key in 11.3

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "Alex V(dot)" <in_flight(at)pclovers(dot)ru>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, tgl <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Table partition with primary key in 11.3
Date: 2019-06-07 21:35:32
Message-ID: 20190607213532.GA26907@alvherre.pgsql
Lists: pgsql-general

On 2019-Jun-07, Peter Geoghegan wrote:

> On Fri, Jun 7, 2019 at 1:22 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> > Because you can't rely on that exclusively, and you want to reuse the
> > partition ID eventually, you still need a cleanup process that removes
> > those remaining index entries. This cleanup process is a background
> > process, so it doesn't affect latency. I think it's not a good idea to
> > add latency to clients in order to optimize a background process.
>
> Ordinarily I would agree, but we're talking about something that takes
> place at the point that we're just about to split the page, that will
> probably make the page split unnecessary when we can reclaim as few as
> one or two tuples. A page split is already a very expensive thing by
> any measure, and something that can rarely be "undone", so avoiding
> them entirely is very compelling.

Sorry, I confused your argument with mine. I agree that removing
entries to try to prevent a page split is worth doing.
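
To make the idea concrete, here is a minimal sketch of "reclaim before
split". It is only an illustration under assumed names: LeafPage,
PAGE_CAPACITY, insert_entry and the (key, partition_id) entry layout are
hypothetical and do not correspond to any PostgreSQL structure or function.

```python
from dataclasses import dataclass, field

# Hypothetical tiny capacity so the behaviour is easy to follow.
PAGE_CAPACITY = 4


@dataclass
class LeafPage:
    # Each entry is (key, partition_id); a real index tuple carries far more.
    entries: list = field(default_factory=list)


def insert_entry(page, key, partition_id, dropped_partitions):
    """Insert into `page`; return "inserted", "reclaimed" or "split needed"."""
    if len(page.entries) < PAGE_CAPACITY:
        page.entries.append((key, partition_id))
        return "inserted"

    # The page is full. Before paying for a split, discard entries that
    # point into partitions that have already been dropped.
    page.entries = [e for e in page.entries if e[1] not in dropped_partitions]

    if len(page.entries) < PAGE_CAPACITY:
        page.entries.append((key, partition_id))
        return "reclaimed"

    # Nothing could be reclaimed, so the caller would have to split the page.
    return "split needed"
```

In this sketch, reclaiming a single stale entry is enough to avoid the
split, which is the case described in the quoted paragraph above.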

> > This way, when a partition is dropped, we have to take the time to scan
> > all global indexes; when they've been scanned we can remove the catalog
> > entry, and at that point the partition ID becomes available to future
> > partitions.
>
> It seems worth recycling partition IDs, but it should be possible to
> delay that for a very long time if necessary. Ideally, users wouldn't
> have to bother with it when they have really huge global indexes.

I envision this happening automatically -- you drop the partition, a
persistent work item is registered, autovacuum takes care of it
whenever. The user doesn't have to do anything about it.
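
A rough sketch of that flow, under stated assumptions: PartitionCatalog,
its fields and its methods are hypothetical names invented for this
example, and the "global index" is just a list of (key, partition_id)
pairs. It shows only the ordering that matters here: the drop is cheap, the
scan happens later, and the ID becomes reusable only after every global
index has been scanned.

```python
from collections import deque


class PartitionCatalog:
    """Toy model of deferred partition-ID recycling (hypothetical)."""

    def __init__(self):
        self.next_id = 1
        self.free_ids = []        # IDs that are safe to hand out again
        self.pending = deque()    # persistent work items for dropped partitions
        self.global_indexes = []  # each index is a list of (key, partition_id)

    def allocate_id(self):
        # Prefer a recycled ID; otherwise take a fresh one.
        if self.free_ids:
            return self.free_ids.pop()
        new_id = self.next_id
        self.next_id += 1
        return new_id

    def drop_partition(self, partition_id):
        # Cheap for the client: only record that cleanup is owed.
        self.pending.append(partition_id)

    def run_cleanup(self):
        # Background step (the role autovacuum would play): scan every
        # global index, remove stale entries, then release the ID.
        while self.pending:
            pid = self.pending.popleft()
            for index in self.global_indexes:
                index[:] = [e for e in index if e[1] != pid]
            self.free_ids.append(pid)
```

Until run_cleanup() has processed the work item, allocate_id() keeps
handing out fresh IDs, so a dropped partition's ID is never reused while
index entries still refer to it.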

> You don't have to be Claude Shannon to realize that it's kind of silly
> to reserve 16 bits for the offset number component of a
> TID/ItemPointer. We need to continue to support offset numbers that go
> that high, but the implementation would optimize for the common case
> where offset numbers are less than 512 (or maybe less than 1024).

(In many real-world cases, offset numbers on typical pages need fewer
than 7 bits, even.)
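
As an illustration of optimizing for small offset numbers while still
supporting the full 16-bit range, here is a minimal sketch using a generic
variable-length encoding (7 payload bits per byte plus a continuation bit).
This is an assumed technique chosen for the example, not a design proposed
in this thread; encode_offset and decode_offset are hypothetical names.

```python
def encode_offset(offset):
    """Encode an offset number (1..65535) in 1 to 3 bytes."""
    if not 0 < offset <= 0xFFFF:
        raise ValueError("offset number out of range")
    out = bytearray()
    while True:
        low7 = offset & 0x7F
        offset >>= 7
        if offset:
            # More bits remain: set the continuation bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_offset(data):
    """Decode the bytes produced by encode_offset back to an integer."""
    value = 0
    for position, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            return value
    raise ValueError("truncated offset encoding")
```

With this scheme, offsets below 128 take one byte and offsets below 16384
take two, so the common cases mentioned above (under 512 or 1024) never
need more than two bytes; only the largest offsets take three.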

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
