Re: Progress report of CREATE INDEX for nested partitioned tables

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Progress report of CREATE INDEX for nested partitioned tables
Date: 2023-02-01 16:27:26
Message-ID: CAEze2WgvJOvHv0mp3LwDbF27g7e_HgiWvbWdGH8980S14VBPoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 1 Feb 2023 at 16:53, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com> wrote:
> > > > 1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> написал(а):
> > > > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > > > lookup for every single element therein ... this sounds slow.
> > > >
> > > > In one of the callsites, we already have the partition descriptor
> > > > available. We could just scan partdesc->is_leaf[] and add one for each
> > > > 'true' value we see there.
> > >
> > > The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.
> > >
> > > > In the other callsite, we had the table open just a few lines before the
> > > > place you call count_leaf_partitions. Maybe we can rejigger things by
> > > > examining its state before closing it: if relkind is not partitioned we
> > > > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > > > I think that would save some work. I also wonder if it's worth writing
> > > > a bespoke function for counting leaf partitions rather than relying on
> > > > find_all_inheritors.
> > >
> > > Sure, added this condition to avoid the extra work here.
> > >
> >
> > > When creating an index on a partitioned table, this column is set to
> > > - the total number of partitions on which the index is to be created.
> > > + the total number of leaf partitions on which the index is to be created or attached.
> >
> > I think we should also add a note about the (now) non-constant nature
> > of the value, something along the lines of "This value is updated as
> > we're processing and discovering partitioned tables in the partition
> > hierarchy".
>
> But the TOTAL is constant, right ? Updating the total when being called
> recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

An example hierarchy:
CREATE TABLE parent (a, b) partition by list (a);
CREATE TABLE a1
PARTITION OF parent FOR VALUES IN (1)
PARTITION BY LIST (b);
CREATE TABLE a1bd
PARTITION OF a1 DEFAULT;

CREATE TABLE a2
PARTITION OF parent FOR VALUES IN (2)
PARTITION BY LIST (b);
CREATE TABLE a2bd
PARTITION OF a2 DEFAULT;

INSERT INTO parent (a, b) SELECT * from generate_series(1, 2) a(a)
cross join generate_series(1, 100000),b(b);
CREATE INDEX ON parent(a,b);

This will only discover that a2bd will need to be indexed after a1bd
is done (or vice versa, depending on which order a1 and a2 are
processed in the ).

Kind regards,

Matthias van de Meent

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-02-01 16:47:23 Re: RLS makes COPY TO process child tables
Previous Message Tom Lane 2023-02-01 16:17:59 pg_dump versus hash partitioning