Quick Links

Re: Add parallelism and glibc dependent only options to reindexdb

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject:	Re: Add parallelism and glibc dependent only options to reindexdb
Date:	2019-07-11 13:34:17
Message-ID:	20190711133417.GK4500@paquier.xyz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote:
> On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Get* would be my choice, because we look at the set of parallel slots,
>> and get an idle one.
>
> That's what the former GetIdleSlot (that I renamed to get_idle_slot as
> it's not static) is doing. ConsumeIdleSlot() actually get an idle
> slot and mark it as being used. That's probably some leakage of
> internal implementation, but having a GetIdleParallelSlot (or
> ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and
> I don't have a better idea on how to rename get_idle_slot. If you
> really prefer Get* and you're fine with a static get_idle_slot, that's
> fine by me.

Do we actually need get_idle_slot? ConsumeIdleSlot is its only
caller.

>> and we'd
>> need to be careful of the same where pg_catalog is listed. Actually,
>> your patch breaks if we do a parallel run with pg_catalog and another
>> schema, no?
>
> It can definitely cause problems if you ask for pg_catalog and other
> schema, same as if you ask a parallel reindex of some catalog tables
> (possibly with other tables). There's a --system switch for that
> need, so I don't know if documenting the limitation to avoid some
> extra code to deal with it is a good enough solution?

vacuumdb --full still has limitations in this area and we had some
reports on the matter about this behavior being annoying. Its
documentation also mentions that mixing catalog relations with --full
can cause deadlocks.

Documenting it may be fine at the end, but my take is that it would be
nice to make sure that we don't have deadlocks if we can avoid them
easily. It is also a matter of balance. If for example the patch
gets 3 times bigger in size because of that we may have an argument
for not doing it and keep the code simple. What do people think about
that? I would be nice to get more opinions here.

And while scanning the code...

+ * getQuerySucess
Typo here.

- * Pump the conn till it's dry of results; return false if any are errors.
This comment could be improved on the way, like "Go through all the
connections and make sure to consume any remaining results. If any
error is found, false is returned after processing all the parallel
slots."
--
Michael

In response to

Re: Add parallelism and glibc dependent only options to reindexdb at 2019-07-11 09:48:20 from Julien Rouhaud

Responses

Re: Add parallelism and glibc dependent only options to reindexdb at 2019-07-11 16:22:25 from Julien Rouhaud

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2019-07-11 13:40:14	Re: complier warnings from ecpg tests
Previous Message	Thomas Munro	2019-07-11 12:49:22	Re: [RFC] [PATCH] Flexible "partition pruning" hook