Re: parallel vacuum comments

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: parallel vacuum comments
Date: 2021-12-11 14:59:31
Message-ID: CAD21AoATpmazwdYge7rN_GGM_mk3v_u0T=-DdQ-kAp1eJkjf3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
> > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
> > related code. And I found a few things that I think could stand improvement:

Thank you for the comments.

>
> While working on the fix for #17255 (more specifically some cleanup that Peter
> suggested in the context), I noticed another thing: Initializing parallelism
> as part of dead_items_alloc() is a bad idea. Even if there are comments noting
> that oddity.
>
> I don't really see why we should do it this way? There's no "no-parallelism"
> path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
> it's not like we might just discover the inability to do parallelism during
> parallel initialization?

Right. Also, in parallel vacuum case, it allocates the space not only
for dead items but also other data required to do parallelism like
shared bulkdeletion results etc. Originally, in PG13,
begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
became part of dead_items_alloc() (see b4af70cb2). I agree to change
this part so that lazy_scan_heap() calls begin_parallel_vacuum()
(whatever we rename it). I'll incorporate this change in the
refactoring patch barring any objections.

>
> It's also not particularly helpful to have a begin_parallel_vacuum() that
> might not actually begin a parallel vacuum...

During the development, I found that we have some begin_* functions
that don't start the actual parallel job but prepare state data for
starting parallel job and referred to _bt_begin_parallel() so I named
begin_parallel_vacuum(). But I admit that considering what the
function actually does, something like
create_parallel_vacuum_context() would be clearer.

>
> Minor nit:
>
> begin_parallel_vacuum()'s comment says:
> * On success (when we can launch one or more workers), will set dead_items and
> * lps in vacrel for caller.
>
> But it actually doesn't know whether we can start workers. It just checks
> max_parallel_maintenance_workers, no?

Yes, we cannot know whether we can actually start workers when
starting parallel index vacuuming. It returns non-NULL if we request
one or more workers.

Regards

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-12-11 15:17:17 Re: pg_dump versus ancient server versions
Previous Message Andrew Dunstan 2021-12-11 13:12:55 Re: isolationtester: add session name to application name