Re: parallel vacuum comments

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject:	Re: parallel vacuum comments
Date:	2021-11-01 01:44:34
Message-ID:	CAD21AoBxGEMMPDHXbFB2oit2eo_VRhUXXtrZYhUzqozr2aWv8A@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> Due to bug #17245: [1] I spent a considerable amount of time looking at vacuum
> related code. And I found a few things that I think could stand improvement:
>
> - There are pretty much no tests. This is way, way too complicated a feature for
> that. If there had been tests for the obvious edge case of some indexes
> being too small to be handled in parallel, but others needing parallelism,
> the mistake leading to #17245 would have been caught during development.

Yes. We should have tests at least for such cases.

>
>
> - There should be error check verifying that all indexes have actually been
> vacuumed. It's way too easy to have bugs leading to index vacuuming being
> skipped.

Agreed.
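
As a rough illustration of such a check, here is a minimal standalone sketch. It is not PostgreSQL code: IndexStatus, mark_index_processed() and first_unprocessed_index() are hypothetical names, and in the real code the failure would presumably be reported with elog(ERROR, ...) rather than a return value.

```c
#include <stdbool.h>

/* Hypothetical per-index bookkeeping; not the actual PostgreSQL struct. */
typedef struct IndexStatus
{
    bool        processed;      /* set once index vacuum or cleanup has run */
} IndexStatus;

/* Record that index number idx has been vacuumed or cleaned up. */
static void
mark_index_processed(IndexStatus *indstats, int idx)
{
    indstats[idx].processed = true;
}

/*
 * Return the position of the first index that was never processed,
 * or -1 if every index was handled.  The caller would raise an error
 * for any result other than -1.
 */
static int
first_unprocessed_index(const IndexStatus *indstats, int nindexes)
{
    for (int i = 0; i < nindexes; i++)
    {
        if (!indstats[i].processed)
            return i;
    }
    return -1;
}
```

Running the check once after each index vacuum or cleanup pass would turn a silently skipped index into an immediate failure.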

>
>
> - The state machine is complicated. It's very unobvious that an index needs to
> be processed serially by the leader if shared_indstats == NULL.

I think we can consolidate the logic that decides who (a worker or the
leader) processes the index in one function.
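
Something along these lines is what I have in mind. This is only a standalone sketch under assumptions: IndexInfo, its two fields and choose_index_processing() are made-up names, and the real conditions in vacuumlazy.c are more involved.

```c
#include <stdbool.h>

/* Hypothetical outcome of the decision; names are illustrative only. */
typedef enum IndexProcessing
{
    PROCESS_SERIALLY_BY_LEADER, /* the leader must handle this index itself */
    PROCESS_IN_PARALLEL         /* any participant may pick this index up */
} IndexProcessing;

/* Hypothetical per-index facts the decision is assumed to depend on. */
typedef struct IndexInfo
{
    bool        parallel_safe;  /* index AM supports parallel operation in this phase */
    bool        large_enough;   /* index is big enough to be worth parallelizing */
} IndexInfo;

/*
 * Single place that decides how one index is processed, so callers no
 * longer infer it from something like "shared_indstats == NULL".
 */
static IndexProcessing
choose_index_processing(const IndexInfo *info, int nworkers_launched)
{
    /* No workers were launched: everything falls back to the leader. */
    if (nworkers_launched <= 0)
        return PROCESS_SERIALLY_BY_LEADER;

    /* Unsafe or too-small indexes are never handed to parallel processing. */
    if (!info->parallel_safe || !info->large_enough)
        return PROCESS_SERIALLY_BY_LEADER;

    return PROCESS_IN_PARALLEL;
}
```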

>
>
> - I'm very confused by the existence of LVShared->bitmap. Why is it worth
> saving 7 bits per index for something like this (compared to a simple
> array of bools)? Nor does the naming explain what it's for.
>
> The presence of the bitmap requires stuff like SizeOfLVShared(), which
> accounts for some of the bitmap size, but not all?

Yes, it's better to account for the size of all bitmaps.

>
> But even though we have this space optimized bitmap thing, we actually need
> more memory allocated for each index, making this whole thing pointless.

Right. But is it better to change it to use booleans?
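
For comparison, a minimal standalone sketch of the two representations. The helper names are hypothetical and this is not the actual LVShared layout; it only shows that the bitmap saves at most 7 bits per index while needing bit-twiddling accessors that a plain bool array avoids.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a one-bit-per-index bitmap (rounded up to whole bytes). */
static size_t
bitmap_bytes(size_t nindexes)
{
    return (nindexes + 7) / 8;
}

/* Bytes needed for a plain array with one bool per index. */
static size_t
bool_array_bytes(size_t nindexes)
{
    return nindexes * sizeof(bool);
}

/* Bitmap accessors: the extra code the bool array does not need. */
static void
bitmap_set(uint8_t *bitmap, size_t i)
{
    bitmap[i >> 3] |= (uint8_t) (1u << (i & 7));
}

static bool
bitmap_get(const uint8_t *bitmap, size_t i)
{
    return (bitmap[i >> 3] & (1u << (i & 7))) != 0;
}
```

With a bool array the same operations are simply flags[i] = true and flags[i], and the size calculation needs no rounding.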

> - Imo it's pretty confusing to have functions like
> lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> vacuum or index cleanup with parallel workers.", based on
> lps->lvshared->for_cleanup.

Okay. We need to set lps->lvshared->for_cleanup to tell the workers to do
either index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
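
Roughly like this, as a standalone sketch. SharedVacuumState, IndexPhase and process_all_indexes() are hypothetical stand-ins, not the existing functions; the point is only that the caller states the phase explicitly and the shared flag is set in one place for the workers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared state that workers read. */
typedef struct SharedVacuumState
{
    bool        for_cleanup;    /* true: index cleanup, false: index vacuum */
} SharedVacuumState;

/* Which phase a call actually performed; used here only for illustration. */
typedef enum IndexPhase
{
    PHASE_INDEX_VACUUM,
    PHASE_INDEX_CLEANUP
} IndexPhase;

/*
 * The phase is an explicit argument, so a reader of the call site can see
 * what is going to happen.  The shared flag is still set so that workers
 * learn which phase to run.
 */
static IndexPhase
process_all_indexes(SharedVacuumState *shared, bool for_cleanup)
{
    shared->for_cleanup = for_cleanup;

    return for_cleanup ? PHASE_INDEX_CLEANUP : PHASE_INDEX_VACUUM;
}
```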

>
>
> - I don't like some of the new names introduced in 14. E.g.
> "do_parallel_processing" is way too generic.

Here are the function names that probably need to be renamed from
that perspective:

* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index

Is there any other function that should be renamed?

> - On a higher level, a lot of this actually doesn't seem to belong into
> vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
> code is heap specific. And vacuumlazy.c is large enough without the parallel
> code.

I can't come up with an idea for making them more generic. Could you
elaborate on that?

I've started to write a patch for these comments.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/e
