RE: Parallel heap vacuum

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel heap vacuum
Date: 2024-07-31 03:54:22
Message-ID: TYAPR01MB5692B24EF8A304FB36206772F5B12@TYAPR01MB5692.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Sawada-san,

> Thank you for testing!

I tried to profile the vacuuming with the larger case (40 workers for the 20G table)
and attached FlameGraph showed the result. IIUC, I cannot find bottlenecks.

> > 2.
> > I compared parallel heap scan and found that it does not have compute_worker
> API.
> > Can you clarify the reason why there is an inconsistency?
> > (I feel it is intentional because the calculation logic seems to depend on the
> heap structure,
> > so should we add the API for table scan as well?)
>
> There is room to consider a better API design, but yes, the reason is
> that the calculation logic depends on table AM implementation. For
> example, I thought it might make sense to consider taking the number
> of all-visible pages into account for the calculation of the number of
> parallel workers as we don't want to launch many workers on the table
> where most pages are all-visible. Which might not work for other table
> AMs.
>

Okay, thanks for confirming. I wanted to ask others as well.

> I'm updating the patch to implement parallel heap vacuum and will
> share the updated patch. It might take time as it requires to
> implement shared iteration support in radx tree.

Here are other preliminary comments for v2 patch. This does not contain
cosmetic ones.

1.
Shared data structure PHVShared does not contain the mutex lock. Is it intentional
because they are accessed by leader only after parallel workers exit?

2.
Per my understanding, the vacuuming goes like below steps.

a. paralell workers are launched for scanning pages
b. leader waits until scans are done
c. leader does vacuum alone (you may extend here...)
d. parallel workers are launched again to cleanup indeces

If so, can we reuse parallel workers for the cleanup? Or, this is painful
engineering than the benefit?

3.
According to LaunchParallelWorkers(), the bgw_name and bgw_type are hardcoded as
"parallel worker ..." Can we extend this to improve the trackability in the
pg_stat_activity?

4.
I'm not the expert TidStore, but as you said TidStoreLockExclusive() might be a
bottleneck when tid is added to the shared TidStore. My another primitive idea
is that to prepare per-worker TidStores (in the PHVScanWorkerState or LVRelCounters?)
and gather after the heap scanning. If you extend like parallel workers do vacuuming,
the gathering may not be needed: they can access own TidStore and clean up.
One downside is that the memory consumption may be quite large.

How do you think?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
40_flamegraph.svg application/octet-stream 342.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-07-31 04:06:06 Re: long-standing data loss bug in initial sync of logical replication
Previous Message Steven Niu 2024-07-31 03:15:47 [Patch] remove duplicated smgrclose