From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Cc: | Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel CREATE INDEX for GIN indexes |
Date: | 2025-02-16 04:02:16 |
Message-ID: | 06bd6a23-e317-4707-83d0-23c15809547e@vondra.me |
Lists: | pgsql-hackers |
On 2/12/25 15:59, Matthias van de Meent wrote:
> On Tue, 7 Jan 2025 at 12:59, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> ...
>>
>> I haven't done anything about this, but I'm not sure adding the number
>> of GIN tuples to pg_stat_progress_create_index would be very useful. We
>> don't know the total number of entries, so it can't show the progress.
>
> For btree scans, we update the number of to-be-inserted tuples
> together with the number of blocks scanned. Can we do something
> similar with GIN?
>
I've been thinking about this, but I'm not quite sure how that should
work. The problem is that in btree we have a 1:1 mapping to heap tuples,
but in GIN it's not that simple. Not only do we generate multiple GIN
entries for each heap row, but we also combine / merge those tuples at
various levels.
But I think it might look like this:
1) Each worker counts the number of GinTuples written to the shared
tuplesort, after the in-worker merge phase (i.e. it'd not be the number
of GIN entries generated in ginBuildCallbackParallel).
2) The leader then counts the number of entries it loaded from the
tuplesort, before merging/writing them into the index.
I think this would work as a measure of progress, even though it does
not really match the number of index tuples.
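To make the two counters concrete, here is a minimal standalone sketch of the bookkeeping. It is not taken from the patch: GinBuildProgress and the three helper functions are hypothetical names used only for illustration. In the real code the two values would presumably be reported through the existing tuples_total / tuples_done columns of pg_stat_progress_create_index.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical progress state for one parallel GIN build. This is an
 * illustration only, not a structure from PostgreSQL or from the patch.
 */
typedef struct GinBuildProgress
{
	int64_t		tuples_total;	/* GinTuples written by all workers */
	int64_t		tuples_done;	/* GinTuples the leader has loaded so far */
} GinBuildProgress;

/*
 * Step 1: a worker reports how many GinTuples it wrote to the shared
 * tuplesort, counted after its in-worker merge phase.
 */
static void
progress_add_worker_tuples(GinBuildProgress *p, int64_t nwritten)
{
	assert(nwritten >= 0);
	p->tuples_total += nwritten;
}

/*
 * Step 2: the leader reports entries loaded from the tuplesort, counted
 * before it merges them and writes them into the index.
 */
static void
progress_leader_loaded(GinBuildProgress *p, int64_t nloaded)
{
	assert(nloaded >= 0);
	assert(p->tuples_done + nloaded <= p->tuples_total);
	p->tuples_done += nloaded;
}

/* Fraction of the leader's input consumed so far, in the range [0, 1]. */
static double
progress_fraction(const GinBuildProgress *p)
{
	if (p->tuples_total == 0)
		return 0.0;
	return (double) p->tuples_done / (double) p->tuples_total;
}
```

Both counters are in units of GinTuples passed through the tuplesort, so the ratio is meaningful as progress even though neither value equals the final number of index tuples.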
One thing I'm not sure about is how this would work with the "single
tuplesort" patch. That patch moves the merging into the tuplesort code,
and there doesn't seem to be a nice way to pass the number of merged
tuples outside.
> Can we track data for pg_stat_progress_create_index?
>
Which data? I think progress reporting for CREATE INDEX would be nice,
of course.
regards
--
Tomas Vondra