From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: WIP: parallel GiST index builds |
Date: | 2024-11-08 14:53:36 |
Message-ID: | 3e526629-d915-4c98-b0dc-19f39d96f7e0@vondra.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10/31/24 19:05, Andrey M. Borodin wrote:
>
>
>> On 8 Oct 2024, at 17:05, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> Here's an updated patch adding the queryid.
>
> I've took another round of looking through the patch.
> Everything I see seems correct to me. It just occurred to me that we
> will have: buffered build, parallel build, sorting build. All 3 aiming
> to speed things up. I really hope that we will find a way to parallel
> sorting build, because it will be fastest for sure.
>
The number of different ways to build a GiST index worries me. When we
had just buffered vs. sorted builds, that was pretty easy decision,
because buffered builds are much faster 99% of the time.
But for parallel builds it's not that clear - it can easily happen that
we use much more CPU, without speeding anything up. We just start as
many parallel workers as possible, given the maintenance_work_mem value,
and hope for the best.
Maybe with parallel buffered builds it's be clearer ... but I'm not
sufficiently familiar with that code, and I don't have time to study
that at the moment because of other patches. Someone else would have to
take a stab at that ...
>
> Currently we have some instances of such code...or similar... or
> related code.
>
> if (is_build)
> {
> if (is_parallel)
> recptr = GetFakeLSNForUnloggedRel();
> else
> recptr = GistBuildLSN;
> }
> else
> {
> if (RelationNeedsWAL(rel))
> {
> recptr = actuallyWriteWAL();
> }
> else
> recptr = gistGetFakeLSN(rel);
> }
> // Use recptr
>
> In previous review I was proponent of not adding arguments to
> gistGetFakeLSN(). But now I see that now logic of choosing LSN
> source is sprawling across various places...
>
> Now I do not have a strong point of view on this. Do you think if
> something like following would be clearer?
> if (!is_build && RelationNeedsWAL(rel))
> {
> recptr = actuallyWriteWAL();
> }
> else
> recptr = gistGetFakeLSN(rel, is_build, is_parallel);
>
> Just as an idea.
>
> I'm mostly looking on GiST-specific parts of the patch, while things
> around entering parallel mode seems much more complicated. But as far as
> I can see, large portions of this code are taken from B-tree\BRIN.
>
I agree the repeated code is pretty tedious, and it's also easy to miss
one of the places when changing the logic, so I think wrapping that in
some function makes sense.
regards
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2024-11-08 14:54:07 | Re: DROP VIEW name WITHOUT TYPE |
Previous Message | Andres Freund | 2024-11-08 14:53:01 | Re: Commit Timestamp and LSN Inversion issue |