Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-07-02 15:11:52
Message-ID: c10393d7-a793-4bf8-afa7-017861ad4f5d@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/2/24 02:07, Andy Fan wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>
> Hi Tomas,
>
> I am in a incompleted review process but I probably doesn't have much
> time on this because of my internal tasks. So I just shared what I
> did and the non-good-result patch.
>
> What I tried to do is:
> 1) remove all the "sort" effort for the state->bs_sort_state tuples since
> its input comes from state->bs_worker_state which is sorted already.
>
> 2). remove *partial* "sort" operations between accum.rbtree to
> state->bs_worker_state because the tuple in accum.rbtree is sorted
> already.
>
> Both of them can depend on the same API changes.
>
> 1.
> struct Tuplesortstate
> {
> ..
> + bool input_presorted; /* the tuples are presorted. */
> + new_tapes; // writes the tuples in memory into a new 'run'.
> }
>
> and user can set it during tuplesort_begin_xx(, presorte=true);
>
>
> 2. in tuplesort_puttuple, if memory is full but presorted is
> true, we can (a) avoid the sort. (b) resuse the existing 'runs'
> to reduce the effort of 'mergeruns' unless new_tapes is set to
> true. once it switch to a new tapes, the set state->new_tapes to false
> and wait 3) to change it to true again.
>
> 3. tuplesort_dumptuples(..); // dump the tuples in memory and set
> new_tapes=true to tell the *this batch of input is presorted but they
> are done, the next batch is just presort in its own batch*.
>
> In the gin-parallel-build case, for the case 1), we can just use
>
> for tuple in bs_worker_sort:
> tuplesort_putgintuple(state->bs_sortstate, ..);
> tuplesort_dumptuples(..);
>
> At last we can get a). only 1 run in the worker so that the leader can
> have merge less runs in mergeruns. b). reduce the sort both in
> perform_sort_tuplesort and in sortstate_puttuple_common.
>
> for the case 2). we can have:
>
> for tuple in RBTree.tuples:
> tuplesort_puttuples(tuple) ;
> // this may cause a dumptuples internally when the memory is full,
> // but it is OK.
> tuplesort_dumptuples(..);
>
> we can just remove the "sort" into sortstate_puttuple_common but
> probably increase the 'runs' in sortstate which will increase the effort
> of mergeruns at last.
>
> But the test result is not good, maybe the 'sort' is not a key factor of
> this. I do missed the perf step before doing this. or maybe my test data
> is too small.
>

If I understand the idea correctly, you're saying that we write the data
from BuildAccumulator already sorted, so if we do that only once, it's
already sorted and we don't actually need the in-worker tuplesort.

I think that's a good idea in principle, but maybe the simplest way to
handle this is by remembering if we already flushed any data, and if we
do that for the first time at the very end of the scan, we can write
stuff directly to the shared tuplesort. That seems much simpler than
doing this inside the tuplesort code.

Or did I get the idea wrong?

FWIW I'm not sure how much this will help in practice. We only really
want to do parallel index build for fairly large tables, which makes it
less likely the data will fit into the buffer (and if we flush during
the scan, that disables the optimization).

> Here is the patch I used for the above activity. and I used the
> following sql to test.
>
> CREATE TABLE t(a int[], b numeric[]);
>
> -- generate 1000 * 1000 rows.
> insert into t select i, n
> from normal_rand_array(1000, 90, 1::int4, 10000::int4) i,
> normal_rand_array(1000, 90, '1.00233234'::numeric, '8.239241989134'::numeric) n;
>
> alter table t set (parallel_workers=4);
> set debug_parallel_query to on;

I don't think this forces parallel index builds - this GUC only affects
queries that go through the regular planner, but index build does not do
that, it just scans the table directly.

So maybe your testing did not actually do any parallel index builds?
That might explain why you didn't see any improvements.

Maybe try this to "force" parallel index builds:

set min_parallel_table_scan = '64kB';
set maintenance_work_mem = '256MB';

> set max_parallel_maintenance_workers to 4;
>
> create index on t using gin(a);
> create index on t using gin(b);
>
> I found normal_rand_array is handy to use in this case and I
> register it into https://commitfest.postgresql.org/48/5061/.
>
> Besides the above stuff, I didn't find anything wrong in the currrent
> patch, and the above stuff can be categoried into "furture improvement"
> even it is worthy to.
>

Thanks for the review!

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2024-07-02 15:38:01 Re: Cleaning up perl code
Previous Message Andrew Dunstan 2024-07-02 15:11:22 Re: Cleaning up perl code