From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-07-05 08:44:00
Message-ID: CAEze2Wh8TLFC=GZCdJ0uPbuhGS-Mo1HXrcd52+cEXhwSb+mvUw@mail.gmail.com
Lists: pgsql-hackers
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but couldn't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place,
> > and reusing it would improve code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?
>
> The workers are producing the results in "start_block" order, so if they
> pass that to the leader, it probably can do the usual merge sort.
>
> >> For 0002 it's a bit more complicated, because with a single parallel
> >> scan brinbuildCallbackParallel can't decide if a range is assigned to a
> >> different worker or is empty. And we want to generate summaries for
> >> empty ranges in the index. We could either skip such ranges during the
> >> index build and then add empty summaries in _brin_end_parallel (if
> >> needed), or add them and then merge them using "union".
> >>
> >>
> >> I just realized there's a third option to do this - we could just do
> >> regular parallel scan (with no particular regard to pagesPerRange), and
> >> then do "union" when merging results from workers. It doesn't require
> >> the sequence of TID scans, and the union would also handle the empty
> >> ranges. The per-worker results might be much larger, though, because
> >> each worker might produce up to the "full" BRIN index.
> >
> > Would it be too much effort to add a 'min_chunk_size' argument to
> > table_beginscan_parallel (or ParallelTableScanDesc) that defines the
> > minimum granularity of block ranges to be assigned to each process? I
> > think that would be the most elegant solution that would require
> > relatively little effort: table_block_parallelscan_nextpage already
> > does parallel management of multiple chunk sizes, and I think this
> > modification would fit quite well in that code.
> >
>
> I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
> that passes pagesPerRange to table_parallelscan_initialize(), so that
> each chunk of pagesPerRange blocks is assigned to a single worker.
Huh, I overlooked that one... Sorry for that.
> The trouble I described above is that the scan returns tuples, and the
> consumer has no idea what was the chunk size or how many other workers
> are there. Imagine you get a tuple from block 1, and then a tuple from
> block 1000. Does that mean that the blocks in between are empty or that
> they were processed by some other worker?
If the unit of work for parallel table scans is the index's
pages_per_range, then I think we can just fill in expected-but-missing
ranges as 'empty' in the parallel leader during index loading, like
the first of the two solutions you proposed.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech/)