Re: Support tid range scan in parallel?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Cary Huang <cary(dot)huang(at)highgo(dot)ca>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support tid range scan in parallel?
Date: 2024-05-04 01:14:16
Message-ID: CAApHDvqeySxbaz35yCe_kWNDWT03QoHxA9MtmqhUHf-P1AMZ7A@mail.gmail.com
Lists: pgsql-hackers

On Sat, 4 May 2024 at 06:55, Cary Huang <cary(dot)huang(at)highgo(dot)ca> wrote:
> With syncscan enabled, the "table_block_parallelscan_nextpage()" would
> return the next block since the end of the first tid rangescan instead of the
> correct start block that should be scanned. I see that single tid rangescan
> does not have SO_ALLOW_SYNC set, so I figure syncscan should also be
> disabled in parallel case. With this change, then it would be okay to call
> heap_setscanlimits() in parallel case, so I added this call back to
> heap_set_tidrange() in both serial and parallel cases.

The patch now calls heap_setscanlimits() for the parallel version; it's
just that table_block_parallelscan_nextpage() does nothing to obey
those limits.
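
For reference, here's a simplified view of heap_setscanlimits() (the
real function in heapam.c has a few more assertions): it only stores
the limit in the backend-local HeapScanDesc, so nothing about it ever
reaches the shared parallel descriptor that
table_block_parallelscan_nextpage() allocates blocks from.

void
heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
{
    HeapScanDesc scan = (HeapScanDesc) sscan;

    /* only this backend's scan descriptor ever sees the limit */
    scan->rs_startblock = startBlk;
    scan->rs_numblocks = numBlks;
}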

The only reason the code isn't reading the entire table is the
optimisation in heap_getnextslot_tidrange(), which returns false when
the ctid goes out of range, i.e. this code:

/*
 * When scanning forward, the TIDs will be in ascending order.
 * Future tuples in this direction will be higher still, so we can
 * just return false to indicate there will be no more tuples.
 */
if (ScanDirectionIsForward(direction))
    return false;

If I comment out that line, I see all pages are accessed:

postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=18.80..18.81 rows=1 width=8) (actual time=33.530..36.118 rows=1 loops=1)
   Buffers: shared read=4425
   ->  Gather  (cost=18.78..18.79 rows=2 width=8) (actual time=33.456..36.102 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=4425
         ->  Partial Aggregate  (cost=18.78..18.79 rows=1 width=8) (actual time=20.389..20.390 rows=1 loops=3)
               Buffers: shared read=4425
               ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=9.375..20.349 rows=829 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared read=4425   <---- this is all pages in the table instead of 11 pages.

With that code still commented out, the non-parallel version still
won't read all pages, since the limits set by heap_setscanlimits() are
obeyed.

postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=45.07..45.08 rows=1 width=8) (actual time=0.302..0.302 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Tid Range Scan on a  (cost=0.01..38.86 rows=2485 width=0) (actual time=0.019..0.188 rows=2486 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=11

If I put that code back in, how many pages are read depends on the
number of parallel workers, as workers keep running with higher page
numbers and heap_getnextslot_tidrange() just (inefficiently) filters
those out.

max_parallel_workers_per_gather=2;
  ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=0.191..0.310 rows=829 loops=3)
        TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
        Buffers: shared read=17

max_parallel_workers_per_gather=3;
  ->  Parallel Tid Range Scan on a  (cost=0.01..12.54 rows=802 width=0) (actual time=0.012..0.114 rows=622 loops=4)
        TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
        Buffers: shared hit=19

max_parallel_workers_per_gather=4;
  ->  Parallel Tid Range Scan on a  (cost=0.01..9.72 rows=621 width=0) (actual time=0.014..0.135 rows=497 loops=5)
        TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
        Buffers: shared hit=21

To fix this, you need to make table_block_parallelscan_nextpage() obey
the limits imposed by heap_setscanlimits().

The equivalent code in the non-parallel version is in
heapgettup_advance_block().

/* check if the limit imposed by heap_setscanlimits() is met */
if (scan->rs_numblocks != InvalidBlockNumber)
{
    if (--scan->rs_numblocks == 0)
        return InvalidBlockNumber;
}

I've not studied exactly how you'd get the rs_numblocks information
down to the parallel scan descriptor. But when you figure that out,
just remember that you can't do the --scan->rs_numblocks from
table_block_parallelscan_nextpage() as that's not parallel safe. You
might be able to add an OR condition to "if (nallocated >=
pbscan->phs_nblocks)" to make it "if (nallocated >=
pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
although the field names don't seem very intuitive there. It would be
nicer if the HeapScanDesc field were called rs_blocklimit rather than
rs_numblocks, but it's not for this patch to go messing with that.
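
For what it's worth, a rough sketch of that idea (names are
placeholders, not a patch -- phs_numblocks would be a hypothetical new
ParallelBlockTableScanDesc field filled in from the
heap_setscanlimits() limit when the parallel scan is initialised)
might look something like this inside
table_block_parallelscan_nextpage():

/* sketch only: phs_numblocks is a hypothetical field holding the block
 * count from heap_setscanlimits(), or InvalidBlockNumber when no limit
 * was set */
if (nallocated >= pbscan->phs_nblocks ||
    (pbscan->phs_numblocks != InvalidBlockNumber &&
     nallocated >= pbscan->phs_numblocks))
    page = InvalidBlockNumber;  /* all blocks in the limited range allocated */
else
    page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;

Since that still works off nallocated rather than decrementing a
counter, it stays parallel safe; the remaining work is getting the
limit into phs_numblocks when the parallel scan descriptor is set up.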

David
