Re: select_parallel test fails with nonstandard block size

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: select_parallel test fails with nonstandard block size
Date: 2016-09-15 15:46:53
Message-ID: 479.1473954413@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> OK, I'll take care of it (since I now realize that the inconsistency
> is my own fault --- I committed that GUC not you). It's unclear what
> this will do for Peter's complaint though.

On closer inspection, the answer is "nothing", because the select_parallel
test overrides the default value of min_parallel_relation_size anyway.
(Without that, I don't think tenk1 is large enough to trigger
consideration of parallel scan at all.)

I find that at BLCKSZ 8K, the planner thinks the best plan is

 HashAggregate  (cost=5320.28..7920.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2770.28 rows=10000 width=8)

which is what the regression test script expects. Forcing the parallel
plan to be chosen, we get this using the cost parameters set up by
select_parallel:

 HashAggregate  (cost=5433.00..8033.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2883.00 rows=10000 width=8)
         Workers Planned: 4
         ->  Parallel Seq Scan on tenk1  (cost=0.00..383.00 rows=2500 width=4)

However, at BLCKSZ 16K, we get these numbers instead:

 HashAggregate  (cost=5264.28..7864.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2714.28 rows=10000 width=8)

 HashAggregate  (cost=5251.00..7851.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2701.00 rows=10000 width=8)
         Workers Planned: 4
         ->  Parallel Seq Scan on tenk1  (cost=0.00..201.00 rows=2500 width=4)

so the planner goes for the second one.
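
As a rough illustration of why the choice flips, the total costs quoted above can be compared directly. This is only a sketch: the helper names are made up for this example, and picking the minimum total cost is a simplification of how the planner actually compares paths.

```python
# Total plan costs (upper bound of the top HashAggregate node), copied from
# the EXPLAIN output quoted in this message.
TOTAL_COST = {
    "8K": {"index_only_scan": 7920.28, "parallel_seq_scan": 8033.00},
    "16K": {"index_only_scan": 7864.28, "parallel_seq_scan": 7851.00},
}


def cheapest_plan(costs):
    """Return the plan name with the lowest total cost (hypothetical helper)."""
    return min(costs, key=costs.get)


def margin(costs):
    """Cost gap between the cheapest plan and the runner-up, to 2 decimals."""
    best, runner_up = sorted(costs.values())[:2]
    return round(runner_up - best, 2)


for blcksz, costs in TOTAL_COST.items():
    print(blcksz, cheapest_plan(costs), margin(costs))
```

The index-only plan wins by about 113 cost units at 8K, while the parallel plan wins by only about 13 units at 16K, so neither margin is large relative to totals near 8000.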

I don't think there's anything particularly broken here. The seqscan
cost estimate is largely dependent on the number of blocks, and there are
half as many blocks at 16K. The indexscan estimate is also reduced,
but not as much, so it stops looking like the cheaper alternative.
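
The arithmetic behind that can be sketched from the scan-node costs quoted above. The two cost reductions come straight from the plans; the page counts are only inferred, assuming the stock defaults seq_page_cost = 1.0 and cpu_tuple_cost = 0.01 and assuming the scan's per-tuple CPU cost is split evenly across the four planned workers (consistent with rows=2500 on the Parallel Seq Scan node). The names below are invented for this example.

```python
ROWS = 10000            # tenk1 row estimate, from rows=10000 in the plans
WORKERS = 4             # from "Workers Planned: 4"
SEQ_PAGE_COST = 1.0     # assumption: stock default, not stated in the message
CPU_TUPLE_COST = 0.01   # assumption: stock default, not stated in the message

# Total costs of the scan nodes, copied from the quoted plans.
SEQSCAN_TOTAL = {"8K": 383.00, "16K": 201.00}
INDEXSCAN_TOTAL = {"8K": 2770.28, "16K": 2714.28}


def implied_pages(scan_total):
    """Back out a page count from a Parallel Seq Scan total cost.

    Assumes total = seq_page_cost * pages + cpu_tuple_cost * rows / workers,
    which is a simplified model, not a quote of the planner source.
    """
    cpu_cost = CPU_TUPLE_COST * ROWS / WORKERS
    return (scan_total - cpu_cost) / SEQ_PAGE_COST


# How much each scan estimate drops when going from 8K to 16K blocks.
seq_drop = round(SEQSCAN_TOTAL["8K"] - SEQSCAN_TOTAL["16K"], 2)
index_drop = round(INDEXSCAN_TOTAL["8K"] - INDEXSCAN_TOTAL["16K"], 2)

print("seqscan estimate drops by", seq_drop)
print("indexscan estimate drops by", index_drop)
for blcksz, total in SEQSCAN_TOTAL.items():
    print(blcksz, "implied pages:", implied_pages(total))
```

Under those assumptions the seqscan estimate falls by 182 units and the indexscan estimate by only 56, which is enough to erase the index-only plan's lead at 8K.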

We could maybe twiddle the cost parameters select_parallel uses so that
the same plan is chosen at both block sizes, but it seems like it would
be very fragile, and I'm not sure there's much point.

regards, tom lane
