BUG #17335: Duplicate result rows in Gather node when "Suppress Append ... that have a single child" applied

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: y(dot)sokolov(at)postgrespro(dot)ru
Subject: BUG #17335: Duplicate result rows in Gather node when "Suppress Append ... that have a single child" applied
Date: 2021-12-13 14:45:36
Message-ID: 17335-4dc92e1aea3a78af@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17335
Logged by: Yura Sokolov
Email address: y(dot)sokolov(at)postgrespro(dot)ru
PostgreSQL version: 14.1
Operating system: any
Description:

After an 11->12 upgrade, one of our customers hit a query that returns
duplicate rows when parallel execution is enabled
(max_parallel_workers_per_gather > 0).

Investigation shows the bug was introduced in commit
8edd0e79460b414b1d971895312e549e95e12e4f,
"Suppress Append and MergeAppend plan nodes that have a single child."
The bug is present in all versions since 12 beta, and it is reproducible on
the master branch as well.

Reproduction:
https://gist.github.com/funny-falcon/45fa9460507c015390db99a7efd18a75

The dump contains the definitions of two tables (with 3 and 2 columns,
respectively), an index, and enough data to trigger parallel plan execution.
An `analyze public.testi3, public.testi4` line is also included.

https://gist.github.com/funny-falcon/45fa9460507c015390db99a7efd18a75/raw/06a55e57bc6c460ea8800600b1b597c1ed216259/repro_dump.sql.zst
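To load the dump locally, something along these lines should work (the
database name `bugtest` is my placeholder, and the `zstd` CLI is assumed to
be installed; neither is part of the report):

```shell
# Create a scratch database and feed it the decompressed SQL dump.
createdb bugtest
zstd -dc repro_dump.sql.zst | psql -d bugtest
```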

Query:

SELECT count(1)
FROM (
    SELECT fk, id
    FROM testi3
    WHERE flagged
    UNION ALL
    SELECT fk, id
    FROM testi3
    WHERE false
) AS t
LEFT OUTER JOIN testi4 USING (fk);

With `set max_parallel_workers_per_gather = 0;` the query returns 44, which
is correct.
With `set max_parallel_workers_per_gather = 2;` it returns 88, which is
wrong.
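For reference, the minimal psql session that flips the result (with the dump
loaded; the counts are the ones observed above):

```sql
SET max_parallel_workers_per_gather = 0;
SELECT count(1)
FROM (SELECT fk, id FROM testi3 WHERE flagged
      UNION ALL
      SELECT fk, id FROM testi3 WHERE false) AS t
LEFT OUTER JOIN testi4 USING (fk);  -- 44, correct

SET max_parallel_workers_per_gather = 2;
\g                                  -- re-run the same query: 88, wrong
```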

Sometimes (for example, while trying to debug it) it returns 44 even with
max_parallel_workers_per_gather = 2.
This seems to happen when a worker returns the first row fast enough that
the Gather node never starts executing the plan itself.

That is confirmed by `set parallel_leader_participation = false`, which
forces the correct result.
`set enable_parallel_append = false` also forces the correct result.
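Either setting below, applied in the same session before re-running the
query, brings the count back to 44 (as described above):

```sql
-- Workaround 1: keep the leader from executing the plan itself.
SET parallel_leader_participation = false;

-- Workaround 2: avoid the Parallel Append path entirely.
SET enable_parallel_append = false;
```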

The original plan involved a Gather Merge node with
max_parallel_workers_per_gather = 1.
The reproduction uses Gather and max_parallel_workers_per_gather = 2 only
because it was easier to obtain the bad plan that way.

Plan before commit 8edd0e79460b414b1d971895312e549e95e12e4f (returns the
correct result):

Finalize Aggregate  (cost=25003.16..25003.17 rows=1 width=8) (actual time=435.473..435.473 rows=1 loops=1)
  ->  Gather  (cost=25003.05..25003.16 rows=1 width=8) (actual time=434.646..456.889 rows=2 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Partial Aggregate  (cost=24003.05..24003.06 rows=1 width=8) (actual time=433.000..433.000 rows=1 loops=2)
              ->  Parallel Hash Left Join  (cost=21701.47..24002.86 rows=74 width=0) (actual time=312.576..432.981 rows=22 loops=2)
                    Hash Cond: ("*SELECT* 1".fk = testi4.fk)
                    ->  Parallel Append  (cost=0.42..9.18 rows=24 width=4) (actual time=0.021..0.112 rows=16 loops=2)
                          ->  Subquery Scan on "*SELECT* 1"  (cost=0.42..9.11 rows=24 width=4) (actual time=0.041..0.216 rows=33 loops=1)
                                ->  Index Scan using testi3_ix on testi3  (cost=0.42..8.87 rows=24 width=8) (actual time=0.039..0.207 rows=33 loops=1)
                                      Index Cond: (flagged = true)
                    ->  Parallel Hash  (cost=12084.91..12084.91 rows=586091 width=4) (actual time=305.473..305.473 rows=703309 loops=2)
                          Buckets: 131072  Batches: 32  Memory Usage: 2784kB
                          ->  Parallel Seq Scan on testi4  (cost=0.00..12084.91 rows=586091 width=4) (actual time=0.032..138.958 rows=703309 loops=2)

Plan after that commit (returns the wrong result):

Finalize Aggregate  (cost=25003.87..25003.88 rows=1 width=8) (actual time=652.062..652.062 rows=1 loops=1)
  ->  Gather  (cost=25003.75..25003.86 rows=1 width=8) (actual time=651.482..668.718 rows=2 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Partial Aggregate  (cost=24003.75..24003.76 rows=1 width=8) (actual time=649.150..649.150 rows=1 loops=2)
              ->  Parallel Hash Left Join  (cost=21701.47..24003.49 rows=106 width=0) (actual time=461.751..649.111 rows=44 loops=2)
                    Hash Cond: ("*SELECT* 1".fk = testi4.fk)
                    ->  Subquery Scan on "*SELECT* 1"  (cost=0.42..9.51 rows=36 width=4) (actual time=0.050..0.229 rows=33 loops=2)
                          ->  Index Scan using testi3_ix on testi3  (cost=0.42..9.15 rows=36 width=8) (actual time=0.048..0.214 rows=33 loops=2)
                                Index Cond: (flagged = true)
                    ->  Parallel Hash  (cost=12084.91..12084.91 rows=586091 width=4) (actual time=444.931..444.931 rows=703309 loops=2)
                          Buckets: 131072  Batches: 32  Memory Usage: 2784kB
                          ->  Parallel Seq Scan on testi4  (cost=0.00..12084.91 rows=586091 width=4) (actual time=0.045..202.913 rows=703309 loops=2)

------

regards,
Yura
