BUG #17084: Wrong results of distinct node.

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: maxim(dot)boguk(at)gmail(dot)com
Subject: BUG #17084: Wrong results of distinct node.
Date: 2021-07-06 14:34:35
Message-ID: 17084-9f83b67ab464cc69@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17084
Logged by: Maxim Boguk
Email address: maxim(dot)boguk(at)gmail(dot)com
PostgreSQL version: 13.3
Operating system: Linux
Description:

During investigation of weird app behaviour I found very strange results
with sorting/dising of some dataset.
Issue was reproduced on 12.7 server and 13.3 my local laptop.
Dataset table dump (25MB file) available per request.

QUERY:
WITH t1 AS (
SELECT
ad_id,
advertiser_id,
campaign_id,
campaign_type_id,
ad_group_id,
target_id,
device_type_id,
country_code,
block_format_type_id,
category_group_id,
array_agg(geo_id) AS geo_ids
FROM test
GROUP BY
ad_id,
advertiser_id,
campaign_id,
campaign_type_id,
ad_group_id,
target_id,
device_type_id,
country_code,
block_format_type_id,
category_group_id
),
t2 AS (
SELECT
DISTINCT
ad_group_id,
country_code,
device_type_id,
target_id,
block_format_type_id,
category_group_id,
geo_ids
FROM t1
)

SELECT COUNT(*) FROM t2;

Test:
set jit to off;
set max_parallel_workers_per_gather to 0;
set work_mem to 4MB;
count
-------
83921

set work_mem to '64MB';
count
-------
55634

Plan with 4MB work_mem:
Aggregate (cost=3258730.52..3258730.53 rows=1 width=8) (actual
time=16607.679..16607.680 rows=1 loops=1)
-> HashAggregate (cost=3232649.49..3257461.37 rows=101532 width=67)
(actual time=16126.315..16604.394 rows=83700 loops=1)
Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
Planned Partitions: 8 Batches: 77 Memory Usage: 4313kB Disk
Usage: 139784kB
-> GroupAggregate (cost=2771948.04..3089235.54 rows=1015320
width=99) (actual time=12097.536..15675.807 rows=705836 loops=1)
Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
-> Sort (cost=2771948.04..2797331.04 rows=10153200
width=75) (actual time=12097.524..13354.731 rows=10153197 loops=1)
Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
Sort Method: external merge Disk: 894184kB
-> Seq Scan on test (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.015..779.722 rows=10153197 loops=1)
Planning Time: 0.344 ms
Execution Time: 16730.304 ms


QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2051062.74..2051062.75 rows=1 width=8) (actual
time=8404.880..8404.882 rows=1 loops=1)
-> HashAggregate (cost=2048778.27..2049793.59 rows=101532 width=67)
(actual time=8390.952..8402.777 rows=55634 loops=1)
Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
Batches: 1 Memory Usage: 19473kB
-> HashAggregate (cost=1750369.38..2020856.97 rows=1015320
width=99) (actual time=4478.394..7879.103 rows=705836 loops=1)
Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
Planned Partitions: 256 Batches: 257 Memory Usage: 6553kB
Disk Usage: 1102856kB
-> Seq Scan on test (cost=0.00..236908.00 rows=10153200
width=75) (actual time=0.020..788.698 rows=10153197 loops=1)
Planning Time: 0.301 ms
Execution Time: 8538.236 ms

Correnct answers provided with larger work_mem (55634)

What make situatio even more curious that disable hash_agg doesn't make
issue gone:
set enable_hashagg to 0;
set work_mem to '4MB';
...
count
-------
83700

Aggregate (cost=3305553.24..3305553.25 rows=1 width=8) (actual
time=18227.164..18227.167 rows=1 loops=1)
-> Unique (cost=3283977.69..3304284.09 rows=101532 width=67) (actual
time=17595.189..18223.935 rows=83700 loops=1)
-> Sort (cost=3283977.69..3286515.99 rows=1015320 width=67)
(actual time=17595.188..18042.457 rows=705836 loops=1)
Sort Key: t1.ad_group_id, t1.country_code, t1.device_type_id,
t1.target_id, t1.block_format_type_id, t1.category_group_id, t1.geo_ids
Sort Method: external merge Disk: 128680kB
-> Subquery Scan on t1 (cost=2771948.04..3099388.74
rows=1015320 width=67) (actual time=11965.394..15750.759 rows=705836
loops=1)
-> GroupAggregate (cost=2771948.04..3089235.54
rows=1015320 width=99) (actual time=11965.392..15695.102 rows=705836
loops=1)
Group Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
-> Sort (cost=2771948.04..2797331.04
rows=10153200 width=75) (actual time=11965.378..13283.143 rows=10153197
loops=1)
Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
Sort Method: external merge Disk:
894184kB
-> Seq Scan on test (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.014..770.040 rows=10153197 loops=1)
Planning Time: 0.315 ms
Execution Time: 18503.307 ms

--Maxim Boguk

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-07-06 14:47:45 Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x
Previous Message Adrian Ho 2021-07-06 14:22:00 Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x