Re: BUG #17084: Wrong results of distinct node.

From: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
To: maxim(dot)boguk(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17084: Wrong results of distinct node.
Date: 2021-07-06 15:22:23
Message-ID: CAE3TBxwq8qxLRXZhjaV-WeZ=xmi9JEsEV0h8EUH2YLDzg_UYyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Jul 6, 2021 at 4:09 PM PG Bug reporting form <noreply(at)postgresql(dot)org>
wrote:

> The following bug has been logged on the website:
>
> Bug reference: 17084
> Logged by: Maxim Boguk
> Email address: maxim(dot)boguk(at)gmail(dot)com
> PostgreSQL version: 13.3
> Operating system: Linux
> Description:
>
> During investigation of weird app behaviour I found very strange results
> with sorting/dising of some dataset.
> Issue was reproduced on 12.7 server and 13.3 my local laptop.
> Dataset table dump (25MB file) available per request.
>
> QUERY:
> WITH t1 AS (
> SELECT
> ad_id,
> advertiser_id,
> campaign_id,
> campaign_type_id,
> ad_group_id,
> target_id,
> device_type_id,
> country_code,
> block_format_type_id,
> category_group_id,
> array_agg(geo_id) AS geo_ids
> FROM test
> GROUP BY
> ad_id,
> advertiser_id,
> campaign_id,
> campaign_type_id,
> ad_group_id,
> target_id,
> device_type_id,
> country_code,
> block_format_type_id,
> category_group_id
> ),
> t2 AS (
> SELECT
> DISTINCT
> ad_group_id,
> country_code,
> device_type_id,
> target_id,
> block_format_type_id,
> category_group_id,
> geo_ids
> FROM t1
> )
>
> SELECT COUNT(*) FROM t2;
>
>
> Test:
> set jit to off;
> set max_parallel_workers_per_gather to 0;
> set work_mem to 4MB;
> count
> -------
> 83921
>
> set work_mem to '64MB';
> count
> -------
> 55634
>
> ...
>
>
> Correnct answers provided with larger work_mem (55634)
>
> What make situatio even more curious that disable hash_agg doesn't make
> issue gone:
> set enable_hashagg to 0;
> set work_mem to '4MB';
> ...
> count
> -------
> 83700
>
> ...
>

Do you get different results if you provide an ORDER BY in the aggregate
array_agg() function?
Without an ORDER BY there, I don't think there is any guarantee for
deterministic results.

Best regards,

Pantelis Theodosiou

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-07-06 15:25:30 Re: BUG #17084: Wrong results of distinct node.
Previous Message Tom Lane 2021-07-06 14:47:45 Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x