Distinct performance dropped by multiple times in v16

From: Vitaliy Litovskiy <vitaliy(dot)litovskiy(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Distinct performance dropped by multiple times in v16
Date: 2024-06-10 06:59:04
Message-ID: CAD3pRn_BbRs4OkWXP7T=beJ1Pv1euWJ6jBQ3Sd62ZF34E2hrrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

In our application after upgrading from postgresql from v15 to v16
performance for some queries dropped from less than 1-2 seconds to 2+
minutes.
After checking out some blogposts regarding DISTINCT improvements in v16 i
was able to "fix" it by adding order by clauses in subqueries, still it
looks more like a workaround for me and i would like to ask if it is
working as expected now.

I was able to reproduce the issue by the following steps
1. start postgresql v16 in docker
docker run --name some-postgres16 -e
POSTGRES_PASSWORD=mysecretpassword -d postgres:16
2. create tables with data.
- data is originated by northwind db, but tables were created with
different schema by custom software
- each table has 8k rows
- _BDN_Terretories__202406100822.sql
and _BDN_EmployeeTerritories__202406100822.sql files are attached
3. Execute the following query

select distinct tbl1."BDN_EmployeeTerritories_ID", tbl2."BDN_Terretories_ID",
tbl1."Reference_Date" from

(

select "BDN_EmployeeTerritories_ID", "Reference_Date", token

from public."BDN_EmployeeTerritories",

unnest(string_to_array("EMP_TerretoryID", ';')) s(token)

--order by token

) tbl1

join (

select "BDN_Terretories_ID", token from public."BDN_Terretories", unnest(
string_to_array("EMP_TerretoryID", ';')) s(token)

) tbl2 on tbl1.token = tbl2.token Observations:
1. query runs for 4-5 seconds on v16 and less than a second on v15 2. in
v16 it also goes downs to less than a second if 2.1 distinct is removed

2.2 unnest is removed. it is not really needed for this particular data but
this query is autogenerated and unnest makes sense for other data

2.3 "order by token" is uncommented, this is my current way of fixing the
problem I would really appreciate some feedback if that is expected
behaviour and if there are better solutions

--
Best Regards,
Vitalii Lytovskyi

Attachment Content-Type Size
_BDN_EmployeeTerritories__202406100822.sql application/octet-stream 596.6 KB
_BDN_Terretories__202406100822.sql application/octet-stream 720.5 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Sabino Mullane 2024-06-10 16:24:29 Re: Distinct performance dropped by multiple times in v16
Previous Message Satalabaha Postgres 2024-06-04 06:46:09 query column in pg_stat_statements and pg_stat_activity