Quick Links

improve EXPLAIN for wide tables

From:	Sami Imseih <samimseih(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	improve EXPLAIN for wide tables
Date:	2024-12-16 16:07:42
Message-ID:	CAA5RZ0th3q-0p1pri58z9grG8r8azmEBa8o1rtkwhLmJg_cH+g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Recently, a user reported that running an explain for a query joining
many wide tables taking more than 1 minute to complete. Running the
query without explain takes only a few seconds.

Further research showed that this is similar to a report from
2018 [1]. colname_is_unique is used to assign unique
column names during deparse and this has O(N^2) behavior.

The good news is this behavior was optimized with commit [2] with the
help of a hash table. While the main stated purpose of this commit was to
improved eparse for views and rules, it also significantly improved
EXPLAIN which goes through the same routine via ExplainPrintPlan and
deparse_context_for_plan_tree.

Looking further into this improvement, I started to question if it is
necessary to make columns unique for EXPLAIN purposes?

An experimental patch ( attached ) to skip making columns unique
for EXPLAIN breaks only one test case for columns with default
column names from a function [3]. I have not had success
finding other cases that break.

src/test/regress/results/rangefuncs.out
@@ -2130,10 +2130,10 @@

explain (verbose, costs off)
select * from testrngfunc();
- QUERY PLAN
-----------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------
Subquery Scan on "*SELECT*"
- Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1"
+ Output: "*SELECT*"."?column?", "*SELECT*"."?column?"
-> Unique
Output: (1), (2)
-> Sort

While the performance impact will likely only be noticeable
on rare use cases, removing the make columns unique
work appears to be a good idea overall. There are probably
a few good ways to fix the broken case above.

running the attached measure.sql
query involving 60 tables with 150 columns

# 17.2 without commit [2]
Time: 58950.505 ms
Time: 472.912 ms

## HEAD with commit [2]
Time: 1229.704 ms
Time: 473.494 ms

## without uniquifying ( attached experimental patch )
Time: 499.185 ms
Time: 473.118 ms

[1] https://www.postgresql.org/message-id/flat/1537818224423-0.post%40n3.nabble.com#5d23ed9ab9cb5ed45c79352141fa3e79
[2] https://github.com/postgres/postgres/commit/52c707483ce4d0161127e4958d981d1b5655865e
[3] https://github.com/postgres/postgres/blob/master/src/test/regress/sql/rangefuncs.sql#L586-L588

Regards,

Sami Imseih
Amazon Web Services (AWS)

Attachment	Content-Type	Size
measure.sql	application/octet-stream	695 bytes
experiment-no-unique-column-deparse.patch	application/octet-stream	2.7 KB

Responses

Re: improve EXPLAIN for wide tables at 2024-12-16 16:21:23 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2024-12-16 16:21:23	Re: improve EXPLAIN for wide tables
Previous Message	Nazir Bilal Yavuz	2024-12-16 16:07:18	Re: Count and log pages set all-frozen by vacuum