improve EXPLAIN for wide tables

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: improve EXPLAIN for wide tables
Date: 2024-12-16 16:07:42
Message-ID: CAA5RZ0th3q-0p1pri58z9grG8r8azmEBa8o1rtkwhLmJg_cH+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Recently, a user reported that running an explain for a query joining
many wide tables taking more than 1 minute to complete. Running the
query without explain takes only a few seconds.

Further research showed that this is similar to a report from
2018 [1]. colname_is_unique is used to assign unique
column names during deparse and this has O(N^2) behavior.

The good news is this behavior was optimized with commit [2] with the
help of a hash table. While the main stated purpose of this commit was to
improved eparse for views and rules, it also significantly improved
EXPLAIN which goes through the same routine via ExplainPrintPlan and
deparse_context_for_plan_tree.

Looking further into this improvement, I started to question if it is
necessary to make columns unique for EXPLAIN purposes?

An experimental patch ( attached ) to skip making columns unique
for EXPLAIN breaks only one test case for columns with default
column names from a function [3]. I have not had success
finding other cases that break.

src/test/regress/results/rangefuncs.out
@@ -2130,10 +2130,10 @@

explain (verbose, costs off)
select * from testrngfunc();
- QUERY PLAN
-----------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------
Subquery Scan on "*SELECT*"
- Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1"
+ Output: "*SELECT*"."?column?", "*SELECT*"."?column?"
-> Unique
Output: (1), (2)
-> Sort

While the performance impact will likely only be noticeable
on rare use cases, removing the make columns unique
work appears to be a good idea overall. There are probably
a few good ways to fix the broken case above.

running the attached measure.sql
query involving 60 tables with 150 columns

# 17.2 without commit [2]
Time: 58950.505 ms
Time: 472.912 ms

## HEAD with commit [2]
Time: 1229.704 ms
Time: 473.494 ms

## without uniquifying ( attached experimental patch )
Time: 499.185 ms
Time: 473.118 ms

[1] https://www.postgresql.org/message-id/flat/1537818224423-0.post%40n3.nabble.com#5d23ed9ab9cb5ed45c79352141fa3e79
[2] https://github.com/postgres/postgres/commit/52c707483ce4d0161127e4958d981d1b5655865e
[3] https://github.com/postgres/postgres/blob/master/src/test/regress/sql/rangefuncs.sql#L586-L588

Regards,

Sami Imseih
Amazon Web Services (AWS)

Attachment Content-Type Size
measure.sql application/octet-stream 695 bytes
experiment-no-unique-column-deparse.patch application/octet-stream 2.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-12-16 16:21:23 Re: improve EXPLAIN for wide tables
Previous Message Nazir Bilal Yavuz 2024-12-16 16:07:18 Re: Count and log pages set all-frozen by vacuum