Re: improve EXPLAIN for wide tables

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improve EXPLAIN for wide tables
Date: 2024-12-16 17:06:22
Message-ID: CAA5RZ0tzoF=cy6-LH4Hz=9khsuZ49qHkWLLp+yy5MCLKnq0fhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Looking further into this improvement, I started to question if it is
> > necessary to make columns unique for EXPLAIN purposes?

> Yes, otherwise references to them elsewhere in the plan will be
> ambiguous.

Explain will qualify the column name with the table name such as the simple
example below with the experimental patch. I have not been able to find cases,
except for the failed test case that involves "?column?", in which the
plan will result
in ambiguous column names.

postgres=# create table test (id int, id2 int);
CREATE TABLE
postgres=# create table test2 (id int, id2 int);
CREATE TABLE

postgres=# explain select * from test, test2 where test.id = test2.id
order by 1;
QUERY PLAN
---------------------------------------------------------------------
Merge Join (cost=317.01..711.38 rows=25538 width=16)
Merge Cond: (test.id = test2.id)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: test.id
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=8)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: test2.id
-> Seq Scan on test2 (cost=0.00..32.60 rows=2260 width=8)
(8 rows)

postgres=# explain verbose select * from test, test2 where test.id =
test2.id order by 1;
QUERY PLAN
----------------------------------------------------------------------------
Merge Join (cost=317.01..711.38 rows=25538 width=16)
Output: test.id, test.id2, test2.id, test2.id2
Merge Cond: (test.id = test2.id)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Output: test.id, test.id2
Sort Key: test.id
-> Seq Scan on public.test (cost=0.00..32.60 rows=2260 width=8)
Output: test.id, test.id2
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Output: test2.id, test2.id2
Sort Key: test2.id
-> Seq Scan on public.test2 (cost=0.00..32.60 rows=2260 width=8)
Output: test2.id, test2.id2
Query Identifier: 2523359249885908438
(14 rows)

> It looks like your proposal tries to dodge that by unique-ifying
> in some cases but not others, which strikes me as a totally
> random and confusing thing to do.

The work is required for things like deparsing views because Postgres
has to construct the underlying sql definition and part of that is it has to
make columns unique to generate a valid sql, such as the case here [1].
For planning purposes, the column name must already be unambiguous.
right?

> Is there any reason to think that 52c707483 wasn't a sufficient
> response to this issue?
>
> regards, tom lane

In the case reported, similar to the one in the earlier attached
experiment.sql, even with the hash table optimization, there is still
significant
overhead from the unique-ifying work. for such users, this can impact real-world
performance particularly if they have enabled extensions such as auto_explain.

[1] https://github.com/postgres/postgres/blob/master/src/test/regress/sql/create_view.sql#L550-L565

Regards,

Sami

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-12-16 17:31:47 Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Previous Message Alvaro Herrera 2024-12-16 17:05:59 Re: FileFallocate misbehaving on XFS