Re: Removing unneeded self joins

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2022-05-17 14:14:46
Message-ID: 2537176.Lt9SDvczpP@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> New version of the feature.
> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> when uniqueness of an inner deduced not from join quals, but from a
> baserestrictinfo clauses 'x=const', where x - unique field.
> Code, dedicated to solve second issue is controversial, so i attached
> delta.txt for quick observation.
> Maybe we should return to previous version of code, when we didn't split
> restriction list into join quals and base quals?

Hello,

I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

- a query with many different self joins
- an abundance of unique indexes on combinations of this table columns to
consider
- additional predicates on the where clause on columns.

The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.

One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.

As for the code:

- Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
- The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.

I'll update if I think of more interesting things to add.

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-05-17 14:49:41 Re: PostgreSQL 15 Beta 1 release announcement draft
Previous Message Chirag Karkera 2022-05-17 14:07:34 Re: Provide read-only access to system catalog tables