Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, nikhil raj <nikhilraj474(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Date: 2024-08-28 20:47:47
Message-ID: 3478841.1724878067@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> Exactly. What Tom's patch does is that if the expression contains
> Vars/PHVs that belong to the subquery, and does not contain any
> non-strict constructs, then it can escape being wrapped.

> In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
> subquery, and '+' is strict, so it can escape being wrapped.

> The expression 't1.two+1' does not meet these conditions, so it is
> wrapped into a PHV, and the PHV contains lateral reference to t1,
> which results in a nestloop join with a parameterized inner path.
> That's why Memoize can work in this query.

Yeah. (I'd missed that t1.two is a lateral reference and t2.two is
not; sorry for the noise.)

What happens as of HEAD is that, because we wrap this subquery output
in a PHV marked as due to be evaluated at t2, the entire clause

(t1.two+t2.two) = t2.unique1

becomes a base restriction clause for t2, so that when we generate
a path for t2 it will include this as a path qual (forcing the path
to be laterally dependent on t1). Without the PHV, it's just an
ordinary join clause and it will not be evaluated at scan level
unless it can be turned into an indexqual --- which it can't.

The preceding regression-test case with "t1.two+1 = t2.unique1"
can be made into a parameterized indexscan on t2.unique1, so it is,
and then memoize can trigger off that.

I'm inclined to think that treating such a clause as a join clause
is strictly better than what happens now, so I'm not going to
apologize for the PHV not being there. If you wanted to cast
blame, you could look to set_plain_rel_pathlist, where it says

* We don't support pushing join clauses into the quals of a seqscan, but
* it could still have required parameterization due to LATERAL refs in
* its tlist.

(This comment could stand some work, as it fails to note that
labeling the path with required parameterization can result in
"join clauses" being evaluated there anyway.)

In the normal course of things I'd be dubious about the value of
pushing join clauses into a seqscan, but maybe the possibility of a
memoize'd join has moved the goalposts enough that we should
consider that. Alternatively, maybe get_memoized_path should take
more responsibility for devising plausible subpaths rather than
assuming they'll be handed to it on a platter. (I don't remember
all the conditions checked in add_path, but I wonder if we are
missing some potential memoize applications because suitable paths
fail to survive the scan rel's add_path tournament.)

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rob Sargent 2024-08-28 22:16:08 Re: Ghost data from failed FDW transactions?
Previous Message Jacob Biesinger 2024-08-28 16:18:33 Re: Ghost data from failed FDW transactions?

Browse pgsql-hackers by date

  From Date Subject
Next Message John H 2024-08-28 21:01:01 Re: Allow logical failover slots to wait on synchronous replication
Previous Message Robert Haas 2024-08-28 20:35:18 Re: allowing extensions to control planner behavior