Re: An inefficient query caused by unnecessary PlaceHolderVar

From: James Coleman <jtc331(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An inefficient query caused by unnecessary PlaceHolderVar
Date: 2023-05-30 17:26:55
Message-ID: CAAaqYe9S1KxiYpX+RmbMMotSSwSZcHpuAnv9Hq+DZ8rkDacNtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 12, 2023 at 2:35 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> I happened to notice that the query below can be inefficient.
>
> # explain (costs off)
> select * from
> int8_tbl a left join
> (int8_tbl b inner join
> lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1)
> on a.q1 = b.q1;
> QUERY PLAN
> ------------------------------------
> Hash Right Join
> Hash Cond: (b.q1 = a.q1)
> -> Nested Loop
> -> Seq Scan on int8_tbl b
> -> Seq Scan on int8_tbl c
> Filter: (b.q2 = q1)
> -> Hash
> -> Seq Scan on int8_tbl a
> (8 rows)
>
> For B/C join, currently we only have one option, i.e., nestloop with
> parameterized inner path. This could be extremely inefficient in some
> cases, such as when C does not have any indexes, or when B is very
> large. I believe the B/C join can actually be performed with hashjoin
> or mergejoin here, as it is an inner join.
>
> This happens because when we pull up the lateral subquery, we notice
> that Var 'x' is a lateral reference to 'b.q2' which is outside the
> subquery. So we wrap it in a PlaceHolderVar. This is necessary for
> correctness if it is a lateral reference from nullable side to
> non-nullable item. But here in this case, the referenced item is also
> nullable, so actually we can omit the PlaceHolderVar with no harm. The
> comment in pullup_replace_vars_callback() also explains this point.
>
> * (Even then, we could omit the PlaceHolderVar if
> * the referenced rel is under the same lowest outer join, but
> * it doesn't seem worth the trouble to check that.)

It's nice that someone already thought about this and left us this comment :)

> All such PHVs would imply lateral dependencies which would make us have
> no choice but nestloop. I think we should avoid such PHVs as much as
> possible. So IMO it may 'worth the trouble to check that'.
>
> Attached is a patch to check that for simple Vars. Maybe we can extend
> it to avoid PHVs for more complex expressions, but that requires some
> codes because for now we always wrap non-var expressions to PHVs in
> order to have a place to insert nulling bitmap. As a first step, let's
> do it for simple Vars only.
>
> Any thoughts?

This looks good to me.

A few small tweaks suggested to comment wording:

+-- lateral reference for simple Var can escape PlaceHolderVar if the
+-- referenced rel is under the same lowest nulling outer join
+explain (verbose, costs off)

I think this is clearer: "lateral references to simple Vars do not
need a PlaceHolderVar when the referenced rel is part of the same
lowest nulling outer join"?

* lateral references to something outside the subquery being
- * pulled up. (Even then, we could omit the PlaceHolderVar if
- * the referenced rel is under the same lowest outer join, but
- * it doesn't seem worth the trouble to check that.)
+ * pulled up. Even then, we could omit the PlaceHolderVar if
+ * the referenced rel is under the same lowest nulling outer
+ * join.

I think this is clearer: "references something outside the subquery
being pulled up and is not under the same lowest outer join."

One other thing: it would be helpful to have the test query output be
stable between HEAD and this patch; perhaps add:

order by 1, 2, 3, 4, 5, 6, 7

to ensure stability?

Thanks,
James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Christensen 2023-05-30 17:35:37 Re: [PATCHES] Post-special page storage TDE support
Previous Message MARK CALLAGHAN 2023-05-30 17:03:32 Re: benchmark results comparing versions 15.2 and 16