Re: Can JoinFilter condition be pushed down into IndexScan?

From: Bəxtiyar Neyman <bakhtiyarneyman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Can JoinFilter condition be pushed down into IndexScan?
Date: 2023-06-21 22:58:23
Message-ID: CAObnsGqNBu3FY9-C2Z-CHT8RSbuTx7k+q=1ZWy5PADEo9qYd1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks, Tomas!

> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.

Fair point.

> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.

> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.

Got it. Thanks. I guess we'll have to emit correlated subqueries/CTEs.

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 12:58 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> On 6/21/23 20:37, Bəxtiyar Neyman wrote:
> > Thanks Tomas for the lengthy write-up!
> >
> > Pardon the noise in the queries (LATERAL, AND true etc): they were
> > autogenerated by the library we wrote.
> >
>
> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.
>
> >> Because those queries are not doing the same thing. In the first query
> >> you sort by t3_0 columns, while the "id = 4732455" condition is on the
> >> other table. And so it can't use the index scan for sorting.
> >>
> >> While in the second query it can do that, and it doesn't need to do the
> >> explicit sort (which needs to fetch all the rows etc.).
> >
> > Let me try to explain what both of my queries do:
> > 1) Get the rank of the user using its id (id = 4732455 in this example,
> > but it could have been one that exists, e.g. id = 500). This is LATERAL
> > t3_1 in the first query and subquery in the WHERE clause of the second
> > query.
> > 2) Using that rank, get the next 10 users by rank. This is t3_0.
> >
> > Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
> > t3_1."id" DESC" as you suggest, because then the order of returned rows
> > will not be guaranteed. In fact, such a clause will have no effect
> > because there is going to be at most one row supplied by t3_1 anyway.
> >
>
> Ah, OK. I got this wrong.
>
> > My question thus still stands. The planner knows that t3_1 has at most
> > one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it
> > doesn't figure out that it could have lowered the Join Filter condition
> > from the first plan as an Index Cond of the Index Scan of t3_1. Is there
> > a fundamental reason for this, or is this something worth improving in
> > the planner?
> >
>
> As I tried to explain before, I don't think the problem is in the
> planner not being able to do this transformation, but more likely in not
> being able to cost it correctly.
>
> Consider this (with 1M rows in the user_ranks table):
>
> 1) subquery case
> =================
>
> Limit (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037
> rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> InitPlan 1 (returns $0,$1)
> -> Index Scan using user_ranks_pkey on public.user_ranks t4_0
> (cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
> Output: t4_0.rank, t4_0.id
> Index Cond: (t4_0.id = 333333)
> -> Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks t3_0 (cost=0.42..9493.75 rows=333333 width=8) (actual
> time=0.031..0.033 rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
> Heap Fetches: 0
> Planning Time: 0.072 ms
> Execution Time: 0.055 ms
> (12 rows)
>
>
> 2) join
> =======
>
> Limit (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672
> rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> -> Nested Loop (cost=0.85..43488.87 rows=333333 width=8) (actual
> time=464.660..464.667 rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> Inner Unique: true
> Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id))
> Rows Removed by Join Filter: 666667
> -> Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks t3_0 (cost=0.42..25980.42 rows=1000000 width=8)
> (actual time=0.015..93.703 rows=666677 loops=1)
> Output: t3_0.rank, t3_0.id
> Heap Fetches: 0
> -> Materialize (cost=0.42..8.45 rows=1 width=8) (actual
> time=0.000..0.000 rows=1 loops=666677)
> Output: t4_0.rank, t4_0.id
> -> Index Scan using user_ranks_pkey on public.user_ranks
> t4_0 (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1
> loops=1)
> Output: t4_0.rank, t4_0.id
> Index Cond: (t4_0.id = 333333)
> Planning Time: 0.092 ms
> Execution Time: 464.696 ms
> (17 rows)
>
>
> 3) join (with LEFT JOIN)
> ========================
>
> Limit (cost=20038.73..20038.76 rows=10 width=8) (actual
> time=180.714..180.720 rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> -> Sort (cost=20038.73..20872.06 rows=333333 width=8) (actual
> time=180.712..180.715 rows=10 loops=1)
> Output: t3_0.id, t3_0.rank
> Sort Key: t3_0.rank DESC, t3_0.id DESC
> Sort Method: top-N heapsort Memory: 26kB
> -> Nested Loop Left Join (cost=0.85..12835.52 rows=333333
> width=8) (actual time=0.033..122.000 rows=333333 loops=1)
> Output: t3_0.id, t3_0.rank
> -> Index Scan using user_ranks_pkey on public.user_ranks
> t4_0 (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1
> loops=1)
> Output: t4_0.id, t4_0.rank
> Index Cond: (t4_0.id = 333333)
> -> Index Only Scan using "by (rank, id)" on
> public.user_ranks t3_0 (cost=0.42..9493.75 rows=333333 width=8) (actual
> time=0.013..49.759 rows=333333 loops=1)
> Output: t3_0.rank, t3_0.id
> Index Cond: (ROW(t3_0.rank, t3_0.id) <=
> ROW(t4_0.rank, t4_0.id))
> Heap Fetches: 0
> Planning Time: 0.087 ms
> Execution Time: 180.744 ms
> (17 rows)
>
>
> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.
>
> OTOH if you rewrite it to a left join, it costs 20038.76 - way more than
> the inner join, but it's actually 2x faster.
>
>
> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.
>
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-06-21 23:08:19 Re: [PATCH] doc: add missing mention of MERGE in MVCC
Previous Message Andres Freund 2023-06-21 22:12:08 vac_truncate_clog()'s bogus check leads to bogusness