Re: Index filter instead of index condition w/ IN / ANY queries above certain set size

From: Danny Shemesh <dany74q(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general(at)postgresql(dot)org
Subject: Re: Index filter instead of index condition w/ IN / ANY queries above certain set size
Date: 2022-11-23 19:23:49
Message-ID: CAFZC=Qqn+v-EjPszVP5y7XMowhjJyKYpy36S9akO1i4zJyh_-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hey Laurenz, Tom - thanks again !

> that it is cheaper to use the index that supports the ORDER BY
Thing is, that both queries use the exact same index (idx_hashes), but one
uses it w/ the filter and one does not.

> This doesn't match up terribly well with the table definition you showed
before
Yeah.. it was a bit hard to reproduce exactly, but the fiddle does showcase
that there's some threshold to the ANY set-size
where it stops using the column in the index condition, and moves it to the
filter step - I thought it might originate
from similar reasons.

> but I wonder whether tidh is a low-order index column.
The index indeed uses tidh as a low order column, and it's better to have
it the other way around -
currently, it's: (tid, pidh, tidh) - where (tid, tidh, pidh) would've
probably worked better.

We've already optimized the query itself - but for pure understanding of
the planner decision here,
I'd really still like to understand, if possible, the difference between
ANY and IN,
and why, even though the column order isn't optimal - one plan still
successfully uses the index more efficiently than another.

Any idea where I could zone-in in the source code to look for hints, maybe ?

Appreciate it !
Danny

On Wed, Nov 23, 2022 at 4:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Danny Shemesh <dany74q(at)gmail(dot)com> writes:
> > -> Index Only Scan using
> > idx_hashes on refs (cost=0.56..722735.47 rows=33715 width=16) (actual
> > time=1727.208..1727.208 rows=1 loops=1)
> > Index Cond: (tid =
> > '13371337-1337-1337-1337-133713371337'::uuid)
> > * Filter: (tidh = ANY
> > ('{13391339-1339-1339-1339-133913391339}'::uuid[])) <<<<<<<<<<<<<<<-
> > Note this line* Rows Removed
> > by Filter: 109087
> > Heap Fetches: 16976
> > Buffers: shared hit=13051
> > read=14561
> > I/O Timings: read=53405.294
>
> This doesn't match up terribly well with the table definition
> you showed before, but I wonder whether tidh is a low-order
> index column. If you need to optimize this specific shape
> of query you need to pay attention to the index column order, per
>
> https://www.postgresql.org/docs/current/indexes-multicolumn.html
>
> That is, tid and tidh need to be the first two index columns.
>
> regards, tom lane
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2022-11-23 19:24:15 Re: table inheritance partition and indexes
Previous Message Simon Riggs 2022-11-23 19:11:52 Re: MERGE RETURNING