Re: POC, WIP: OR-clause support for indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Cc: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "teodor(at)sigaev(dot)ru" <teodor(at)sigaev(dot)ru>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-08-01 21:16:09
Message-ID: CAH2-Wz=9N_4+EyhtyFqYQRx4OgVbP+1aoYU2JQPVogCir61ZEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim,

On Tue, Aug 1, 2023 at 1:11 PM Finnerty, Jim <jfinnert(at)amazon(dot)com> wrote:
> Peter, I'm very glad to hear that you're researching this!

Glad to hear it!

> Will this include skip-scan optimizations for OR or IN predicates, or when the number of distinct values in a leading non-constant index column(s) is sufficiently small?

Yes -- though perhaps not in the first iteration.

As I go into on the thread associated with my own patch [1], my
initial goal is to support efficient execution of multiple IN() lists
for multiple columns from the same index, all while preserving index
sort order on output, and avoiding a separate duplicate elimination
step. Some of the most compelling cases for these MDAM techniques
involve GroupAggregates, ORDER BY ... LIMIT, and DISTINCT -- I
understand the importance of making the index scan appear to be a
conventional index scan to the optimizer.

> If you have an ORDER BY clause and a lower and upper bound on the first column of the ORDER BY list, you have a potential to reduce search effort versus a full index scan, even when that upper and lower bound needs to be derived from a complex predicate.

It sounds like your example is an attempt to ascertain whether or not
my design considers the need to convert complicated predicates into
disjuncts that can be executed as if by one single index scan, via CNF
-> DNF transformations/preprocessing. That is certainly the plan, at
least medium term -- I fully expect to be able to combine all of these
techniques together, in ways that continue to work even with very
complicated predicates. Like the really hairy example from the MDAM
paper, or like your example.

There are already some nbtree scan key preprocessing steps a little
like the ones considered by the MDAM paper. These steps eliminate
redundant and contradictory quals -- but they weren't specifically
written with the very general MDAM DNF design requirements in mind.
Plus there are already at least some transformations like the one that
Alena is working on in the patch discussed on this thread -- these
were also not written with MDAM stuff in mind.

A major goal of mine for this project in the short term is to come up
with a very general design. I must reconcile all this stuff, somehow
or other, so that these very complicated cases will work just as well
as simpler and more obvious cases. I really hate special cases.

> Of course, if you have an IN list you can either skip to the distinct values listed or scan the entire index, depending on estimated cost.

Actually, I think that it should be possible to decide on how to skip
dynamically, without needing an up-front decision around skipping from
the optimizer. In other words, the scans can skip using an adaptive
strategy. This is feasible provided I can make the overhead of a
dynamic/adaptive approach negligible. When it turns out that a full
index scan is appropriate, we'll just end up doing it that way at
runtime.

Nothing stops a given scan from needing to do skip a great deal in the
first half of an index, while scanning everything in the second half
of the index. Obviously, a static choice won't do well there, since it
works at the level of the whole scan/index, which seems like the wrong
framing to me. (Of course we'll still need to model skipping stuff in
the planner -- just not so that we can decide between two index paths
that are essentially identical, that should just be one index path.)

[1] https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-08-01 21:19:45 Re: stats test intermittent failure
Previous Message Daniel Gustafsson 2023-08-01 21:03:03 Re: bug: ANALYZE progress report with inheritance tables