Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2023-07-27 04:13:47
Message-ID: CAH2-Wz=6vQyS5YhkZJV1gXjEF=TFEh2gxB0kjxDuZLgbC98KOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 26, 2023 at 12:07 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> We could cache the last accessed leaf page across amrescan operations
> to reduce the number of index traversals needed when the join key of
> the left side is highly (but not necessarily strictly) correllated.

That sounds like block nested loop join. It's possible that that could
reuse some infrastructure from this patch, but I'm not sure.

In general, SAOP execution/MDAM performs "duplicate elimination before
it reads the data" by sorting and deduplicating the arrays up front.
While my patch sometimes elides a primitive index scan, primitive
index scans are already disjuncts that are combined to create what can
be considered one big index scan (that's how the planner and executor
think of them). The patch takes that one step further by recognizing
that it could quite literally be one big index scan in some cases (or
fewer, larger scans, at least). It's a natural incremental
improvement, as opposed to inventing a new kind of index scan. If
anything the patch makes SAOP execution more similar to traditional
index scans, especially when costing them.

Like InnoDB style loose index scan (for DISTINCT and GROUP BY
optimization), block nested loop join would require inventing a new
type of index scan. Both of these other two optimizations involve the
use of semantic information that spans multiple levels of abstraction.
Loose scan requires duplicate elimination (that's the whole point),
while IIUC block nested loop join needs to "simulate multiple inner
index scans" by deliberately returning duplicates for each would-be
inner index scan. These are specialized things.

To be clear, I think that all of these ideas are reasonable. I just
find it useful to classify these sorts of techniques according to
whether or not the index AM API would have to change or not, and the
general nature of any required changes. MDAM can do a lot of cool
things without requiring any revisions to the index AM API, which
should allow it to play nice with everything else (index path clause
safety issues notwithstanding).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Garrett Thornburg 2023-07-27 04:33:55 Re: PATCH: Add REINDEX tag to event triggers
Previous Message Richard Guo 2023-07-27 02:55:07 Re: Retiring is_pushed_down