Re: Questions regarding Index AMs and natural ordering

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Questions regarding Index AMs and natural ordering
Date: 2023-11-23 19:45:07
Message-ID: CAH2-WzkDrpWGRMdJo2MKbt4LjOTn5HOvm4h2riNw9_hRpVTrOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 23, 2023 at 11:15 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Agreed on that, but I don't have that hard a time imagining cases
> where it might be useful for btree not to guarantee ordered output.
> IIRC, it currently has to do extra pushups to ensure that behavior
> in ScalarArrayOp cases. We've not bothered to expand the planner
> infrastructure to distinguish "could, but doesn't" paths for btree
> scans, but that's more about it not being a priority than because it
> wouldn't make sense.

I'm glad that you brought that up, because it's an important issue for
my ScalarArrayOp patch (which Matthias referenced). My patch simply
removes this restriction from the planner -- now ScalarArrayOp clauses
aren't treated as a special case when generating path keys. This works
in all cases because the patch generalizes nbtree's approach to native
ScalarArrayOp execution, allowing index scans to work as one
continuous index scan in all cases.

As you might recall, the test case that exercises the issue is:

SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;

It doesn't actually make much sense to execute this as two primitive
index scans, though. The most efficient approach is to perform a
single index descent, while still being able to use a true index qual
for "tenthous" (since using a filter qual is so much slower due to the
overhead of accessing the heap just to filter out non-matching
tuples). That's what the patch does.

This would be true even without the "ORDER BY" -- accessing the leaf
page once is faster than accessing it twice (same goes for the root).
So I see no principled reason why we'd ever really want to start a
primitive index scan that wasn't "anchored to an equality constraint".
This is reliably faster, while also preserving index sort order,
almost as a side-effect.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-11-23 22:29:26 Re: Properly pathify the union planner
Previous Message Pavel Stehule 2023-11-23 19:39:38 Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]