Quick Links

Re: Questions regarding Index AMs and natural ordering

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Questions regarding Index AMs and natural ordering
Date:	2023-11-24 16:44:42
Message-ID:	CAEze2WhY+aPtpLJZfXN+BC4FjojqBoy3JsxwKrkgT_vn5vv1kA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 23 Nov 2023 at 19:52, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Nov 23, 2023 at 9:16 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > For example, btree ignores any ordering scan keys that it is given in
> > btrescan, which seems fine for btree because the ordering of a btree
> > is static (and no other order than that is expected apart from it's
> > reverse order), but this becomes problematic for other indexes that
> > could return ordered data but would prefer not to have to go through
> > the motions of making sure the return order is 100% correct, rather
> > than a k-sorted sequence, or just the matches to the quals (like
> > GIST). Is returning index scan results in (reverse) natural order not
> > optional but required with amcanorder? If it is required, why is the
> > am indicator called 'canorder' instead of 'willorder', 'doesorder' or
> > 'isordered'?
>
> I don't know. I have a hard time imagining an index AM that is
> amcanorder=true that isn't either nbtree, or something very similar
> (so similar that it seems unlikely that anybody would actually go to
> the trouble of implementing it from scratch).

Well, BRIN (with minmax opclasses) could return ordered results if it
needs to (see [0]; though that implements it as a distinct plan node).
Ordering the tuples correctly takes quite some effort, but is quite
likely to use less effort and/or scratch space than a table/bitmap
scan + sort, because we won't have to manage all tuples of the table
at the same time. However, it woould be extremely expensive if the
planner expects this to always return the data in btree order.

For GIST with the btree_gist opclasses it is even easier to return
ordered results (patch over at [1]), but then still it prefers not to
have to make a strict ordering as it adds overhead vs 'normal' index
scans.

Also, was that a confirmation that amcanorder is a requirement for the
AM to return data in index order (unless amrescan's orderbys is not
null), or just a comment on the reason for the name of 'amcanorder'
being unclear?

> You didn't mention support for merge joins. That's one of the defining
> characteristics of an amcanorder=true index AM, since an
> "ammarkpos/amrestrpos function need only be provided if the access
> method supports ordered scans". It's hard to imagine how that could
> work with a loosely ordered index. It seems to imply that the scan
> must always work with a simple linear order.

I probably didn't think of merge join support because 'merge join' is
not mentioned as such in the index AM api - I knew of
ammarkpos/amrestrpos, but hadn't yet gone into detail what they're
used for.

> Cases where the planner uses a merge join often involve an index path
> with an "interesting sort order" that "enables" the merge join.
> Perhaps most of the alternative plans (that were almost as cheap as
> the merge join plan) would have had to scan the same index in the same
> way anyway, so it ends up making sense to use a merge join. The merge
> join can get ordered results from the index "at no extra cost". (All
> of this is implicit, of course -- the actual reason why the planner
> chose the merge join plan is because it worked out to be the cheapest
> plan.)

Couldn't the merge join (or scan node) use a tuple store to return to
some earlier point in the index scan when a native version of markpos
is not easily supported? It would add (potentially very significant)
IO overhead, but it would also allow merge joins on ordered paths that
currently don't have a natural way of marking their position.

> > Alternatively, if an am should be using the order scan keys from
> > .amrescan and natural order scans also get scan keys, is there some
> > place I find the selection process for ordered index AMs, and how this
> > ordering should be interepreted? There are no good examples available
> > in core code because btree is special-cased, and there are no other
> > in-tree AMs that have facilities where both `amcanorderbyop` and
> > `amcanorder` are set.
>
> The general notion of a data type's sort order comes from its default
> btree operator class, so the whole idea of a generic sort order is
> deeply tied to the nbtree AM. That's why we sometimes have btree
> operator classes for types that you'd never actually want to index (at
> least not using a btree index).

Yes, the part where btree opclasses determine a type's ordering is
clear. But what I'm looking for is "how do I, as an index AM
implementation, get the signal that I need to return column-ordered
data?" If that signal is "index AM marked amcanorder == index must
return ordered data", then that's suboptimal for the index AM writer,
but understandable. If amcanorder does not imply always ordered
retrieval, then I'd like to know how it is signaled to the AM. But as
of yet I've not found a clear and conclusive answer either way.

Kind regards,

Mattthias van de Meent.

[0] https://www.postgresql.org/message-id/flat/e70fa091-e338-1598-9de4-6d0ef6b693e2%40enterprisedb.com
[1] https://www.postgresql.org/message-id/flat/EB2AF704-70FC-4D73-A97A-A7884A0381B5%40kleczek.org

In response to

Re: Questions regarding Index AMs and natural ordering at 2023-11-23 18:51:51 from Peter Geoghegan

Responses

Re: Questions regarding Index AMs and natural ordering at 2023-11-24 18:12:35 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Magnus Hagander	2023-11-24 17:20:28	Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
Previous Message	Bruce Momjian	2023-11-24 16:34:32	Re: [HACKERS] pg_upgrade vs vacuum_cost_delay