Quick Links

Re: Efficient pagination using multi-column cursors

From:	large(dot)goose2829(at)salomvary(dot)com
To:	"Peter Geoghegan" <pg(at)bowt(dot)ie>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Efficient pagination using multi-column cursors
Date:	2025-02-26 15:40:35
Message-ID:	32308394-e197-4a9f-9dbe-336e8af6da58@app.fastmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Thanks for the insights!

On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote:
> On Wed, Feb 26, 2025 at 9:29 AM <large(dot)goose2829(at)salomvary(dot)com> wrote:
> > Without being familiar the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that results in the "perfect" plan.
>
> There is a fundamental trade-off involved here. The simple, fast
> "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever
> tuples are stored immediately after "(10, 20, 29)" in the index.
> Naturally, they're returned in index order, which is usually the most
> useful order (simple ASC order or simple DESC order for all columns).
>
>
> The B-Tree code can physically traverse your mixed-ASC-and-DESC order
> index in almost the same way. But it is much less useful, since the
> matching index tuples won't be physically located together as exactly
> one contiguous group of tuples.

I am not sure I understand this.

My understanding is that given this "mixed order" index:
CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);

The index tuples are physically organized exactly in this way:
ORDER BY col_1, col_2 DESC, col_3

So that I should be able to write a query that reads a continuous range from this index without filtering.

>
> And so (with your "handwritten" row
> comparison) you get a filter qual that filters out non-matching tuples
> using lower-order index columns. The index scan actually just returns
> "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of
> index tuples), while a filter condition excludes those tuples returned
> by the index scan node that don't satisfy the later/lower-order column
> condition.

Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? Or not a theoretical limitation but a limitation of the current implementation of the query planner?

I don't know whether it's polite to bring up competitors on this mailing list, but MySQL 8 (which quite ironically has poor performance for the "row values" syntax, doing a full index scan) seems to avoid index filtering using the "OR AND" variant (at least when no mixed ordering is involved):

SELECT content
FROM data
WHERE
col_1 > 10
OR (
col_1 = 10
AND (
col_2 > 20
OR (
col_2 = 20
AND col_3 > 29
)
)
)
ORDER BY col_1, col_2, col_3
LIMIT 100;

-> Limit: 100 row(s) (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 loops=1)
-> Index range scan on data using data_index
over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 < col_2) OR (10 < col_1),
with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and ((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29)))))
(cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1)

Still slightly slower actual total time on the same machine as PostgreSQL though (based on a single EXPLAIN ANALYZE sample only).

>
> The book "Relational Database Index Design and the Optimizers"
> proposes a vocabulary for the trade-offs in this area -- the 3 star
> model. When creating the best possible index for certain queries it is
> sometimes inherently necessary to choose between what it calls the
> first star (which means avoiding a sort) and the second star (which
> means having the thinnest possible "row slice"). Sometimes those
> things are in tension, which makes sense when you imagine how the
> index must be physically traversed.

Aka. "Good, Fast, Cheap — Pick Any Two" ;)

Cheers,
Márton

In response to

Re: Efficient pagination using multi-column cursors at 2025-02-26 15:05:52 from Peter Geoghegan

Responses

Re: Efficient pagination using multi-column cursors at 2025-02-26 16:14:21 from Peter Geoghegan

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Peter Geoghegan	2025-02-26 16:14:21	Re: Efficient pagination using multi-column cursors
Previous Message	Peter Geoghegan	2025-02-26 15:05:52	Re: Efficient pagination using multi-column cursors