Efficient pagination using multi-column cursors

From: large(dot)goose2829(at)salomvary(dot)com
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Efficient pagination using multi-column cursors
Date: 2025-02-26 14:27:53
Message-ID: a60186fe-b552-42e3-824e-27dbdd312f15@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi folks,

I am working on optimizing a query that attempts to efficiently paginate through a large table using multi-column "cursors" aka. the "seek method" (as described in detail here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page)

The table (drastically simplified) looks like this:

CREATE TABLE data
(
col_1 int NOT NULL,
col_2 int NOT NULL,
col_3 int NOT NULL,
content varchar(10) NOT NULL
);

And has an appropriate index:

CREATE INDEX data_index ON data (col_1, col_2, col_3);

The recommended query to paginate through this table is using the "row values" syntax:

SELECT content
FROM data
WHERE (col_1, col_2, col_3) > (10, 20, 29)
ORDER BY col_1, col_2, col_3
LIMIT 100;

Which results in a perfectly optimized query plan (slightly edited for readability, using test data of 1 million rows, on PostgreSQL 17.2):

Limit (cost=0.42..5.33 rows=100 width=20)
(actual time=0.084..0.197 rows=100 loops=1)
-> Index Scan using data_index on data
(cost=0.42..43617.30 rows=889600 width=20)
(actual time=0.081..0.176 rows=100 loops=1)
Index Cond: (ROW(col_1, col_2, col_3) > ROW(10, 20, 29))
Planning Time: 0.344 ms
Execution Time: 0.264 ms

However, in reality, my query uses a mix of ascending and descending ordering (with an index matching the order columns), in which case the WHERE (col_1, col_2, col_3) > (10, 20, 29) syntax is not an option (unless I somehow derive "reversed" data from the column, which I would like to avoid).

Therefore I am using an equivalent query using multiple WHERE conditions, something like this (for simplicity, no mixed ordering is involved in the examples):

SELECT content
FROM data
WHERE
col_1 >= 10
AND (
col_1 > 10
OR (
col_2 >= 20
AND (
col_2 > 20
OR col_3 > 29
)
)
)
ORDER BY col_1, col_2, col_3
LIMIT 100;

Which returns the same rows, but the query plan is slightly less efficient:

Limit (cost=0.42..6.48 rows=100 width=20)
(actual time=0.848..0.893 rows=100 loops=1)
-> Index Scan using data_index on data
(cost=0.42..52984.52 rows=874874 width=20)
(actual time=0.847..0.884 rows=100 loops=1)
Index Cond: (col_1 >= 10)
Filter: ((col_1 > 10) OR ((col_2 >= 20) AND ((col_2 > 20) OR (col_3 > 29))))
Rows Removed by Filter: 2030
Planning Time: 0.133 ms
Execution Time: 0.916 ms

Instead of exclusively relying on an index access predicate, this plan involves an index filter predicate.

Without being familiar the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that results in the "perfect" plan. There are several ways to phrase the conditions, of which I've tried a few, only to get the same or worse performance. Does anyone have a suggestion for a better query?

I am also aware that I might be chasing an optimization with low returns (yet to see how it performs in Real Life data) but I'm already too deep down in a rabbit hole without being able to turn back without knowing The Truth.

I've dumped my experiments into a DB Fiddle: https://www.db-fiddle.com/f/kd8zaibZGKGH1HSStyxNkx/0

Cheers,
Márton

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Peter Geoghegan 2025-02-26 15:05:52 Re: Efficient pagination using multi-column cursors
Previous Message Lincoln Swaine-Moore 2025-02-25 02:17:08 Re: Unfortunate Nested Loop + Missing Autovacuum