From: | large(dot)goose2829(at)salomvary(dot)com |
---|---|
To: | "Peter Geoghegan" <pg(at)bowt(dot)ie> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Efficient pagination using multi-column cursors |
Date: | 2025-02-26 15:40:35 |
Message-ID: | 32308394-e197-4a9f-9dbe-336e8af6da58@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thanks for the insights!
On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote:
> On Wed, Feb 26, 2025 at 9:29 AM <large(dot)goose2829(at)salomvary(dot)com> wrote:
> > Without being familiar the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that results in the "perfect" plan.
>
> There is a fundamental trade-off involved here. The simple, fast
> "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever
> tuples are stored immediately after "(10, 20, 29)" in the index.
> Naturally, they're returned in index order, which is usually the most
> useful order (simple ASC order or simple DESC order for all columns).
>
>
> The B-Tree code can physically traverse your mixed-ASC-and-DESC order
> index in almost the same way. But it is much less useful, since the
> matching index tuples won't be physically located together as exactly
> one contiguous group of tuples.
I am not sure I understand this.
My understanding is that given this "mixed order" index:
CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);
The index tuples are physically organized exactly in this way:
ORDER BY col_1, col_2 DESC, col_3
So that I should be able to write a query that reads a continuous range from this index without filtering.
>
> And so (with your "handwritten" row
> comparison) you get a filter qual that filters out non-matching tuples
> using lower-order index columns. The index scan actually just returns
> "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of
> index tuples), while a filter condition excludes those tuples returned
> by the index scan node that don't satisfy the later/lower-order column
> condition.
Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? Or not a theoretical limitation but a limitation of the current implementation of the query planner?
I don't know whether it's polite to bring up competitors on this mailing list, but MySQL 8 (which quite ironically has poor performance for the "row values" syntax, doing a full index scan) seems to avoid index filtering using the "OR AND" variant (at least when no mixed ordering is involved):
SELECT content
FROM data
WHERE
col_1 > 10
OR (
col_1 = 10
AND (
col_2 > 20
OR (
col_2 = 20
AND col_3 > 29
)
)
)
ORDER BY col_1, col_2, col_3
LIMIT 100;
-> Limit: 100 row(s) (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 loops=1)
-> Index range scan on data using data_index
over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 < col_2) OR (10 < col_1),
with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and ((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29)))))
(cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1)
Still slightly slower actual total time on the same machine as PostgreSQL though (based on a single EXPLAIN ANALYZE sample only).
>
> The book "Relational Database Index Design and the Optimizers"
> proposes a vocabulary for the trade-offs in this area -- the 3 star
> model. When creating the best possible index for certain queries it is
> sometimes inherently necessary to choose between what it calls the
> first star (which means avoiding a sort) and the second star (which
> means having the thinnest possible "row slice"). Sometimes those
> things are in tension, which makes sense when you imagine how the
> index must be physically traversed.
Aka. "Good, Fast, Cheap — Pick Any Two" ;)
Cheers,
Márton
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2025-02-26 16:14:21 | Re: Efficient pagination using multi-column cursors |
Previous Message | Peter Geoghegan | 2025-02-26 15:05:52 | Re: Efficient pagination using multi-column cursors |