RE: Why is there a Sort after an Index Only Scan?

From: André Hänsel <andre(at)webkr(dot)de>
To: "'Pgsql Performance'" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: RE: Why is there a Sort after an Index Only Scan?
Date: 2022-05-05 01:12:40
Message-ID: 015101d8601d$37df5190$a79df4b0$@webkr.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> They are sorted by order_id only within sets of the same shipping_date, which is not good enough.

Ah yes, that totally makes sense for the general case.

> so maybe you can just tweak that application to test if the start and end dates are the same and use equality when they are.

I definitely can.

But now I have a followup question, which probably should have been a separate question all along. I have modified the example a bit to have a more natural date distribution and I got rid of the weird shipping_date condition and actually made it different dates, so the index order is out of the picture. I also added some statistics so Postgres knows about the relationship between the columns.

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=54c7774432e896e3c0e89d8084c4b194

After inserting more rows, Postgres still chooses a scan on the primary key instead of using the index.

Limit (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 rows=50 loops=1)
-> Index Scan using orders_test_pkey on orders_test (cost=0.43..71149.43 rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1)
Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
Rows Removed by Filter: 1998734

By setting the CPU costs to 0 (last block in the fiddle) I can force the use of the previous plan and as I already suspected it is much better:

Limit (cost=101.00..101.00 rows=50 width=4) (actual time=4.835..4.843 rows=50 loops=1)
-> Sort (cost=101.00..101.00 rows=12010 width=4) (actual time=4.833..4.837 rows=50 loops=1)
Sort Key: order_id
Sort Method: top-N heapsort Memory: 27kB
-> Index Scan using orders_test_shipping_date_idx on orders_test (cost=0.00..101.00 rows=12010 width=4) (actual time=0.026..3.339 rows=11266 loops=1)
Index Cond: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))

Is it overestimating the cost of the sorting?

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2022-05-05 02:08:59 Re: Why is there a Sort after an Index Only Scan?
Previous Message David Rowley 2022-05-04 23:42:08 Re: Why is there a Sort after an Index Only Scan?