From: | André Hänsel <andre(at)webkr(dot)de> |
---|---|
To: | "'Pgsql Performance'" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Why is there a Sort after an Index Only Scan? |
Date: | 2022-05-05 01:12:40 |
Message-ID: | 015101d8601d$37df5190$a79df4b0$@webkr.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> They are sorted by order_id only within sets of the same shipping_date, which is not good enough.
Ah yes, that totally makes sense for the general case.
> so maybe you can just tweak that application to test if the start and end dates are the same and use equality when they are.
I definitely can.
But now I have a followup question, which probably should have been a separate question all along. I have modified the example a bit to have a more natural date distribution and I got rid of the weird shipping_date condition and actually made it different dates, so the index order is out of the picture. I also added some statistics so Postgres knows about the relationship between the columns.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=54c7774432e896e3c0e89d8084c4b194
After inserting more rows, Postgres still chooses a scan on the primary key instead of using the index.
Limit (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 rows=50 loops=1)
-> Index Scan using orders_test_pkey on orders_test (cost=0.43..71149.43 rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1)
Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
Rows Removed by Filter: 1998734
By setting the CPU costs to 0 (last block in the fiddle) I can force the use of the previous plan and as I already suspected it is much better:
Limit (cost=101.00..101.00 rows=50 width=4) (actual time=4.835..4.843 rows=50 loops=1)
-> Sort (cost=101.00..101.00 rows=12010 width=4) (actual time=4.833..4.837 rows=50 loops=1)
Sort Key: order_id
Sort Method: top-N heapsort Memory: 27kB
-> Index Scan using orders_test_shipping_date_idx on orders_test (cost=0.00..101.00 rows=12010 width=4) (actual time=0.026..3.339 rows=11266 loops=1)
Index Cond: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
Is it overestimating the cost of the sorting?
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-05-05 02:08:59 | Re: Why is there a Sort after an Index Only Scan? |
Previous Message | David Rowley | 2022-05-04 23:42:08 | Re: Why is there a Sort after an Index Only Scan? |