Why does it sort rows after a nested loop that uses already-sorted indexes?

From: negora <public(at)negora(dot)com>
To: PostgreSQL - General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Why does it sort rows after a nested loop that uses already-sorted indexes?
Date: 2024-04-18 11:00:38
Message-ID: 07b89cad-b2a1-48c2-8a14-67d1b93098c9@negora.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi:

I've a question regarding nested loops and the order in which they
return rows. Can you help me, please?

Suppose that I've two tables:

    - Table [sales_order]

        * Columns [id]
        * Index [sales_order_pkey] on [id]

    - Table [order_line]

        * Columns [id], [sales_order_id]
        * Index [order_line_ukey] on [sales_order_id], [id]

Then, I run the following query:

-----------------------------------------------------------

SELECT sales_order.id, order_line.id

FROM main.sales_order

JOIN main.order_line ON order_line.sales_order_id = sales_order.id

WHERE sales_order.customer_id = 2

ORDER BY sales_order.id, order_line.id;

-----------------------------------------------------------

The query planner decides to use the following nested loop:

-----------------------------------------------------------

Incremental Sort  (cost=26.90..16020.06 rows=144955 width=8)

  Sort Key: sales_order.id, order_line.id

  Presorted Key: sales_order.id

  ->  Nested Loop  (cost=0.70..4593.99 rows=144955 width=8)

        ->  Index Scan using sales_order_pkey on sales_order 
(cost=0.28..19.31 rows=79 width=4)

              Filter: (customer_id = 2)

        ->  Index Only Scan using order_line_ukey on order_line 
(cost=0.42..39.22 rows=1869 width=8)

              Index Cond: (sales_order_id = sales_order.id)

-----------------------------------------------------------

As you can see, the planner does detect that the outer loop returns the
rows presorted by [sales_order.id]. However, it's unable to detect that
the rows returned by the inner loop are also sorted by [sales_order.id]
first, and then by [order_line.id].

Why is it? Is it because the planner is designed to always ignore the
order of the inner loop, even although it could take advantage of it
(for example, because the analysis time rarely is worth it)? Or is there
something that I'm missing?

If I'm not mistaken, in this case both index scans seem to be done
serially, in an N x M style, so I think the row order would be
preserved, Right?

Thank you!

negora

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2024-04-18 14:53:37 Re: Why does it sort rows after a nested loop that uses already-sorted indexes?
Previous Message Justin 2024-04-17 19:17:10 Re: replication slot "pg_1015733_sync_1014718_7358407382484881476" does not exist