Re: A limit clause can cause a poor index choice

From: Mohamed Wael Khobalatte <mkhobalatte(at)grubhub(dot)com>
To: Nick Cleaton <nick(at)cleaton(dot)net>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: A limit clause can cause a poor index choice
Date: 2020-05-19 20:56:43
Message-ID: CABZeWdxy7Ryp-ETb4gm+o0B4X38e=RVkdMypWNDAybEo6uwGPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Nick,

I believe a second ordering, by id desc, will get your query to use the
right index, and shouldn't be functionally different from what you would
expect.

```
select * from test_orders where

customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])

order by o_date desc, id desc limit 10;
```

I didn't look closely as to why from your data though. I'll leave it to
more experienced people to comment as to why the planner misjudged your
query badly. What happens when you raise the limit? Say to a 1000?

On Tue, May 19, 2020 at 3:00 PM Nick Cleaton <nick(at)cleaton(dot)net> wrote:

> The attached script builds a 10G test table which demonstrates a
> problem that we have in production with postgresql 12.3-1.pgdg18.04+1
> on ubuntu linux. Indexes:
>
> test_orders_o_date_idx btree(o_date)
> test_orders_customer_id_o_date_idx btree(customer_id, o_date)
>
> We query for the most recent orders for sets of customers, and
> sometimes none of those customers have any orders and the results are
> empty:
>
> explain analyze select * from test_orders where
> customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
> order by o_date desc;
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=24848.96..24870.67 rows=8686 width=1839) (actual
> time=1.101..1.102 rows=0 loops=1)
> Sort Key: o_date DESC
> Sort Method: quicksort Memory: 25kB
> -> Index Scan using test_orders_customer_id_o_date_idx on
> test_orders (cost=0.43..17361.20 rows=8686 width=1839) (actual
> time=1.047..1.047 rows=0 loops=1)
> Index Cond: (customer_id = ANY
> ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
> Planning Time: 3.821 ms
> Execution Time: 1.174 ms
> (7 rows)
>
> So far so good. But if we add a limit clause to the query then the
> plan goes very wrong:
>
> explain analyze select * from test_orders where
> customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
> order by o_date desc limit 10;
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.43..1660.98 rows=10 width=1839) (actual
> time=4990.424..4990.424 rows=0 loops=1)
> -> Index Scan Backward using test_orders_o_date_idx on test_orders
> (cost=0.43..1442355.43 rows=8686 width=1839) (actual
> time=4990.423..4990.423 rows=0 loops=1)
> Filter: (customer_id = ANY
> ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
> Rows Removed by Filter: 5000000
> Planning Time: 0.063 ms
> Execution Time: 4990.435 ms
>
>
> Is there something we can adjust to get it to prefer
> test_orders_customer_id_o_date_idx even when there's a limit clause ?
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2020-05-19 21:15:57 Re: A limit clause can cause a poor index choice
Previous Message Nick Cleaton 2020-05-19 19:00:04 A limit clause can cause a poor index choice