From: Mao Jiayin <maojiayin(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
Date: 2024-04-07 03:32:41
Message-ID: CABYdTUfXkTQoF=Aw=zxVGPZAs+83hvZA70NPLSBEERbY28wsKg@mail.gmail.com
Lists: pgsql-bugs
Thanks, Jeff. This is really helpful. I agree this is not really a bug;
please feel free to close it.

If you still want to know the size of the table, it has about 6 million rows.
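Completing the division Jeff asks about below with that figure (rough arithmetic only; this assumes the table holds exactly 6 million rows, and that Jeff's reckoning about the planner's expectation is right):

```sql
-- Rough arithmetic only: if the planner expected to remove about
-- (rows in table) / 837 rows from the ordered index scan, then a
-- 6-million-row table gives:
SELECT 6000000 / 837;  -- integer division: 7168
```

So the planner would have expected to discard roughly 7,000 rows from the ordered scan, against the 596003 it actually removed.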
On Sat, Apr 6, 2024 at 7:25 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Sat, Apr 6, 2024 at 5:44 PM PG Bug reporting form <
> noreply(at)postgresql(dot)org> wrote:
>
>>
>> For example, our "user" table has an id primary key, an "org_id" column
>> and
>> a "disabled" column. The table has millions of rows and for each org_id
>> there is only usually a few hundred rows.
>
>
> It would be helpful to know precisely how many millions of rows. We know
> it actually removed 596003 rows from the ordered index scan, but we don't
> know how many it thought it would need to remove. I reckon it thought it
> would remove # in table / 837, but I don't know what that division comes
> out to, not knowing the numerator.
>
>
>> ->  Index Scan using user_org_disabled_idx on user
>>     (cost=0.43..3141.43 rows=837 width=236)
>>     (actual time=0.049..1.407 rows=166 loops=1)
>>
>
> So this estimate is quite wrong: 837/166 ≈ 5. Do you know why? This bad
> estimate makes this plan look 5 times too expensive, and the competing one
> look 5 times too cheap, for a ratio of 25. That is more than the current
> ratio between the two plan cost estimates, so fixing this could drive the
> difference. (The ratio of actual times is more than 25, so there is more
> to the problem than just this, but fixing this alone should be enough to
> drive the correct choice). So why is this estimate that bad? Is the
> selectivity estimate of `org_id = 123456` alone that bad, or is it only
> when combined with `disabled=false`?
>
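One way to probe Jeff's question about the combined selectivity (a sketch, not from the original thread; the table and column names come from the report, and `(dependencies)` is one of several statistics kinds PostgreSQL 10+ supports) is to give the planner multivariate statistics on the two columns, so it no longer multiplies their per-column selectivities as if they were independent:

```sql
-- Hypothetical sketch: let the planner estimate org_id and disabled
-- together instead of assuming they are independent. "user" is quoted
-- because it is a reserved word.
CREATE STATISTICS user_org_disabled_stats (dependencies)
    ON org_id, disabled FROM "user";
ANALYZE "user";
```

If the 837-row estimate improves after this, the problem was the column combination; if not, the `org_id = 123456` estimate alone is off.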
> A more robust solution is to add an index on (org_id, disabled, id). That
> way it can combine the two strategies, jumping to just the part of the
> index it needs and then reading it already in order. Not only will this be
> much faster than either of the two plans you show, it will also be more
> resilient to estimation errors.
>
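Jeff's suggested index can be sketched as follows (the index name is illustrative, and the LIMIT value in the query shape is a placeholder, since the thread does not state it):

```sql
-- Composite index: jump straight to the matching (org_id, disabled)
-- range, then read rows already ordered by id. CONCURRENTLY avoids
-- blocking writes while the index builds.
CREATE INDEX CONCURRENTLY user_org_disabled_id_idx
    ON "user" (org_id, disabled, id);

-- Query shape this index serves (LIMIT value illustrative):
SELECT * FROM "user"
WHERE org_id = 123456 AND disabled = false
ORDER BY id
LIMIT 50;
```

With this index the planner no longer has to choose between "filter then sort" and "scan in order then filter", which is why it stays fast even when the row estimates are wrong.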
> Anyway, these just look like well-known estimation difficulties, nothing
> which seems like an actual bug. Estimation is hard and sometimes there is
> no way to know the correct value to use until after the query is already
> underway.
>
> Cheers,
>
> Jeff
>