From: Mao Jiayin <maojiayin(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
Date: 2024-04-07 03:32:41
Message-ID: CABYdTUfXkTQoF=Aw=zxVGPZAs+83hvZA70NPLSBEERbY28wsKg@mail.gmail.com
Lists: pgsql-bugs
Thanks, Jeff. This is really helpful. I agree this is not really a bug;
please feel free to close it.

If you still want to know the size of the table, it has about 6 million rows.
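Completing the division Jeff asks about below with that figure (rough arithmetic only; this assumes the table holds exactly 6 million rows, and that Jeff's reckoning about the planner's expectation is right):

```sql
-- Rough arithmetic only: if the planner expected to remove about
-- (rows in table) / 837 rows from the ordered index scan, then a
-- 6-million-row table gives:
SELECT 6000000 / 837;  -- integer division: 7168
```

So the planner would have expected to discard roughly 7,000 rows from the ordered scan, against the 596003 it actually removed.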
On Sat, Apr 6, 2024 at 7:25 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Sat, Apr 6, 2024 at 5:44 PM PG Bug reporting form <
> noreply(at)postgresql(dot)org> wrote:
>
>>
>> For example, our "user" table has an id primary key, an "org_id" column
>> and
>> a "disabled" column. The table has millions of rows and for each org_id
>> there is only usually a few hundred rows.
>
>
> It would be helpful to know precisely how many millions of rows. We know
> it actually removed 596003 rows from the ordered index scan, but we don't
> know how many it thought it would need to remove. I reckon it thought it
> would remove # in table / 837, but I don't know what that division comes
> out to, not knowing the numerator.
>
>
>> ->  Index Scan using user_org_disabled_idx on user
>>     (cost=0.43..3141.43 rows=837 width=236)
>>     (actual time=0.049..1.407 rows=166 loops=1)
>>
>
> So this estimate is quite wrong: 837/166 ≈ 5. Do you know why? This bad
> estimate makes this plan look 5 times too expensive, and the competing one
> look 5 times too cheap, for a ratio of 25. That is more than the current
> ratio between the two plan cost estimates, so fixing this could drive the
> difference. (The ratio of actual times is more than 25, so there is more
> to the problem than just this, but fixing this alone should be enough to
> drive the correct choice). So why is this estimate that bad? Is the
> selectivity estimate of `org_id = 123456` alone that bad, or is it only
> when combined with `disabled=false`?
>
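One way to probe Jeff's question about the combined selectivity (a sketch, not from the original thread; the table and column names come from the report, and `(dependencies)` is one of several statistics kinds PostgreSQL 10+ supports) is to give the planner multivariate statistics on the two columns, so it no longer multiplies their per-column selectivities as if they were independent:

```sql
-- Hypothetical sketch: let the planner estimate org_id and disabled
-- together instead of assuming they are independent. "user" is quoted
-- because it is a reserved word.
CREATE STATISTICS user_org_disabled_stats (dependencies)
    ON org_id, disabled FROM "user";
ANALYZE "user";
```

If the 837-row estimate improves after this, the problem was the column combination; if not, the `org_id = 123456` estimate alone is off.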
> A more robust solution is to add an index on (org_id, disabled, id). That
> way it can combine the two strategies, jumping to just the part of the
> index it needs and then reading it already in order. Not only will this be
> much faster than either of the two plans you show, it will also be more
> resilient to estimation errors.
>
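Jeff's suggested index can be sketched as follows (the index name is illustrative, and the LIMIT value in the query shape is a placeholder, since the thread does not state it):

```sql
-- Composite index: jump straight to the matching (org_id, disabled)
-- range, then read rows already ordered by id. CONCURRENTLY avoids
-- blocking writes while the index builds.
CREATE INDEX CONCURRENTLY user_org_disabled_id_idx
    ON "user" (org_id, disabled, id);

-- Query shape this index serves (LIMIT value illustrative):
SELECT * FROM "user"
WHERE org_id = 123456 AND disabled = false
ORDER BY id
LIMIT 50;
```

With this index the planner no longer has to choose between "filter then sort" and "scan in order then filter", which is why it stays fast even when the row estimates are wrong.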
> Anyway, these just look like well-known estimation difficulties, nothing
> which seems like an actual bug. Estimation is hard and sometimes there is
> no way to know the correct value to use until after the query is already
> underway.
>
> Cheers,
>
> Jeff
>