From: | Michael Guissine <mguissine(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4 |
Date: | 2022-10-20 19:29:21 |
Message-ID: | CACxDrAkYZA44Sn2wbcE+waD45EKc5gD2D-FBDAsuGCikKemVAw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi Jeff, this is super useful, thanks for spending your time explaining the
internals. In our case, we ended up dropping the trgm index and everything
is back to normal but I like your idea of reversing the column order, will
give it a try!
~ Michael
On Thu, Oct 20, 2022 at 1:03 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Oct 19, 2022 at 3:44 PM Michael Guissine <mguissine(at)gmail(dot)com>
> wrote:
>
>> paritioned_table_hash_p3_partition_key_text_idx is in fact pg_trgm index,
>> you are right!
>> interesting, why would it use that index at all on some partitions but
>> not on others? the query is a simple match on 3 columns and we have other
>> indexes to cover it
>>
>
> The two index choices are (falsely) similar in cost, so just small
> differences in the selectivity estimates is enough to drive one over the
> other from partition to partition.
>
> There are (at least) two planner deficiencies at work here.
>
> For the actually faster index, it thinks it will need to re-descend the
> index one time for every value in the IN-list on the leading column, and
> each of those re-descents is going to end up on a different leaf page,
> charging about one random_io_cost for each one. But it is counting a
> re-descent for every item in the IN-list, for every partition. But it only
> needs to re-descend for the items of the IN-list which also happen to match
> the remainder for that hash partition. This is incorrectly inflating the
> cost of the faster index. (You could argue that if the executor really
> does do the re-descents, then it is correct for the planner to account for
> them. But I suspect that many of the hopeless re-descents may end up on
> leaf pages already in memory. Anyway, fixing the planner and fixing the
> executor would probably go hand in hand)
>
> For the slower index, the gin code doesn't estimate how many rechecks
> there will be, only how many end results there will be. But the rechecks
> are where most of the true cost is. So that makes this one end up with an
> incorrectly low cost. The overestimation of one and the underestimation of
> the other bring the two estimates close enough together that random
> variations in the stats can drive the choice between them. Now you could
> say "all the row estimates are 1, so how could there be any variation"?
> But notice that for the bitmap heap scans, there is a filter step which
> removes around 1000 rows from each partition. We know how many rows were
> actually removed from each partition, but we are not told how many rows the
> planner thought it would remove. Variations in this unseen estimated
> filtered number would lead to variations in the cost estimate.
>
> While fixing the planner would be nice, I think the best current solution
> for you is to change the order of columns in the 3-column index to be
> (text, start_time, key). Since the condition on "text" is extremely
> selective, the planner should know that by the time it gets to the key
> column it can check it with a single hash-table probe rather than a bunch
> of costly re-descents. This should be estimated to be faster than either
> of the existing two indexes, and also should actually be faster.
>
> Cheers,
>
> Jeff
>
>>
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Guo | 2022-10-21 01:05:43 | Re: Documentation error |
Previous Message | PG Bug reporting form | 2022-10-20 15:06:11 | BUG #17658: Download release 15 without success |