Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine <mguissine(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4
Date: 2022-10-20 19:29:21
Message-ID: CACxDrAkYZA44Sn2wbcE+waD45EKc5gD2D-FBDAsuGCikKemVAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Jeff, this is super useful, thanks for spending your time explaining the
internals. In our case, we ended up dropping the trgm index and everything
is back to normal but I like your idea of reversing the column order, will
give it a try!

~ Michael

On Thu, Oct 20, 2022 at 1:03 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Oct 19, 2022 at 3:44 PM Michael Guissine <mguissine(at)gmail(dot)com>
> wrote:
>
>> paritioned_table_hash_p3_partition_key_text_idx is in fact pg_trgm index,
>> you are right!
>> interesting, why would it use that index at all on some partitions but
>> not on others? the query is a simple match on 3 columns and we have other
>> indexes to cover it
>>
>
> The two index choices are (falsely) similar in cost, so just small
> differences in the selectivity estimates is enough to drive one over the
> other from partition to partition.
>
> There are (at least) two planner deficiencies at work here.
>
> For the actually faster index, it thinks it will need to re-descend the
> index one time for every value in the IN-list on the leading column, and
> each of those re-descents is going to end up on a different leaf page,
> charging about one random_io_cost for each one. But it is counting a
> re-descent for every item in the IN-list, for every partition. But it only
> needs to re-descend for the items of the IN-list which also happen to match
> the remainder for that hash partition. This is incorrectly inflating the
> cost of the faster index. (You could argue that if the executor really
> does do the re-descents, then it is correct for the planner to account for
> them. But I suspect that many of the hopeless re-descents may end up on
> leaf pages already in memory. Anyway, fixing the planner and fixing the
> executor would probably go hand in hand)
>
> For the slower index, the gin code doesn't estimate how many rechecks
> there will be, only how many end results there will be. But the rechecks
> are where most of the true cost is. So that makes this one end up with an
> incorrectly low cost. The overestimation of one and the underestimation of
> the other bring the two estimates close enough together that random
> variations in the stats can drive the choice between them. Now you could
> say "all the row estimates are 1, so how could there be any variation"?
> But notice that for the bitmap heap scans, there is a filter step which
> removes around 1000 rows from each partition. We know how many rows were
> actually removed from each partition, but we are not told how many rows the
> planner thought it would remove. Variations in this unseen estimated
> filtered number would lead to variations in the cost estimate.
>
> While fixing the planner would be nice, I think the best current solution
> for you is to change the order of columns in the 3-column index to be
> (text, start_time, key). Since the condition on "text" is extremely
> selective, the planner should know that by the time it gets to the key
> column it can check it with a single hash-table probe rather than a bunch
> of costly re-descents. This should be estimated to be faster than either
> of the existing two indexes, and also should actually be faster.
>
> Cheers,
>
> Jeff
>
>>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2022-10-21 01:05:43 Re: Documentation error
Previous Message PG Bug reporting form 2022-10-20 15:06:11 BUG #17658: Download release 15 without success