Re: Help needed to understand query planner regression with incremental sort

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Henrik Peinar (nodeSWAT(dot)com)" <henrik(at)nodeswat(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Help needed to understand query planner regression with incremental sort
Date: 2023-05-31 15:31:30
Message-ID: 2818.1685547090@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Henrik Peinar (nodeSWAT.com)" <henrik(at)nodeswat(dot)com> writes:
> *Quick description: *After upgrading our Aurora PostgreSQL engine from v11
> to v15, one of our often run SQL's started taking tens of seconds instead
> of running sub 100ms. Explain analyze showed that the query planner had
> switched to using incremental sort. Running ANALYZE on the table fixed the
> issue temporarily, but it came back after few days.

Hmm. I think it's quite accidental that you get one plan over the other,
because it looks like the key difference is something the planner doesn't
account for. In the fast case you have

-> Index Scan using products_pinned_at_listed_at_ix on products (cost=0.43..3929423.12 rows=26523 width=1463) (actual time=2.325..32.872 rows=61 loops=1)
Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
Rows Removed by Filter: 3376

The index is only being used to produce ordered output here: the filter
condition isn't related to the index. And what we see is that the
query is fast because the desired rows are found in the first 3376+61
rows visited in this direction. Meanwhile in the slow case you have

-> Index Scan Backward using products_pinned_at_ix on products (cost=0.43..1172249.47 rows=26139 width=1460) (actual time=5.263..5203.180 rows=18411 loops=1)
Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
Rows Removed by Filter: 5415895

Again, the filter condition isn't exploiting the index, we're just using
the index to (partially) satisfy the ORDER BY. This one takes a long time
because it has to trawl through 5415895+61 rows before reaching the LIMIT.

So AFAICS, the runtime differential is mostly/entirely because the rows
satisfying the filter condition are located near one end of the range of
pinned_at. That is not a correlation that the planner knows anything
about, so it's unable to see that these two ways of scanning the table
will have significantly different costs. Moreover, I think you'd be
mistaken to draw any great conclusions from this specific example,
because with some other search term(s) the results might be totally
different due to the required rows not falling in the same place.

What I'd think about if this type of query is important is to set up
an index that can actually be used to satisfy the filter condition,
so that you're not forcing it into "scan the whole table till you
find the rows you want". It looks like you already have such an
index, ie a GIN index on the desc_tsv column, although I don't trust
that your schema attachment is actually accurate because if it is
then you have a bunch of duplicative indexes. You might try
dropping the other indexes to see if you can coerce the planner
into using that one, and then seeing what the cost estimate is.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message peter.borissow@kartographia.com 2023-05-31 17:53:01 Re: Hash Index on Partitioned Table
Previous Message Adrian Klaver 2023-05-31 15:14:44 Re: speed up full table scan using psql