Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Ken Tanzer <ken(dot)tanzer(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query
Date: 2020-08-15 00:24:27
Message-ID: CAMkU=1zwiuVGZ__keNbaRcnh+OJq8UP5pN-KuCsEVJgoqe8=Ug@mail.gmail.com
Lists: pgsql-performance

On Fri, Aug 14, 2020 at 5:35 PM Ken Tanzer <ken(dot)tanzer(at)gmail(dot)com> wrote:

> Hi. I've got a query that runs fine (~50ms). When I add a "LIMIT 25" to
> it though, it takes way longer. The query itself then takes about 4.5
> seconds. And when I do an explain, it takes 90+ seconds for the same query!
>
> Explains and detailed table/view info below. tbl_log has 1.2M records,
> tbl_reference has 550k. This is 9.6.19 on CentOS 6 with PGDG packages.
>

CentOS 6 has slow clock calls, so it is not surprising that EXPLAIN ANALYZE
with TIMING defaulting to ON is slow. Using something more modern for the
distribution should really help that, but for the current case just setting
TIMING OFF should be good enough as it is the row counts which are
interesting, not the timing of individual steps.
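For reference, turning timing off per invocation looks like this (the query here is just a stand-in for the actual ORM-generated one):

```sql
-- Collect per-node row counts without per-row clock calls.
-- BUFFERS is optional but cheap and often informative.
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT * FROM tbl_log LIMIT 25;
```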

> I know the query itself could be re-written, but it's coming from an ORM,
> so I'm really focused on why adding a limit is causing such performance
> degradation, and what to do about it.
>

But if it is coming from an ORM and you can't rewrite it, then what can you
do about it? Can you set enable_something or something_cost parameters
locally just for the duration of one query? If the ORM doesn't let you
re-write, then I doubt it would let you do that, either. Since you are
using such an old version, you can't create multivariate statistics, either
(although I doubt they would help anyway).
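If the ORM does let you run extra statements in the same transaction, a per-query override would look roughly like this (enable_nestloop is only one guess at a knob worth trying; it isn't established yet that disabling it produces the better plan):

```sql
BEGIN;
-- SET LOCAL applies only until COMMIT/ROLLBACK,
-- so other queries keep the default planner settings.
SET LOCAL enable_nestloop = off;
-- ... the ORM-generated query with its LIMIT 25 goes here ...
COMMIT;
```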

> -> Nested Loop (cost=4313.36..14216.18 rows=611586 width=336) (actual
> time=10.837..38.177 rows=1432 loops=1)
>   -> HashAggregate (cost=4312.93..4325.68 rows=1275 width=136)
> (actual time=10.802..13.800 rows=1433 loops=1)
>   -> Index Scan using tbl_log_pkey on public.tbl_log
> (cost=0.43..7.75 rows=1 width=336) (actual time=0.007..0.009 rows=1
> loops=1433)
>

The way-off row estimate for the nested loop is the cause of the bad plan
choice once you add the LIMIT. But what is the cause of the bad estimate?
If you just multiply the estimates for each of the child nodes (1275 * 1 =
1275, close to the actual 1432), you get about the correct answer. But the
nested loop's own estimate of 611586 is very different from the product of
its children. On the one hand that isn't surprising, as the row estimates
are computed at each node from first principles, not computed from the
bottom up. But usually if the stats are way off, you can follow the error
down to a lower level where they are also way off; in this case you can't,
which makes it really hard to reason about what the problem might be.

Can you clone your server, upgrade the clone to 12.4 or 13BETA3 or 14dev,
and see if the problem still exists there? Can you anonymize your data so
that you can publish an example other people could run themselves to
dissect the problem; or maybe give some queries that generate random data
which have the correct data distribution to reproduce the issue?
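As a starting point for the second option, something along these lines could populate a reproducible test case (the column names and value distributions here are invented; they would need to be adjusted to mirror the real schema and data skew):

```sql
-- Hypothetical skeleton matching only the stated row counts:
-- tbl_log has ~1.2M rows, tbl_reference ~550k.
CREATE TABLE tbl_log AS
SELECT g AS log_id, (random() * 100000)::int AS client_id
FROM generate_series(1, 1200000) g;

CREATE TABLE tbl_reference AS
SELECT g AS ref_id, (random() * 1200000)::int AS log_id
FROM generate_series(1, 550000) g;

ANALYZE tbl_log;
ANALYZE tbl_reference;
```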

Cheers,

Jeff

