Re: Query is slow when order by and limit clause are used in the query

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, sreekanth vajrapu <sreekanthvajrapu(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Query is slow when order by and limit clause are used in the query
Date: 2021-05-24 14:19:24
Message-ID: a8615d56-7e9d-8362-e61e-114fa92f1c02@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 5/24/21 2:28 PM, Bharath Rupireddy wrote:
> On Mon, May 24, 2021 at 5:01 PM sreekanth vajrapu
> <sreekanthvajrapu(at)gmail(dot)com> wrote:
>>
>> Hi Bharath,
>>
>> Thanks for the quick reply, I have attached the execution plan for below 3 scenarios. Our application is using 1st Scenario
>>
>> 1) WITH ORDER BY AND LIMIT 30 -- Very slow(3 to 5 seconds)
>> 2) WITH ORDER BY WITHOUT LIMIT -- Very fast(160 MS)
>> 3) WITH ORDER BY WITHOUT LIMIT 100 -- Very fast(160 MS)
>>
>> Kidney let me know if you need any more details on this.
>
> I see that there are a huge number of Heap Fetches: 599354 with LIMIT
> 30 clause vs Heap Fetches: 11897 without LIMIT clause, maybe that
> could be the reason for the slowness. I'm not sure why this is
> happening with the LIMIT 30 clause only. Is it that this issue happens
> every time? Say, if you run with LIMIT 30, then the query finishes in
> 3-5sec. Immediately if you run without a LIMIT clause then the query
> completes in 160ms. Is vacuum running successfully on the tables and
> indexes for which there's a huge number of heap fetches?
>

But the heap fetches are in the index-only part of the plan, and it
matches the number of loops for that node. There are 521996 loops and
599354 heap fetches, so roughly 1:1, i.e. roughly the same ratio as for
the faster plans.

It's hard to give advice when we only have the plans, not the original
queries and index definitions, and when the plans are "anonymized" in a
rather inconsistent way (hard to say how the tables/indexes match
between the queries).

If I had to guess, I'd say this is a case of the usual LIMIT problem,
where the optimizer assumes the matching rows are uniformly distributed
in the input relation, when in reality it's "concentrated" at the end.

For example, imagine you have a table with two random columns:

CREATE TABLE t (a int, b int);

INSERT INTO t SELECT 100 * random(), 100 * random()
FROM generate_series(1,1000000) g;

CREATE INDEX ON t (a);

Now imagine you do this:

SELECT * FROM t WHERE a = 10 AND b = 20 LIMIT 1;

In this case there's ~10000 rows with a=10, and we can fetch that from
the index. And we know there's ~100 groups with different b values,
distributed uniformly in the 10k rows. So if we start scanning the
index, after ~100 rows we should get a value with b=20, in which case
LIMIT 1 is done.

Now imagine the values are not distributed uniformly like this, but
instead we have this:

INSERT INTO t SELECT i/10000, i/10000
FROM generate_series(1,1000000) s(i);

In this case the DB will still believe it'll only scan ~100 rows, but
there are no rows with a=10 and b=20, so it'll end up scanning all 10k
rows with a=10. Or maybe all the "b=20" rows are at the very end of the
index, or something like that. Perhaps the "delete" column makes it
behave like this.

This is only an issue for queries with LIMIT, because that pushes the
planner to pick a plan with very low startup cost. But those plans often
have very high total cost, and degrade poorly when the uniformity
assumption is incorrect.

Hard to say, though, confirming it would require looking at the data
more closely. The one thing I'd suggest is changing the xxxx_index to
also include the "deleted" column, but it's a stab in the dark.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2021-05-25 00:05:27 Re: Query is slow when order by and limit clause are used in the query
Previous Message Bharath Rupireddy 2021-05-24 12:28:40 Re: Query is slow when order by and limit clause are used in the query