Re: Query is slow when order by and limit clause are used in the query

From: sreekanth vajrapu <sreekanthvajrapu(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Query is slow when order by and limit clause are used in the query
Date: 2021-05-28 09:40:12
Message-ID: CAKPbTYgnMgj8tjTuPU1aRazH3NCXePWgvb1AXGPWDJh1k4h0Wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Team,

Thank you so much for the quick response on my issue.

I minimized the query to the problematic part, Also I tried with all the
options that you suggested, Still I am seeing the slow when I USE LIMIT 30.

Answers to your questions.

1) Statistics are up to date.
2) Version of the postgres is "PostgreSQL 9.5.21".

Below are the execution plans along with the query and index definitions.
Kindly help me to resolve this issue.

*SCENARIO 1: QUERY PLAN WITH ORDER BY ASC AND WITH LIMIT 31 (This is actual
one which is being used by application)*
explain analyze
SELECT item.*
FROM (
SELECT
t1.name,
t1.deleted
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name LIMIT 31 OFFSET 0;

QUERY PLAN WITH ORDER BY ASC AND WITH LIMIT 31
--------------------------------------------------------------------------------------------------------------------
Limit (cost=1.14..1034.75 rows=31 width=26) (actual
time=1206.826..1207.108 rows=31 loops=1)
-> Nested Loop (cost=1.14..339827.40 rows=10192 width=26) (actual
time=1206.824..1207.102 rows=31 loops=1)
-> Nested Loop (cost=0.71..334957.35 rows=10415 width=58)
(actual time=1206.811..1206.951 rows=31 loops=1)
-> Index Scan using t1_name_index on t1
(cost=0.42..165174.97 rows=542766 width=42) (actual time=0.012..636.150
rows=521998 loops=1)
Filter: ((NOT deleted) OR (deleted IS NULL))
-> Index Only Scan using t2_pkey on t2 (cost=0.29..0.30
rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=521998)
Index Cond: (id = t1.id)
Heap Fetches: 0
-> Index Only Scan using t3_pkey on t3 (cost=0.42..0.46 rows=1
width=16) (actual time=0.004..0.004 rows=1 loops=31)
Index Cond: (id = t1.id)
Heap Fetches: 0
Planning time: 0.624 ms
Execution time: 1207.162 ms
(13 rows)

*SCENARIO 2: QUERY PLAN WITH ORDER BY ASC AND WITHOUT LIMIT*
explain analyze
SELECT item.*
FROM (
SELECT
t1.name,
t1.deleted
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name;

QUERY PLAN WITH ORDER BY ASC AND WITHOUT LIMIT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=18989.88..19015.36 rows=10192 width=26) (actual
time=166.891..167.575 rows=10415 loops=1)
Sort Key: t1.name
Sort Method: quicksort Memory: 1198kB
-> Nested Loop (cost=291.76..18311.34 rows=10192 width=26) (actual
time=1.912..139.370 rows=10415 loops=1)
Join Filter: (t2.id = t1.id)
-> Hash Join (cost=291.34..10571.03 rows=10415 width=32) (actual
time=1.897..93.228 rows=10415 loops=1)
Hash Cond: (t3.id = t2.id)
-> Seq Scan on t3 (cost=0.00..8183.67 rows=531167
width=16) (actual time=0.005..30.359 rows=531167 loops=1)
-> Hash (cost=161.15..161.15 rows=10415 width=16) (actual
time=1.861..1.861 rows=10415 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 617kB
-> Seq Scan on t2 (cost=0.00..161.15 rows=10415
width=16) (actual time=0.002..0.753 rows=10415 loops=1)
-> Index Scan using t1_pkey on t1 (cost=0.42..0.73 rows=1
width=42) (actual time=0.004..0.004 rows=1 loops=10415)
Index Cond: (id = t3.id)
Filter: ((NOT deleted) OR (deleted IS NULL))
Planning time: 0.562 ms
Execution time: 167.973 ms
(16 rows)

*SCENARIO 3: QUERY PLAN WITH ORDER BY DESC AND WITH LIMIT 31*
explain analyze
SELECT item.*
FROM (
SELECT
t1.name,
t1.deleted
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name DESC LIMIT 31 OFFSET 0;

QUERY PLAN WITH ORDER BY DESC AND WITH LIMIT 31
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.14..1034.75 rows=31 width=26) (actual time=0.039..0.431
rows=31 loops=1)
-> Nested Loop (cost=1.14..339827.40 rows=10192 width=26) (actual
time=0.038..0.428 rows=31 loops=1)
-> Nested Loop (cost=0.71..334957.35 rows=10415 width=58)
(actual time=0.021..0.261 rows=31 loops=1)
-> Index Scan Backward using t1_name_index on t1
(cost=0.42..165174.97 rows=542766 width=42) (actual time=0.012..0.091
rows=61 loops=1)
Filter: ((NOT deleted) OR (deleted IS NULL))
-> Index Only Scan using t2_pkey on t2 (cost=0.29..0.30
rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=61)
Index Cond: (id = t1.id)
Heap Fetches: 0
-> Index Only Scan using t3_pkey on t3 (cost=0.42..0.46 rows=1
width=16) (actual time=0.005..0.005 rows=1 loops=31)
Index Cond: (id = t1.id)
Heap Fetches: 0
Planning time: 0.663 ms
Execution time: 0.491 ms
(13 rows)

*SCENARIO 4: QUERY PLAN WITH ORDER BY DESC AND WITHOUT LIMIT*
explain analyze
SELECT item.*
FROM (
SELECT
t1.name,
t1.deleted
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name DESC

QUERY PLAN WITH ORDER BY DESC AND WITHOUT LIMIT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=18989.88..19015.36 rows=10192 width=26) (actual
time=177.915..178.513 rows=10415 loops=1)
Sort Key: t1.name DESC
Sort Method: quicksort Memory: 1198kB
-> Nested Loop (cost=291.76..18311.34 rows=10192 width=26) (actual
time=2.052..149.175 rows=10415 loops=1)
Join Filter: (t2.id = t1.id)
-> Hash Join (cost=291.34..10571.03 rows=10415 width=32) (actual
time=2.038..96.798 rows=10415 loops=1)
Hash Cond: (t3.id = t2.id)
-> Seq Scan on t3 (cost=0.00..8183.67 rows=531167
width=16) (actual time=0.005..31.440 rows=531167 loops=1)
-> Hash (cost=161.15..161.15 rows=10415 width=16) (actual
time=1.993..1.993 rows=10415 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 617kB
-> Seq Scan on t2 (cost=0.00..161.15 rows=10415
width=16) (actual time=0.004..0.822 rows=10415 loops=1)
-> Index Scan using t1_pkey on t1 (cost=0.42..0.73 rows=1
width=42) (actual time=0.005..0.005 rows=1 loops=10415)
Index Cond: (id = t3.id)
Filter: ((NOT deleted) OR (deleted IS NULL))
Planning time: 0.585 ms
Execution time: 178.915 ms
(16 rows)

*SCENARIO 5: QUERY PLAN WITH ORDER BY ASC AND LIMIT 30 BY CHAING QUERY TO
(NOT COALESCE(deleted, false))*

explain analyze
SELECT item.*
FROM (
SELECT
t1.name,
t1.deleted
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id)
) item
WHERE ( NOT COALESCE(deleted,false)
--deleted = FALSE
--OR deleted IS NULL
)
ORDER BY item.name LIMIT 31 OFFSET 0;

QUERY PLAN WITH ORDER BY ASC AND LIMIT 30 BY CHAING QUERY TO (NOT
COALESCE(deleted, false))
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.14..1034.75 rows=31 width=26) (actual
time=1230.705..1230.983 rows=31 loops=1)
-> Nested Loop (cost=1.14..339827.40 rows=10192 width=26) (actual
time=1230.704..1230.980 rows=31 loops=1)
-> Nested Loop (cost=0.71..334957.35 rows=10415 width=58)
(actual time=1230.693..1230.835 rows=31 loops=1)
-> Index Scan using t1_name_index on t1
(cost=0.42..165174.97 rows=542766 width=42) (actual time=0.017..666.816
rows=521998 loops=1)
Filter: (NOT COALESCE(deleted, false))
-> Index Only Scan using t2_pkey on t2 (cost=0.29..0.30
rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=521998)
Index Cond: (id = t1.id)
Heap Fetches: 0
-> Index Only Scan using t3_pkey on t3 (cost=0.42..0.46 rows=1
width=16) (actual time=0.004..0.004 rows=1 loops=31)
Index Cond: (id = t1.id)
Heap Fetches: 0
Planning time: 0.848 ms
Execution time: 1231.039 ms
(13 rows)

Time: 1255.381 ms (00:01.255)

*Index Definitions;*
ALTER TABLE t1 ADD CONSTRAINT t1_pkey PRIMARY KEY (id);

CREATE INDEX t1_name_index ON t1 USING btree (name COLLATE
pg_catalog."default" ASC NULLS LAST);
CREATE INDEX t1_name_deleted_peteam1 ON t1 USING btree ((COALESCE(deleted,
false)), name);
CREATE INDEX t1_name_deleted_peteam2 ON t1 USING btree (name,
(COALESCE(deleted, false)));
CREATE INDEX t1_name_deleted_peteam3 ON t1 USING btree ((COALESCE(deleted,
false)), name DESC);
CREATE INDEX t1_name_deleted_peteam4 ON t1 USING btree (name DESC,
(COALESCE(deleted, false)));
CREATE INDEX t1_name_index_peteam5 ON t1 USING btree (name, deleted);

ALTER TABLE t2 ADD CONSTRAINT t2_pkey PRIMARY KEY (id);

ALTER TABLE t3 ADD CONSTRAINT t3_pkey PRIMARY KEY (id);

On Tue, May 25, 2021 at 5:35 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 25 May 2021 at 02:19, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > If I had to guess, I'd say this is a case of the usual LIMIT problem,
> > where the optimizer assumes the matching rows are uniformly distributed
> > in the input relation, when in reality it's "concentrated" at the end.
>
> I'd guess that too. But hard to say due to the inconsistent
> anonymisation of the plan,
>
> > Hard to say, though, confirming it would require looking at the data
> > more closely. The one thing I'd suggest is changing the xxxx_index to
> > also include the "deleted" column, but it's a stab in the dark.
>
> I'd say, providing xxxx_item and xxxxx_item are actually the same
> table but just anonymised poorly, then an index such as:
>
> create index on xxxx_item(COALESCE(deleted,false), name);
>
> then change the query so instead of doing WHERE NOT deleted or deleted
> is null; do instead WHERE NOT COALESCE(deleted,false);
>
> Without the query change then there's no hope of that index being used.
>
> I think this would improve the situation as the LIMIT 30 plan is using
> xxxxx_index to provide presorted results for the ORDER BY but can only
> do index filtering on: (((NOT deleted) OR (deleted IS NULL)) AND
> (SubPlan 6)). So providing not too many rows are filtered out by
> SubPlan 6, then that should reduce the Rows Removed by Filter.
> However, if the majority of those rows are filtered out by Subplan 6,
> then the index won't help much.
>
> It would be nice if the schema was better designed so the deleted
> column could only be true or false though.
>
> sreekanth, for the future, you can use https://explain.depesz.com/ to
> anonymise your queries. It'll do it in a consistent way that changes
> the names of things in a consistent way that people can still follow.
>
> David
>

--
Thanks & Regards,
Sreekanth

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message okano.naoki@fujitsu.com 2021-05-28 10:51:52 RE: CR is not removed with psql -f command on Windows.
Previous Message PG Bug reporting form 2021-05-27 22:22:43 BUG #17039: Won't Upgrade - Repo Error