From: | Kouber Saparev <kouber(at)saparev(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | LIMIT confuses the planner (again) |
Date: | 2009-09-28 08:43:00 |
Message-ID: | 4AC07714.1080909@saparev.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to
the one I've already described some time ago:
http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php
Again, adding a LIMIT clause to a query, which is normally executing
very fast thanks to an index, makes it perform slow, because the planner
no longer uses the "correct" index.
I have the following table:
CREATE TABLE message (
message_sid SERIAL PRIMARY KEY,
from_profile_sid INT NOT NULL REFERENCES profile,
to_profile_sid INT NOT NULL REFERENCES profile,
sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
body TEXT,
datetime TIMESTAMP NOT NULL DEFAULT NOW()
);
With the following conditional index:
CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE
NOT receiver_has_deleted;
The query to obtain the list of received messages of a profile is simple
and executes very fast, because of the index above:
db=# EXPLAIN ANALYZE SELECT
*
FROM
message
WHERE
to_profile_sid = -1
AND
NOT receiver_has_deleted
ORDER BY
message_sid DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=11857.09..11866.19 rows=3640 width=277) (actual
time=0.317..0.319 rows=15 loops=1)
Sort Key: message_sid
Sort Method: quicksort Memory: 32kB
-> Bitmap Heap Scan on message (cost=106.44..11641.78 rows=3640
width=277) (actual time=0.096..0.271 rows=15 loops=1)
Recheck Cond: ((to_profile_sid = (-1)) AND (NOT
receiver_has_deleted))
-> Bitmap Index Scan on message_to_profile_idx
(cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21
loops=1)
Index Cond: (to_profile_sid = (-1))
Total runtime: 0.383 ms
(8 rows)
Adding a LIMIT clause to exactly the same query slows its execution more
than 20'000 times:
db=# EXPLAIN ANALYZE SELECT
*
FROM
message
WHERE
to_profile_sid = -1
AND
NOT receiver_has_deleted
ORDER BY
message_sid DESC LIMIT 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..6513.60 rows=20 width=277) (actual
time=0.617..6576.539 rows=15 loops=1)
-> Index Scan Backward using message_pkey on message
(cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522
rows=15 loops=1)
Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1)))
Total runtime: 6576.572 ms
(4 rows)
Just as I was advised in my recent post, I've already increased the
statistics of both fields all the way till 1000, analyzed the table and
reindexed the index:
ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000;
ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000;
ANALYZE message;
REINDEX index message_to_profile_idx;
This time, however, the steps above didn't affect the planner in any
way, it still refuses to use the index "message_to_profile_idx" when a
LIMIT is involved (for this particular value of to_profile_sid).
Here's some statistical data:
db=# SELECT COUNT(*) FROM message;
count
---------
1312213
(1 row)
db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1;
count
-------
5604
(1 row)
db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT
receiver_has_deleted;
count
-------
15
(1 row)
db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message;
count
-------
8596
(1 row)
db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length
FROM message GROUP BY to_profile_sid) AS freq;
avg
----------------------
152.6540251279664960
(1 row)
db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND
attname='to_profile_sid';
n_distinct
------------
6277
(1 row)
Also, the value of -1 for "to_profile_sid" is second in the list of
most_common_vals in pg_stats, but still I don't understand why a simple
limit is blinding the planner for the "good" index. Any ideas?
Regards,
--
Kouber Saparev
http://kouber.saparev.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2009-09-28 11:38:54 | Re: LIMIT confuses the planner (again) |
Previous Message | std pik | 2009-09-28 06:13:40 | Postgres performance |