performance problem with LIMIT (order BY in DESC order). Wrong index used?

From: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 05:20:44
Message-ID: 152FFD9B-A288-4329-9C96-0FF0C6A2D418@skiline.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everybody,

I have a performance-problem with a query using a LIMIT. There are other threads rergading performance issues with LIMIT, but I didn't find useful hints for our problem and it might
be interesting for other postgres-users.

There are only 2 simple tables:

CREATE TABLE newsfeed
(
id varchar(32) PRIMARY KEY,
version int4 NOT NULL,
newsfeed_type varchar(20) NOT NULL,
new_item_count int4 NOT NULL
);
CREATE INDEX IDX_NEWSFEED_TYPE ON newsfeed (newsfeed_type);

CREATE TABLE newsfeed_item
(
id varchar(32) PRIMARY NOT NULL,
item_type varchar(35) NOT NULL,
version int4 NOT NULL,
category varchar(25) NULL,
data1 bytea NULL,
data2 bytea NULL,
date_time timestamp NOT NULL,
guid1 varchar(32) NULL,
guid2 varchar(32) NULL,
guid3 varchar(32) NULL,
id1 int8 NULL,
id2 int8 NULL,
long_value1 int8 NULL,
long_value2 int8 NULL,
long_value3 int8 NULL,
string_value1 varchar(4000) NULL,
string_value2 varchar(500) NULL,
string_value3 varchar(500) NULL,
string_value4 varchar(500) NULL,
string_value5 varchar(500) NULL,
string_value6 varchar(500) NULL,
newsfeed varchar(32) NOT NULL
);
CREATE UNIQUE INDEX newsfeed_item_pkey ON newsfeed_item (id);
CREATE INDEX idx_nfi_guid1 ON newsfeed_item (guid1);
CREATE INDEX idx_nfi_guid2 ON newsfeed_item (guid2);
CREATE INDEX idx_nfi_guid3 ON newsfeed_item (guid3);
CREATE INDEX idx_nfi_id1 ON newsfeed_item (id1);
CREATE INDEX idx_nfi_id2 ON newsfeed_item (id2);
CREATE INDEX idx_nfi_newsfeed ON newsfeed_item (newsfeed);
CREATE INDEX idx_nfi_type ON newsfeed_item (item_type);
CREATE INDEX idx_nfi_datetime ON newsfeed_item (date_time);

newsfeed contains 457036 rows
newsweed_item contains 5169727 rows

postgres version: 9.0.2
OS: CentOS release 5.5 (Final)

The following query took 4.2 seconds:

-------------------------
select *
from newsfeed_item
where newsfeed in
(
'173ee4dcec0d11de9f4f12313c0018c1','10dabde0f70211df816612313b02054e',
'17841c9af70211df874b12313b02054e','1783fce2f70211df814412313b02054e','1783fdd2f70211df8c1d12313b02054e','178405a2f70211df829212313b02054e',
'178440c6f70211df97c812313b02054e','178416e6f70211dfac3412313b02054e','1783e4aaf70211df9acd12313b02054e','178437e8f70211df8b8512313b02054e',
'1783f54ef70211df81e012313b02054e','178415c4f70211df8f8112313b02054e'
)
order by date_time desc

limit 25
-------------------------

If the LIMIT was removed, the query took 60 milliseconds! If the sorting order was changed to ASC, the query took 44ms, even with the LIMIT.

Then I tried to create the index on date_time in DESC order (because the result is sorted in descending order), but that did not change anything.

Then I removed the index on date_time with the following results:

query with the limit: 40 ms
query without the limit: 60 ms

=> the optimizer seems to use a wrong index (I did perform an ANALYZE on newsfeed_item and a REINDEX before I did the test). Since I currently don't need
the index on date_time (but will need it in the near future), I removed the index on date_time, which is ok for now.

------------------------

here are the explain analyze results:

1) the query in descending order with the limit and index on date_time (the slow one):

Limit (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
-> Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item (cost=0.00..409365.16 rows=10442 width=963) (actual time=48.581..4060.542 rows=25 loops=1)
Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 4060.959 ms

2) the query in descending order without the limit (which is much faster):

Sort (cost=39575.23..39601.33 rows=10442 width=963) (actual time=15.014..17.038 rows=477 loops=1)
Sort Key: date_time
Sort Method: quicksort Memory: 287kB
-> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601 rows=477 loops=1)
Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
-> Bitmap Index Scan on idx_nfi_newsfeed (cost=0.00..418.80 rows=10442 width=0) (actual time=0.555..0.555 rows=477 loops=1)
Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 19.065 ms

3) the query in ascending order with the limit (which is fast):

Limit (cost=0.00..980.09 rows=25 width=963) (actual time=0.261..3.704 rows=25 loops=1)
-> Index Scan using "IDX_NFI_DATETIME" on newsfeed_item (cost=0.00..409365.16 rows=10442 width=963) (actual time=0.250..3.495 rows=25 loops=1)
Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 3.854 ms

4) The query after removing the index on date_time, in descending order with the LIMIT (which is fast as well).

Limit (cost=34745.39..34745.45 rows=25 width=963) (actual time=12.855..13.143 rows=25 loops=1)
-> Sort (cost=34745.39..34771.49 rows=10442 width=963) (actual time=12.846..12.946 rows=25 loops=1)
Sort Key: date_time
Sort Method: top-N heapsort Memory: 40kB
-> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.622..9.936 rows=477 loops=1)
Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
-> Bitmap Index Scan on idx_nfi_newsfeed (cost=0.00..418.80 rows=10442 width=0) (actual time=0.543..0.543 rows=477 loops=1)
Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))

Total runtime: 13.318 ms

Is there anything I can do to add the index on date_time without the performance problem?

regards
Dieter

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2011-04-12 05:55:03 Re: Linux: more cores = less concurrency.
Previous Message Jesper Krogh 2011-04-12 05:17:54 Re: Linux: more cores = less concurrency.