BUG #8943: optimizer appears to not be efficient when there is little data in the query results

From: nghia(dot)le(at)postano(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8943: optimizer appears to not be efficient when there is little data in the query results
Date: 2014-01-24 01:03:52
Message-ID: 20140124010352.26703.57438@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8943
Logged by: Nghia Le
Email address: nghia(dot)le(at)postano(dot)com
PostgreSQL version: 9.3.1
Operating system: Amazon RDS
Description:

I think the issue is The issue is we order by post_time, scraped time. and
for the most part when there is a lot of data, the composite index created
works wonders. However when there is little data
(ie feed_id =8924 ) has about 54 items. Then it doesn't know to just use a
sequence scan, instead it uses the entire index and takes forever in doing
so.

Query Plan with Index_scan off: Limit (cost=154847.18..154847.18 rows=1
width=684) (actual time=1.001..1.002 rows=1 loops=1)
Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title,
p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height,
p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname,
p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time
Buffers: shared hit=119 read=4
-> Sort (cost=154847.18..154877.28 rows=12038 width=684) (actual
time=0.998..0.998 rows=1 loops=1)
Output: p.id, p.guid, p.source_type, p.post_time, p.source_id,
p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
Sort Key: p.post_time, p.scraped_time
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=119 read=4
-> Nested Loop Anti Join (cost=294.53..154786.99 rows=12038
width=684) (actual time=0.073..0.927 rows=23 loops=1)
Output: p.id, p.guid, p.source_type, p.post_time,
p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
Buffers: shared hit=119 read=4
-> Bitmap Heap Scan on public.post p (cost=290.11..48544.89
rows=12586 width=684) (actual time=0.042..0.583 rows=23 loops=1)
Output: p.id, p.guid, p.scraped_time, p.source_type,
p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url,
p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url,
p.user_id, p.user_displayname, p.user_fullname, p.text_config_name,
p.feed_id, p.tsv, p.original_source_id
Recheck Cond: (p.feed_id = 8924)
Buffers: shared hit=5 read=3
-> Bitmap Index Scan on feed_id_idx
(cost=0.00..286.96 rows=12586 width=0) (actual time=0.033..0.033 rows=23
loops=1)
Index Cond: (p.feed_id = 8924)
Buffers: shared hit=3 read=1
-> Bitmap Heap Scan on public.post p1 (cost=4.42..8.44
rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=23)
Output: p1.id, p1.guid, p1.scraped_time,
p1.source_type, p1.post_time, p1.source_id, p1.title, p1.url,
p1.picture_url, p1.video_url, p1.media_mime_type, p1.media_height,
p1.media_width, p1.text, p1.user_icon_url, p1.user_id, p1.user_displayname,
p1.user_fullname, p1.text_config_name, p1.feed_id, p1.tsv,
p1.original_source_id
Recheck Cond: (p1.id = p.id)
Filter: ((p1.original_source_id IS NOT NULL) AND
((p1.source_type)::text = ANY
('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[])))
Rows Removed by Filter: 1
Buffers: shared hit=114 read=1
-> Bitmap Index Scan on post_pkey (cost=0.00..4.42
rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=23)
Index Cond: (p1.id = p.id)
Buffers: shared hit=91 read=1
Total runtime: 1.078 ms

---------
Query Scan with Index On:

Limit (cost=1.13..61.38 rows=5 width=684) (actual time=0.058..0.110 rows=5
loops=1)
Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title,
p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height,
p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname,
p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time
Buffers: shared hit=28 read=3
-> Nested Loop Anti Join (cost=1.13..144978.35 rows=12032 width=684)
(actual time=0.055..0.098 rows=5 loops=1)
Output: p.id, p.guid, p.source_type, p.post_time, p.source_id,
p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
Buffers: shared hit=28 read=3
-> Index Scan using feed_id_idx on public.post p
(cost=0.56..38902.40 rows=12580 width=684) (actual time=0.028..0.043 rows=5
loops=1)
Output: p.id, p.guid, p.scraped_time, p.source_type,
p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url,
p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url,
p.user_id, p.user_displayname, p.user_fullname, p.text_config_name,
p.feed_id, p.tsv, p.original_source_id
Index Cond: (p.feed_id = 8924)
Buffers: shared hit=4 read=2
-> Index Scan using post_pkey on public.post p1 (cost=0.56..8.43
rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=5)
Output: p1.id, p1.guid, p1.scraped_time, p1.source_type,
p1.post_time, p1.source_id, p1.title, p1.url, p1.picture_url, p1.video_url,
p1.media_mime_type, p1.media_height, p1.media_width, p1.text,
p1.user_icon_url, p1.user_id, p1.user_displayname, p1.user_fullname,
p1.text_config_name, p1.feed_id, p1.tsv, p1.original_source_id
Index Cond: (p1.id = p.id)
Filter: ((p1.original_source_id IS NOT NULL) AND
((p1.source_type)::text = ANY
('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[])))
Rows Removed by Filter: 1
Buffers: shared hit=24 read=1
Total runtime: 0.186 ms

original query:

explain(analyze,buffers,verbose) SELECT p.id, p.guid, source_type,
post_time, source_id, title, url, picture_url, video_url, media_mime_type,
media_height, media_width, text, user_icon_url, user_id, user_displayname,
user_fullname, text_config_name, feed_id,scraped_time AS t FROM post AS p
LEFT JOIN location AS l on l.post_id=p.id WHERE ( feed_id =8924 ) AND NOT
EXISTS( SELECT 1 FROM post p1 WHERE p1.id=p.id AND p1.source_type
IN('twitter_stream','twitter_search', 'twitter_lists', 'twitter_user',
'twitter_hashtag') AND p1.original_source_id IS NOT NULL) ORDER BY
post_time DESC, scraped_time DESC LIMIT 1;

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2014-01-24 03:05:35 Re: BUG #8218: Error when querying an JSON data, 9.3beta
Previous Message Tom Lane 2014-01-23 20:23:54 Re: BUG #8939: Cast to range type fails due to whitespaces in the text string