Slow index scan on B-Tree index over timestamp field

From: Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow index scan on B-Tree index over timestamp field
Date: 2013-11-03 22:05:16
Message-ID: CAK42QYGWHKk+5HZAW4+z1+A2xMYUd1KmwsXp9OVtSud7ztA=4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello all,

I have one query running at ~ 7 seconds and I would like to know if it's
possible to make it run faster, once this query runs lots of time in my
experiment.

Basically the query return the topics of tweets published by users that the
user N follows and that are published between D1 and D2.

*Query*:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
t.id
WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
(SELECT followed_id FROM relationship WHERE follower_id = N)
ORDER BY tt.tweet_id;

*Explain (Analyze, Buffers):*

Sort (cost=3950701.24..3950708.22 rows=2793 width=20) (actual
time=24062.951..24064.475 rows=1640 loops=1)
Sort Key: tt.tweet_id
Sort Method: quicksort Memory: 97kB
Buffers: shared hit=2390 read=32778
I/O Timings: read=15118.402
-> Nested Loop (cost=247.58..3950541.38 rows=2793 width=20) (actual
time=532.578..24057.319 rows=1640 loops=1)
Buffers: shared hit=2387 read=32778
I/O Timings: read=15118.402
-> Hash Semi Join (cost=229.62..73239.03 rows=1361 width=8)
(actual time=391.768..15132.889 rows=597 loops=1)
Hash Cond: (t.user_id = relationship.followed_id)
Buffers: shared hit=539 read=31862
I/O Timings: read=6265.279
-> Index Scan using tweet_creation_time_index on tweet t
(cost=0.57..68869.39 rows=1472441 width=16) (actual time=82.752..11418.043
rows=175
9645 loops=1)
Index Cond: ((creation_time >= '2013-05-05
00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06
00:00:00-03'::times
tamp with time zone))
Buffers: shared hit=534 read=31859
I/O Timings: read=6193.764
-> Hash (cost=227.12..227.12 rows=154 width=8) (actual
time=72.175..72.175 rows=106 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 3kB
Buffers: shared hit=5 read=3
I/O Timings: read=71.515
-> Index Only Scan using relationship_id on
relationship (cost=0.42..227.12 rows=154 width=8) (actual
time=59.395..71.972 rows=106 loo
ps=1)
Index Cond: (follower_id = 335093362)
Heap Fetches: 0
Buffers: shared hit=5 read=3
I/O Timings: read=71.515
-> Bitmap Heap Scan on tweet_topic tt (cost=17.96..2841.63
rows=723 width=20) (actual time=14.909..14.917 rows=3 loops=597)
Recheck Cond: (tweet_id = t.id)
Buffers: shared hit=1848 read=916
I/O Timings: read=8853.123
-> Bitmap Index Scan on tweet_topic_pk (cost=0.00..17.78
rows=723 width=0) (actual time=9.793..9.793 rows=3 loops=597)
Index Cond: (tweet_id = t.id)
Buffers: shared hit=1764 read=631
I/O Timings: read=5811.532
Total runtime: 24066.145 ms
(34 rows)

*Table structure*:

Table "public.tweet"
Column | Type | Modifiers |
Storage | Stats target | Description
-----------------------+--------------------------------------+--------------+-------------+------------------+-----------------
id | bigint | not null
| plain | |
content | text |
| extended | |
creation_time | timestamp with time zone | | plain |
|
user_id | bigint |
| plain | |
retweeted | bigint | |
plain | |
retweet_count | integer | |
plain | |
Indexes:
"tweet_plk" PRIMARY KEY, btree (id) CLUSTER
"tweet_creation_time_index" btree (creation_time)
"tweet_id_index" hash (id)
"tweet_ios_index" btree (id, user_id, creation_time)
"tweet_retweeted_idx" hash (retweeted)
"tweet_user_id_creation_time_index" btree (creation_time, user_id)
"tweet_user_id_index" hash (user_id)

*System Information*:
OS: Slackware 14.0
Postgresql Version: *9.3 Beta2*

*postgresql.conf Settings:*

work_mem = 128MB
shared_buffers = 1GB
maintenance_work_mem = 1536MB
fsync = off
synchronous_commit = off
effective_cache_size = 2GB

*Additional information:*

All tables in this database are read only tables. I haven't post the
details about other tables to not let the email big, as it seems the
problem is with the 'tweet' table.

Any help would be much appreciated.
Best regards,
Caio Casimiro.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2013-11-04 18:56:01 Re: Slow index scan on B-Tree index over timestamp field
Previous Message Tomas Vondra 2013-11-02 19:40:23 Re: Hot Standby performance issue