Re: Slow index scan on B-Tree index over timestamp field

From: Elliot <yields(dot)falsehood(at)gmail(dot)com>
To: Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>, Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow index scan on B-Tree index over timestamp field
Date: 2013-11-04 21:22:47
Message-ID: 52781027.7080307@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2013-11-04 16:10, Caio Casimiro wrote:
> Hi Neyman, thank you for your answer.
>
> Unfortunately this query runs almost at the same time:
>
> Sort (cost=4877693.98..4877702.60 rows=3449 width=20) (actual
> time=25820.291..25821.845 rows=1640 loops=1)
> Sort Key: tt.tweet_id
> Sort Method: quicksort Memory: 97kB
> Buffers: shared hit=1849 read=32788
> -> Nested Loop (cost=247.58..4877491.32 rows=3449 width=20)
> (actual time=486.839..25814.120 rows=1640 loops=1)
> Buffers: shared hit=1849 read=32788
> -> Hash Semi Join (cost=229.62..88553.23 rows=1681 width=8)
> (actual time=431.654..13209.159 rows=597 loops=1)
> Hash Cond: (t.user_id = relationship.followed_id)
> Buffers: shared hit=3 read=31870
> -> Index Scan using tweet_creation_time_index on tweet
> t (cost=0.57..83308.25 rows=1781234 width=16) (actual
> time=130.144..10037.764 rows=1759645 loops=1)
> Index Cond: ((creation_time >= '2013-05-05
> 00:00:00-03'::timestamp with time zone) AND (creation_time <=
> '2013-05-06 00:00:00-03'::timestamp with time zone))
> Buffers: shared hit=1 read=31867
> -> Hash (cost=227.12..227.12 rows=154 width=8) (actual
> time=94.365..94.365 rows=106 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 3kB
> Buffers: shared hit=2 read=3
> -> Index Only Scan using relationship_id on
> relationship (cost=0.42..227.12 rows=154 width=8) (actual
> time=74.540..94.101 rows=106 loops=1)
> Index Cond: (follower_id = 335093362)
> Heap Fetches: 0
> Buffers: shared hit=2 read=3
> -> Bitmap Heap Scan on tweet_topic tt (cost=17.96..2841.63
> rows=723 width=20) (actual time=21.014..21.085 rows=3 loops=597)
> Recheck Cond: (tweet_id = t.id <http://t.id>)
> Buffers: shared hit=1846 read=918
> -> Bitmap Index Scan on tweet_topic_pk
> (cost=0.00..17.78 rows=723 width=0) (actual time=15.012..15.012
> rows=3 loops=597)
> Index Cond: (tweet_id = t.id <http://t.id>)
> Buffers: shared hit=1763 read=632
> Total runtime: 25823.386 ms
>
> I have noticed that in both queries the index scan on
> tweet_creation_time_index is very expensive. Is there anything I can
> do to make the planner choose a index only scan?
>
>
Yes, because that part of the query is kicking back so many rows, many
of which are totally unnecessary anyway - you're first getting all the
tweets in a particular time range, then limiting them down to just users
that are followed. Here's clarification on the approach I mentioned
earlier. All you should really need are basic (btree) indexes on your
different keys (tweet_topic.tweet_id, tweet.id, tweet.user_id,
relationship.follower_id, relationship.followed_id). I also changed the
left join to an inner join as somebody pointed out that your logic
amounted to reducing the match to an inner join anyway.

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt
JOIN tweet AS t
ON tt.tweet_id = t.id
join relationship
on t.user_id = relationship.followed_id
WHERE creation_time BETWEEN 'D1' AND 'D2'
AND relationship.follower_id = N
ORDER BY tt.tweet_id
;

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Igor Neyman 2013-11-04 21:26:09 Re: Slow index scan on B-Tree index over timestamp field
Previous Message Caio Casimiro 2013-11-04 21:10:15 Re: Slow index scan on B-Tree index over timestamp field