Re: Slow index scan on B-Tree index over timestamp field

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow index scan on B-Tree index over timestamp field
Date: 2013-11-04 19:15:32
Message-ID: CAMkU=1wdVQvcCbJK_x+cNAO2imaZGx1DE2UBSQ3hM7LqUyZvSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>wrote:

> Hello all,
>
> I have one query running at ~ 7 seconds and I would like to know if it's
> possible to make it run faster, once this query runs lots of time in my
> experiment.
>

Do you mean you want it to be fast because it runs many times, or that you
want it to become fast after it runs many times (i.e. once the data is
fully cached)? The plan you show takes 24 seconds, not 7 seconds.

>
> Basically the query return the topics of tweets published by users that
> the user N follows and that are published between D1 and D2.
>
> *Query*:
>
> SELECT tt.tweet_id, tt.topic, tt.topic_value
> FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
> t.id
> WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
> (SELECT followed_id FROM relationship WHERE follower_id = N)
> ORDER BY tt.tweet_id;
>

I don't know if this affects the plan at all, but it is silly to do a left
join to "tweet" when the WHERE clause has conditions that can't be
satisfied with a null row. Also, you could try changing the IN-list to an
EXISTS subquery.

Is there some patterns to D1 and D2 that could help the caching? For
example, are they both usually in the just-recent past?

Indexes:
> "tweet_plk" PRIMARY KEY, btree (id) CLUSTER
> "tweet_creation_time_index" btree (creation_time)
> "tweet_id_index" hash (id)
> "tweet_ios_index" btree (id, user_id, creation_time)
> "tweet_retweeted_idx" hash (retweeted)
> "tweet_user_id_creation_time_index" btree (creation_time, user_id)
> "tweet_user_id_index" hash (user_id)
>

Are all of those indexes important? If your table is heavily
updated/inserted, which I assume it is, maintaining those indexes is going
to take up precious RAM that could probably be better used elsewhere.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Caio Casimiro 2013-11-04 20:44:05 Re: Slow index scan on B-Tree index over timestamp field
Previous Message Elliot 2013-11-04 19:03:45 Re: Slow index scan on B-Tree index over timestamp field