Re: Slow index scan on B-Tree index over timestamp field

From: Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow index scan on B-Tree index over timestamp field
Date: 2013-11-04 20:44:05
Message-ID: CAK42QYHJif8Qek=oYnwxYcWhJC7GHAqcEPTgP6xq__8TxRNz_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you very much for your answers guys!

On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
> wrote:
>
>> Hello all,
>>
>> I have one query running at ~ 7 seconds and I would like to know if it's
>> possible to make it run faster, once this query runs lots of time in my
>> experiment.
>>
>
>
> Do you mean you want it to be fast because it runs many times, or that you
> want it to become fast after it runs many times (i.e. once the data is
> fully cached)? The plan you show takes 24 seconds, not 7 seconds.
>

I want it to be fast because it runs many times. I have an experiment that
evaluates recommendation algorithms for a set of twitter users. This query
returns recommendation candidates so it is called a lot of times for
different users and time intervals.

>
>
>>
>> Basically the query return the topics of tweets published by users that
>> the user N follows and that are published between D1 and D2.
>>
>> *Query*:
>>
>> SELECT tt.tweet_id, tt.topic, tt.topic_value
>> FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
>> t.id
>> WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
>> (SELECT followed_id FROM relationship WHERE follower_id = N)
>> ORDER BY tt.tweet_id;
>>
>
>
> I don't know if this affects the plan at all, but it is silly to do a left
> join to "tweet" when the WHERE clause has conditions that can't be
> satisfied with a null row. Also, you could try changing the IN-list to an
> EXISTS subquery.
>

I'm sorry the ignorance, but I don't understand the issue with the left
join, could you explain more?

> Is there some patterns to D1 and D2 that could help the caching? For
> example, are they both usually in the just-recent past?
>
The only pattern is that it is always a one day interval, e.g. D1 =
'2013-05-01' and D2 = '2013-05-02'.

>
>
> Indexes:
>> "tweet_plk" PRIMARY KEY, btree (id) CLUSTER
>> "tweet_creation_time_index" btree (creation_time)
>> "tweet_id_index" hash (id)
>> "tweet_ios_index" btree (id, user_id, creation_time)
>> "tweet_retweeted_idx" hash (retweeted)
>> "tweet_user_id_creation_time_index" btree (creation_time, user_id)
>> "tweet_user_id_index" hash (user_id)
>>
>
>
> Are all of those indexes important? If your table is heavily
> updated/inserted, which I assume it is, maintaining those indexes is going
> to take up precious RAM that could probably be better used elsewhere.
>

Probably not. But once this database is read only, the quantity of index
grew following my desperation. =)

>
> Cheers,
>
> Jeff
>

Thank you very much again!
Caio

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Caio Casimiro 2013-11-04 20:50:25 Re: Slow index scan on B-Tree index over timestamp field
Previous Message Jeff Janes 2013-11-04 19:15:32 Re: Slow index scan on B-Tree index over timestamp field