From: | Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Slow index scan on B-Tree index over timestamp field |
Date: | 2013-11-04 20:44:05 |
Message-ID: | CAK42QYHJif8Qek=oYnwxYcWhJC7GHAqcEPTgP6xq__8TxRNz_A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thank you very much for your answers guys!
On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
> wrote:
>
>> Hello all,
>>
>> I have one query running at ~ 7 seconds and I would like to know if it's
>> possible to make it run faster, once this query runs lots of time in my
>> experiment.
>>
>
>
> Do you mean you want it to be fast because it runs many times, or that you
> want it to become fast after it runs many times (i.e. once the data is
> fully cached)? The plan you show takes 24 seconds, not 7 seconds.
>
I want it to be fast because it runs many times. I have an experiment that
evaluates recommendation algorithms for a set of twitter users. This query
returns recommendation candidates so it is called a lot of times for
different users and time intervals.
>
>
>>
>> Basically the query return the topics of tweets published by users that
>> the user N follows and that are published between D1 and D2.
>>
>> *Query*:
>>
>> SELECT tt.tweet_id, tt.topic, tt.topic_value
>> FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
>> t.id
>> WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
>> (SELECT followed_id FROM relationship WHERE follower_id = N)
>> ORDER BY tt.tweet_id;
>>
>
>
> I don't know if this affects the plan at all, but it is silly to do a left
> join to "tweet" when the WHERE clause has conditions that can't be
> satisfied with a null row. Also, you could try changing the IN-list to an
> EXISTS subquery.
>
I'm sorry the ignorance, but I don't understand the issue with the left
join, could you explain more?
> Is there some patterns to D1 and D2 that could help the caching? For
> example, are they both usually in the just-recent past?
>
The only pattern is that it is always a one day interval, e.g. D1 =
'2013-05-01' and D2 = '2013-05-02'.
>
>
> Indexes:
>> "tweet_plk" PRIMARY KEY, btree (id) CLUSTER
>> "tweet_creation_time_index" btree (creation_time)
>> "tweet_id_index" hash (id)
>> "tweet_ios_index" btree (id, user_id, creation_time)
>> "tweet_retweeted_idx" hash (retweeted)
>> "tweet_user_id_creation_time_index" btree (creation_time, user_id)
>> "tweet_user_id_index" hash (user_id)
>>
>
>
> Are all of those indexes important? If your table is heavily
> updated/inserted, which I assume it is, maintaining those indexes is going
> to take up precious RAM that could probably be better used elsewhere.
>
Probably not. But once this database is read only, the quantity of index
grew following my desperation. =)
>
> Cheers,
>
> Jeff
>
Thank you very much again!
Caio
From | Date | Subject | |
---|---|---|---|
Next Message | Caio Casimiro | 2013-11-04 20:50:25 | Re: Slow index scan on B-Tree index over timestamp field |
Previous Message | Jeff Janes | 2013-11-04 19:15:32 | Re: Slow index scan on B-Tree index over timestamp field |