Re: Slow index scan on B-Tree index over timestamp field

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow index scan on B-Tree index over timestamp field
Date: 2013-11-06 19:00:45
Message-ID: CAMkU=1wU4trachr2jnTaZMbV97KQEMFe91C57A7qXX6MoRg+sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Nov 4, 2013 at 12:44 PM, Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>wrote:

> Thank you very much for your answers guys!
>
>
> On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro(dot)listas(at)gmail(dot)com>
>> wrote:
>>
>>>
>>> SELECT tt.tweet_id, tt.topic, tt.topic_value
>>> FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
>>> t.id
>>> WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
>>> (SELECT followed_id FROM relationship WHERE follower_id = N)
>>> ORDER BY tt.tweet_id;
>>>
>>
>>
>> I don't know if this affects the plan at all, but it is silly to do a
>> left join to "tweet" when the WHERE clause has conditions that can't be
>> satisfied with a null row. Also, you could try changing the IN-list to an
>> EXISTS subquery.
>>
>
> I'm sorry the ignorance, but I don't understand the issue with the left
> join, could you explain more?
>

A left join means you are telling it to make up an all-NULL tweet row for
any tweet_topic that does not have a corresponding tweet. But then once it
did so, it would just filter out that row later, because the null
creation_time and user_id cannot pass the WHERE criteria--so doing a left
join can't change the answer, but it can fool the planner into making a
worse choice.

>
>
>> Is there some patterns to D1 and D2 that could help the caching? For
>> example, are they both usually in the just-recent past?
>>
> The only pattern is that it is always a one day interval, e.g. D1 =
> '2013-05-01' and D2 = '2013-05-02'.
>

If you only compare creation_time to dates, rather than ever using
date+time, then it would probably be better to store them in the table as
date, not timestamp. This might make the index smaller, and can also lead
to better estimates and index usage.

But why would you want to offer suggestions to someone based on tweets that
were made on exactly one day, over 5 months ago? I can see why would want
a brief period in the immediate past, or a long period; but a brief period
that is not the recent past just seems like a strange thing to want to do.
(And it is going to be hard to get good performance with that requirement.)

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Caio Casimiro 2013-11-06 19:24:24 Re: Slow index scan on B-Tree index over timestamp field
Previous Message Scott Marlowe 2013-11-06 17:04:41 Re: postgresql recommendation memory