Re: BUG #2658: Query not using index

From: Graham Davis <gdavis(at)refractions(dot)net>
To: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: BUG #2658: Query not using index
Date: 2006-10-03 21:35:45
Message-ID: 4522D7B1.3040008@refractions.net
Lists: pgsql-bugs pgsql-performance

Not many. It fluctuates, but there are usually only a few hundred at most.
Each assetid has many millions of positions, though.
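Since there are only a few hundred assetids but millions of positions per
asset, the usual workaround is to drive one indexed max(ts) lookup per
assetid instead of scanning the whole table. A rough sketch, assuming a
positions(assetid, ts) table with an index on (assetid, ts) and a small
assets table holding the distinct assetids (the actual schema isn't shown
in this thread):

  -- one index probe per assetid instead of one big sequential scan
  SELECT a.assetid,
         (SELECT max(p.ts)
            FROM positions p
           WHERE p.assetid = a.assetid) AS most_recent_ts
    FROM assets a;

If there is no separate assets table, a SELECT DISTINCT assetid FROM
positions subquery could stand in for it, though that part would still
have to scan.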

Mark Lewis wrote:

>Hmmm. How many distinct assetids are there?
>-- Mark Lewis
>
>On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
>
>>The "summary table" approach maintained by triggers is something we are
>>considering, but it becomes a bit more complicated to implement.
>>Currently we have groups of new positions coming in every few seconds or
>>less. They are not guaranteed to be in order. So for instance, a group
>>of positions from today could come in and be inserted, then a group of
>>positions that got lost from yesterday could come in and be inserted
>>afterwards.
>>
>>This means the triggers would have to do some sort of logic to figure
>>out if the newly inserted position is actually the most recent by
>>timestamp. If positions are ever deleted or updated, the same sort of
>>query that is currently running slow will need to be executed in order
>>to get the new most recent position. So there is the possibility that
>>new positions can be inserted faster than the triggers can calculate
>>and maintain the summary table. There are some other complications
>>with maintaining such a summary table in our system too, but I won't get
>>into those.
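For what it's worth, the ordering check itself can stay simple: only move
the summary row's timestamp forward, and ignore late-arriving older
positions. A rough PL/pgSQL sketch (table and column names are assumed,
and it only covers inserts, not the delete/update cases mentioned above),
using a summary table like the one sketched under Mark's suggestion below:

  CREATE OR REPLACE FUNCTION positions_track_latest() RETURNS trigger AS $$
  BEGIN
      -- only advance max_ts; an out-of-order (older) position changes nothing
      UPDATE asset_latest
         SET max_ts = NEW.ts
       WHERE assetid = NEW.assetid
         AND max_ts < NEW.ts;
      IF NOT FOUND THEN
          -- either no summary row exists yet, or the existing one is newer
          BEGIN
              INSERT INTO asset_latest (assetid, max_ts)
              VALUES (NEW.assetid, NEW.ts);
          EXCEPTION WHEN unique_violation THEN
              NULL;  -- row already present with a newer ts; nothing to do
          END;
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER positions_track_latest
      AFTER INSERT ON positions
      FOR EACH ROW EXECUTE PROCEDURE positions_track_latest();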
>>
>>Right now I'm just trying to see if I can get the query itself running
>>faster, which would be the easiest solution for now.
>>
>>Graham.
>>
>>Mark Lewis wrote:
>>
>>>Have you looked into a materialized view sort of approach? You could
>>>create a table which had assetid as a primary key, and max_ts as a
>>>column. Then use triggers to keep that table up to date as rows are
>>>added/updated/removed from the main table.
>>>
>>>This approach would only make sense if there were far fewer distinct
>>>assetid values than rows in the main table, and would get slow if you
>>>commonly delete rows from the main table or decrease the value for ts in
>>>the row with the highest ts for a given assetid.
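Concretely, the kind of table meant here might look like the following
(names and types are guesses, not taken from the thread); once triggers
keep it current, the most-recent-position question becomes a scan of a
few hundred rows:

  CREATE TABLE asset_latest (
      assetid integer PRIMARY KEY,
      max_ts  timestamp with time zone NOT NULL
  );

  -- cheap read once the triggers maintain it
  SELECT assetid, max_ts FROM asset_latest;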
>>>
>>>-- Mark Lewis
>>>
>>>On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
>>>
>>>>Thanks Tom, that explains it and makes sense. I guess I will have to
>>>>accept this query taking 40 seconds, unless I can figure out another way
>>>>to write it so it can use indexes. If there are any more syntax
>>>>suggestions, please pass them on. Thanks for the help everyone.
>>>>
>>>>Graham.
>>>>
>>>>Tom Lane wrote:
>>>>
>>>>>Graham Davis <gdavis(at)refractions(dot)net> writes:
>>>>>
>>>>>>How come an aggregate like that has to use a sequential scan? I know
>>>>>>that PostgreSQL used to have to do a sequential scan for all aggregates,
>>>>>>but support was added in version 8 so that aggregates could take
>>>>>>advantage of indexes.
>>>>>
>>>>>Not in a GROUP BY context, only for the simple case. Per the comment in
>>>>>planagg.c:
>>>>>
>>>>> * We don't handle GROUP BY, because our current implementations of
>>>>> * grouping require looking at all the rows anyway, and so there's not
>>>>> * much point in optimizing MIN/MAX.
>>>>>
>>>>>The problem is that using an index to obtain the maximum value of ts for
>>>>>a given value of assetid is not the same thing as finding out what all
>>>>>the distinct values of assetid are.
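To make the distinction concrete: with an index on (assetid, ts), the first
statement below is the simple case planagg.c can turn into an index probe,
while the second still has to visit every row, because the planner has no
cheap way to enumerate the distinct assetids from the index (illustrative
only, schema assumed):

  -- optimizable: max over one known assetid
  SELECT max(ts) FROM positions WHERE assetid = 42;

  -- not optimized: grouping requires reading all the rows
  SELECT assetid, max(ts) FROM positions GROUP BY assetid;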
>>>>>
>>>>>This could possibly be improved, but it would take considerably more
>>>>>work. It's definitely not in the category of "bug fix".
>>>>>
>>>>> regards, tom lane

--
Graham Davis
Refractions Research Inc.
gdavis(at)refractions(dot)net
