Re: [RFC] Minmax indexes

From: Jim Nasby <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] Minmax indexes
Date: 2013-06-28 17:18:52
Message-ID: 51CDC57C.9020904@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/17/13 3:38 PM, Josh Berkus wrote:
>>> Why? Why can't we just update the affected pages in the index?
>> >
>> >The page range has to be scanned in order to find out the min/max values
>> >for the indexed columns on the range; and then, with these data, update
>> >the index.
> Seems like you could incrementally update the range, at least for
> inserts. If you insert a row which doesn't decrease the min or increase
> the max, you can ignore it, and if it does increase/decrease, you can
> change the min/max. No?
>
> For updates, things are more complicated. If the row you're updating
> was the min/max, in theory you should update it to adjust that, but you
> can't verify that it was the ONLY min/max row without doing a full scan.
> My suggestion would be to add a "dirty" flag which would indicate that
> that block could use a rescan next VACUUM, and otherwise ignore changing
> the min/max. After all, the only defect to having min to low or max too
> high for a block would be scanning too many blocks. Which you'd do
> anyway with it marked "invalid".

If we add a dirty flag it would probably be wise to allow for more than one value so we can do a clock-sweep. That would allow for detecting a range that is getting dirtied repeatedly and not bother to try and re-summarize it until later.

Something else I don't think was mentioned... re-summarization should be somehow tied to access activity: if a query will need to seqscan a segment that needs to be summarized, we should take that opportunity to summarize at the same time while pages are in cache. Maybe that can be done in the backend itself; maybe we'd want a separate process.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2013-06-28 17:26:31 Re: [RFC] Minmax indexes
Previous Message Simon Riggs 2013-06-28 16:52:39 Re: changeset generation v5-01 - Patches & git tree