From: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Rewriting Free Space Map |
Date: | 2008-03-17 13:49:29 |
Message-ID: | 47DE76E9.8060009@enterprisedb.com |
Lists: | pgsql-hackers |
Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> I've started working on revamping Free Space Map, using the approach
>> where we store a map of heap pages on every nth heap page. What we need
>> now is discussion on the details of how exactly it should work.
>
> You're cavalierly waving away a whole boatload of problems that will
> arise as soon as you start trying to make the index AMs play along
> with this :-(.
It doesn't seem very hard. An indexam wanting to use FSM needs a little
bit of code where the relation is extended, to let the FSM initialize
FSM pages. And then there's the B-tree metapage issue I mentioned. But
that's all, AFAICS.
> Hash for instance has very narrow-minded ideas about
> page allocation within its indexes.
Hash doesn't use FSM at all.
> Also, I don't think that "use the special space" will scale to handle
> other kinds of maps such as the proposed dead space map. (This is
> exactly why I said the other day that we need a design roadmap for all
> these ideas.)
It works for anything that scales linearly with the relation itself. The
proposed FSM and visibility map both fall into that category.
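To make the "scales linearly" point concrete, here is a minimal C sketch of the arithmetic. It assumes a layout this mail does not spell out: the map on heap block k*n covers blocks k*n through k*n + n - 1. The function names and the value of n are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout (not specified in the mail): heap block k*n carries the
 * map for blocks k*n .. k*n + n - 1.  Return the block holding the map
 * entry for heap_block.
 */
static uint32_t
map_block_for(uint32_t heap_block, uint32_t n)
{
    assert(n > 0);
    return (heap_block / n) * n;
}

/*
 * Number of map-bearing blocks needed for a relation of nblocks blocks.
 * This is ceil(nblocks / n), so it grows linearly with relation size.
 */
static uint32_t
map_block_count(uint32_t nblocks, uint32_t n)
{
    assert(n > 0);
    return nblocks / n + (nblocks % n != 0 ? 1 : 0);
}
```

Under that assumption, any map whose per-page payload is a fixed size fits the same scheme, which is why both the FSM and a visibility map would qualify.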
A separate file is certainly more flexible. I was leaning towards that
option originally
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00142.php) for
that reason.
> The idea that's becoming attractive to me while contemplating the
> multiple-maps problem is that we should adopt something similar to
> the old Mac OS idea of multiple "forks" in a relation. In addition
> to the main data fork which contains the same info as now, there could
> be one or more map forks which are separate files in the filesystem.
> They are named by relfilenode plus an extension, for instance a relation
> with relfilenode NNN would have a data fork in file NNN (plus perhaps
> NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
> NNN.map.1 etc as needed). We'd have to add one more field to buffer
> lookup keys (BufferTag) to disambiguate which fork the referenced page
> is in. Having bitten that bullet, though, the idea trivially scales to
> any number of map forks with potentially different space requirements
> and different locking and WAL-logging requirements.
Hmm. You also need to teach at least xlog.c and xlogutils.c about the
map forks, for full page images and the invalid page tracking. I also
wonder what the performance impact of extending BufferTag is.
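For discussion, a rough C sketch of what "one more field in BufferTag" plus the NNN / NNN.map naming could look like. All type and function names here are made up for illustration (they are not the actual bufmgr or smgr definitions), and the struct is a simplified stand-in for the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fork identifiers; only two are sketched here. */
typedef enum ForkNumberSketch
{
    FORK_MAIN = 0,              /* the data fork, same contents as today */
    FORK_MAP = 1                /* one map fork, stored in NNN.map */
} ForkNumberSketch;

/* Simplified stand-in for the relation's physical identity. */
typedef struct RelFileNodeSketch
{
    uint32_t    spcNode;
    uint32_t    dbNode;
    uint32_t    relNode;
} RelFileNodeSketch;

/* Buffer lookup key with the extra field that disambiguates the fork. */
typedef struct BufferTagSketch
{
    RelFileNodeSketch rnode;
    ForkNumberSketch forkNum;
    uint32_t    blockNum;
} BufferTagSketch;

/* Two tags name the same page only if the fork matches as well. */
static bool
buffer_tags_equal(const BufferTagSketch *a, const BufferTagSketch *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.relNode == b->rnode.relNode &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}

/*
 * Build the file name for a fork segment: NNN, NNN.1, NNN.map, NNN.map.1.
 * Returns the name length, or -1 if the buffer is too small.
 */
static int
fork_segment_path(char *buf, size_t buflen, uint32_t relNode,
                  ForkNumberSketch fork, uint32_t segno)
{
    const char *suffix = (fork == FORK_MAP) ? ".map" : "";
    int         n;

    assert(buf != NULL && buflen > 0);
    if (segno == 0)
        n = snprintf(buf, buflen, "%u%s", (unsigned) relNode, suffix);
    else
        n = snprintf(buf, buflen, "%u%s.%u", (unsigned) relNode, suffix,
                     (unsigned) segno);
    return (n >= 0 && (size_t) n < buflen) ? n : -1;
}
```

The extra field widens every buffer lookup key and adds one more comparison per lookup, which is the cost I'm wondering about.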
My original thought was to have a separate RelFileNode for each of the
maps. That would require no smgr or xlog changes, and not very many
changes in the buffer manager, though I guess you'd need more catalog
changes. You had doubts about that on the previous thread
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php) but
the "map forks" idea certainly seems much more invasive than that.
I like the "map forks" idea; it groups the maps nicely at the filesystem
level, and I can see it being useful for all kinds of things in the
future. The question is, is it really worth the extra code churn? If you
think it is, I can try that approach.
> Another possible advantage is that a new map fork could be added to an
> existing table without much trouble. Which is certainly something we'd
> need if we ever hope to get update-in-place working.
Yep.
> The main disadvantage I can see is that for very small tables, the
> percentage overhead from multiple map forks of one page apiece is
> annoyingly high. However, most of the point of a map disappears if
> the table is small, so we might finesse that by not creating any maps
> until the table has reached some minimum size.
Yeah, the map fork idea is actually better than the "every nth heap
page" approach from that point of view.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com