Re: Rewriting Free Space Map

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rewriting Free Space Map
Date: 2008-03-17 13:49:29
Message-ID: 47DE76E9.8060009@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> I've started working on revamping Free Space Map, using the approach
>> where we store a map of heap pages on every nth heap page. What we need
>> now is discussion on the details of how exactly it should work.
>
> You're cavalierly waving away a whole boatload of problems that will
> arise as soon as you start trying to make the index AMs play along
> with this :-(.

It doesn't seem very hard. An indexam wanting to use FSM needs a little
bit of code where the relation is extended, to let the FSM initialize
FSM pages. And then there's the B-tree metapage issue I mentioned. But
that's all, AFAICS.

> Hash for instance has very narrow-minded ideas about
> page allocation within its indexes.

Hash doesn't use FSM at all.

> Also, I don't think that "use the special space" will scale to handle
> other kinds of maps such as the proposed dead space map. (This is
> exactly why I said the other day that we need a design roadmap for all
> these ideas.)

It works for anything that scales linearly with the relation itself. The
proposed FSM and visibility map both fall into that category.
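
To illustrate the "scales linearly" point, here is a rough sketch of the
block arithmetic for a layout with one map page per fixed-size group of
pages. The interval MAP_INTERVAL and all function names are made up for
illustration; the actual placement rule was still under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical interval: the first page of every group of MAP_INTERVAL
 * physical pages is a map page. Not a value taken from the thread. */
#define MAP_INTERVAL 4096u

/* True if this physical block is a map page under the assumed layout. */
static bool
block_is_map_page(BlockNumber blkno)
{
    return blkno % MAP_INTERVAL == 0;
}

/* Physical block number of the map page that covers blkno. */
static BlockNumber
map_page_for(BlockNumber blkno)
{
    BlockNumber map = blkno - (blkno % MAP_INTERVAL);

    assert(block_is_map_page(map));
    return map;
}

/* Number of map pages in a relation of nblocks physical pages.
 * Grows as nblocks / MAP_INTERVAL, i.e. linearly with the relation. */
static BlockNumber
map_page_count(BlockNumber nblocks)
{
    return nblocks / MAP_INTERVAL + (nblocks % MAP_INTERVAL != 0 ? 1 : 0);
}
```

Any map whose size is a fixed fraction of the relation fits this scheme;
a map with different growth would not, which is where a separate file is
more flexible.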

A separate file is certainly more flexible. I was leaning towards that
option originally
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00142.php) for
that reason.

> The idea that's becoming attractive to me while contemplating the
> multiple-maps problem is that we should adopt something similar to
> the old Mac OS idea of multiple "forks" in a relation. In addition
> to the main data fork which contains the same info as now, there could
> be one or more map forks which are separate files in the filesystem.
> They are named by relfilenode plus an extension, for instance a relation
> with relfilenode NNN would have a data fork in file NNN (plus perhaps
> NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
> NNN.map.1 etc as needed). We'd have to add one more field to buffer
> lookup keys (BufferTag) to disambiguate which fork the referenced page
> is in. Having bitten that bullet, though, the idea trivially scales to
> any number of map forks with potentially different space requirements
> and different locking and WAL-logging requirements.

Hmm. You also need to teach at least xlog.c and xlogutils.c about the
map forks, for full page images and the invalid page tracking. I also
wonder what the performance impact of extending BufferTag is.
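
For concreteness, a minimal sketch of what the extended lookup key and
the file naming might look like. The ForkNumber enum, the field names and
the helper functions are assumptions made for illustration, not existing
code; only the NNN, NNN.1, NNN.map, NNN.map.1 naming comes from the
proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

/* Hypothetical fork identifiers. */
typedef enum ForkNumber { DATA_FORK = 0, MAP_FORK = 1 } ForkNumber;

typedef struct RelFileNode
{
    Oid spcNode;            /* tablespace */
    Oid dbNode;             /* database */
    Oid relNode;            /* relfilenode */
} RelFileNode;

/* Buffer lookup key with the one additional field. */
typedef struct BufferTag
{
    RelFileNode rnode;
    ForkNumber  forkNum;    /* which fork the page lives in */
    BlockNumber blockNum;   /* block number within that fork */
} BufferTag;

/* Fill in a tag, zeroing it first so padding bytes are well defined. */
static void
init_buffer_tag(BufferTag *tag, RelFileNode rnode,
                ForkNumber forkNum, BlockNumber blockNum)
{
    memset(tag, 0, sizeof(*tag));
    tag->rnode = rnode;
    tag->forkNum = forkNum;
    tag->blockNum = blockNum;
}

/* Two tags match only if relation, fork and block all match. */
static bool
buffer_tags_equal(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.relNode == b->rnode.relNode &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}

/* Build "NNN", "NNN.1", "NNN.map" or "NNN.map.1" for a fork segment. */
static int
fork_file_name(char *buf, size_t len, Oid relfilenode,
               ForkNumber fork, unsigned segno)
{
    const char *ext = (fork == MAP_FORK) ? ".map" : "";

    assert(buf != NULL && len > 0);
    if (segno == 0)
        return snprintf(buf, len, "%u%s", (unsigned) relfilenode, ext);
    return snprintf(buf, len, "%u%s.%u", (unsigned) relfilenode, ext, segno);
}
```

The extra field widens every key that gets hashed and compared on buffer
lookup, which is the cost in question.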

My original thought was to have a separate RelFileNode for each of the
maps. That would require no smgr or xlog changes, and not very many
changes in the buffer manager, though I guess you'd need more catalog
changes. You had doubts about that on the previous thread
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php) but
the "map forks" idea certainly seems much more invasive than that.

I like the "map forks" idea; it groups the maps nicely at the filesystem
level, and I can see it being useful for all kinds of things in the
future. The question is, is it really worth the extra code churn? If you
think it is, I can try that approach.

> Another possible advantage is that a new map fork could be added to an
> existing table without much trouble. Which is certainly something we'd
> need if we ever hope to get update-in-place working.

Yep.

> The main disadvantage I can see is that for very small tables, the
> percentage overhead from multiple map forks of one page apiece is
> annoyingly high. However, most of the point of a map disappears if
> the table is small, so we might finesse that by not creating any maps
> until the table has reached some minimum size.

Yeah, the map fork idea is actually better than the "every nth heap
page" approach from that point of view.
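
A small sketch of the overhead arithmetic and of a size threshold for
creating maps. MIN_BLOCKS_FOR_MAPS and both function names are
hypothetical; no particular minimum size was proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical threshold: below this many data pages, create no maps. */
#define MIN_BLOCKS_FOR_MAPS 8u

/* Space overhead, in percent of the data fork, of nforks map forks that
 * each occupy a single page. Only meaningful for a non-empty table. */
static double
map_overhead_percent(BlockNumber nblocks, unsigned nforks)
{
    assert(nblocks > 0);
    return 100.0 * nforks / nblocks;
}

/* Create the map forks only once the table is large enough. */
static bool
should_create_maps(BlockNumber nblocks)
{
    return nblocks >= MIN_BLOCKS_FOR_MAPS;
}
```

With two one-page map forks, a one-page table carries 200% overhead,
while a 1000-page table carries 0.2%.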

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
