Re: heap metapages

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: heap metapages
Date: 2012-05-22 12:52:08
Message-ID: CA+TgmoamJHLB3gCgLFEMEnOBSSYS6x06Y+4cRdh88qoukR9xuw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 22, 2012 at 4:52 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Based upon all you've said, I'd suggest that we make a new kind of
> fork, in a separate file for this, .meta. But we also optimise the VM
> and FSM in the way you suggest so that we can replace .vm and .fsm
> with just .meta in most cases. Big tables would get a .vm and .fsm
> appearing when they get big enough, but that won't challenge the inode
> limits. When .vm and .fsm do appear, we remove that info from the
> metapage - that means we keep all code as it is currently, except for
> an optimisation of .vm and .fsm when those are small enough to do so.

Well, let's see. That would mean that a small heap relation has 2
forks instead of 3, and a large relation has 4 forks instead of 3. In
my proposal, a small relation has 1 fork instead of 3, and a large
relation still has 3 forks. So I like mine better.
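
To make the comparison concrete, here is a small Python sketch that tallies forks under the three layouts, using the per-relation counts stated above. The scheme labels, the function name, and the table counts in the example are illustrative assumptions, not anything from the thread or from PostgreSQL itself.

```python
# Forks per heap relation, taken from the counts in the message above.
# The dictionary keys are illustrative labels only.
FORKS_PER_RELATION = {
    "current layout":      {"small": 3, "large": 3},
    "separate .meta fork": {"small": 2, "large": 4},
    "metapage in block 0": {"small": 1, "large": 3},
}


def total_forks(scheme: str, small: int, large: int) -> int:
    """Return the total fork count for a mix of small and large relations."""
    counts = FORKS_PER_RELATION[scheme]
    return small * counts["small"] + large * counts["large"]


# Hypothetical database: 10,000 small tables and 100 large ones.
for scheme in FORKS_PER_RELATION:
    print(f"{scheme}: {total_forks(scheme, small=10_000, large=100)} forks")
```

For that hypothetical mix the totals are 30300, 20400, and 10300 forks respectively. Each fork needs at least one file, so the gap in inode use is driven almost entirely by the small relations.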

Also, I think that we need a good chunk of the metadata here for both
tables and indexes. For example, if we use the metapage to store
information about whether a relation is logged, unlogged, being
converted from logged to unlogged, or being converted from unlogged to
logged, we need that information both for tables and for indexes.
Now, there's no absolute reason why those cases have to be handled
symmetrically, but I think things will be a lot simpler if they are.
If we settle on the rule that block 0 of every relation contains a
certain chunk of metadata at a certain byte offset, then the code to
retrieve that data when needed is pretty darn simple. If tables put
it in a separate fork and indexes put it in the main fork inside the
metablock somewhere, then things are not so simple. And I sure don't
want to add a separate fork for every index just to hold the metadata:
that would be a huge hit in terms of total inode consumption.
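
As a rough illustration of the "certain chunk of metadata at a certain byte offset" rule, the following Python sketch reads a persistence state from block 0 in the same way for a table and for an index. Everything specific here is an assumption made for the example: the offset, the magic number, the two-field layout, and the function names are hypothetical and do not describe an actual PostgreSQL on-disk format.

```python
import struct

BLOCK_SIZE = 8192         # PostgreSQL's default block size
META_OFFSET = 24          # assumption: metadata sits right after a 24-byte page header
META_FORMAT = "<IB"       # assumption: 4-byte magic number, then 1-byte state
META_MAGIC = 0x4D455441   # hypothetical marker value

# The four states named in the message above; the numeric codes are made up.
PERSISTENCE_STATES = {
    0: "logged",
    1: "unlogged",
    2: "converting logged -> unlogged",
    3: "converting unlogged -> logged",
}


def write_metadata(block0: bytearray, state: int) -> None:
    """Store the magic number and persistence state at the fixed offset."""
    struct.pack_into(META_FORMAT, block0, META_OFFSET, META_MAGIC, state)


def read_persistence(block0: bytes) -> str:
    """Read the persistence state from block 0 of any relation."""
    magic, state = struct.unpack_from(META_FORMAT, block0, META_OFFSET)
    if magic != META_MAGIC:
        raise ValueError("block 0 does not carry the expected metadata")
    return PERSISTENCE_STATES[state]


# The same reader works for a heap and for an index, because both keep
# the metadata at the same place in block 0.
heap_block0 = bytearray(BLOCK_SIZE)
index_block0 = bytearray(BLOCK_SIZE)
write_metadata(heap_block0, 0)
write_metadata(index_block0, 3)
print(read_persistence(heap_block0))
print(read_persistence(index_block0))
```

If tables kept this data in a separate fork while indexes kept it inside their existing metablock, the reader would need one code path per relation kind, which is the asymmetry the paragraph above argues against.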

> We can watermark data files using special space on block zero using
> some code to sneak that in when the page is next written, but that is
> regarded as optional, rather than an essential aspect of an
> upgrade/normal operation.
>
> Having pg_upgrade touch data files is both dangerous and difficult to
> back out in case of mistake, so I am wary of putting the metapage at
> block 0. Doing it the way I suggest means the .meta files would be
> wholly new and can be deleted as a back-out. We can also clean away
> any unnecessary .vm/.fsm files as a later step.

It seems pretty clear to me that making pg_upgrade responsible for
emptying block zero is a non-starter. But I don't think that's a
reason to throw out the design; I think it's a problem we can work
around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
