Re: unlogged tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables
Date: 2010-11-14 02:35:01
Message-ID: AANLkTimxoBCG3suEF3nbo-4sxsr_RPXUAQKUOwJx_Vkx@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Cleanup at first connection is something we've been avoiding for years,
>> but maybe it's time to bite the bullet and do that?
>
> Another alternative is to initialize the unlogged tables when you
> first access them. If you try to open a table and there are no files
> attached to it, then go ahead and initialize it by creating an empty table
> and building any indexes.

I thought about that (I've thought about a lot of things with regard to
this feature...). One problem is that you presumably will need to
open the relation before you can decide whether this is the first
access since restart. But by the time you've opened it, you've
already taken an AccessShareLock, and you'll presumably need something
a whole lot stronger than that to do the rebuild. Lock upgrades are
usually a good thing to avoid when possible, although maybe it would
be OK in this case, not sure. Another problem is that it's not too
clear to me where you'd hook in the logic to do the cleanup. The
relcache code seems like an awfully low-level place to be trying to
perpetrate this sort of monkey business.
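
To make the lock-upgrade hazard concrete, here is a minimal Python sketch of
a toy lock table. It is not PostgreSQL's lock manager: ToyLockTable, its
methods, and the two-mode conflict rule are assumptions made for
illustration. The only fact it relies on is that an ACCESS EXCLUSIVE lock
conflicts with every other mode, including ACCESS SHARE. If two sessions
both open the relation and both decide to rebuild it, each waits on the
other's share lock.

```python
from enum import Enum


class LockMode(Enum):
    ACCESS_SHARE = "AccessShareLock"
    ACCESS_EXCLUSIVE = "AccessExclusiveLock"


def conflicts(a, b):
    # In this two-mode toy, a pair conflicts whenever either side is exclusive.
    return LockMode.ACCESS_EXCLUSIVE in (a, b)


class ToyLockTable:
    """Hypothetical single-relation lock table, for illustration only."""

    def __init__(self):
        self.held = {}     # session -> set of modes currently held
        self.waiting = {}  # session -> mode it is blocked waiting for

    def blockers(self, session, mode):
        # Other sessions holding a mode that conflicts with the request.
        return {
            other
            for other, modes in self.held.items()
            if other != session and any(conflicts(mode, m) for m in modes)
        }

    def acquire(self, session, mode):
        # Returns True if granted, False if the session would have to wait.
        if self.blockers(session, mode):
            self.waiting[session] = mode
            return False
        self.held.setdefault(session, set()).add(mode)
        self.waiting.pop(session, None)
        return True

    def has_deadlock(self):
        # Build the wait-for graph and look for a cycle through any waiter.
        graph = {s: self.blockers(s, m) for s, m in self.waiting.items()}

        def reaches(start, node, seen):
            for nxt in graph.get(node, ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    if reaches(start, nxt, seen):
                        return True
            return False

        return any(reaches(s, s, set()) for s in graph)


def demo():
    table = ToyLockTable()
    # Both sessions open the relation, which takes the weak lock.
    table.acquire("A", LockMode.ACCESS_SHARE)
    table.acquire("B", LockMode.ACCESS_SHARE)
    # Both notice the table needs rebuilding and try to upgrade.
    a_granted = table.acquire("A", LockMode.ACCESS_EXCLUSIVE)
    b_granted = table.acquire("B", LockMode.ACCESS_EXCLUSIVE)
    return a_granted, b_granted, table.has_deadlock()


if __name__ == "__main__":
    print(demo())
```

In the real server the deadlock detector would break such a cycle by
aborting one of the transactions, so the upgrade is not fatal, but it does
turn a plain first access into something that can fail.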

> Hm, I had been assuming recovery would be responsible for cleaning up
> the tables even if the first access is responsible for rebuilding
> them. But there's a chance there have been no modifications to them
> since the last checkpoint. But in that case the data in them is fine.
> It would be a weird interface if it only cleared them out sometimes
> based on unpredictable timing though. Avoiding that does require some
> kind of alternate storage scheme other than the WAL to indicate what
> needs to be cleared out. .init files are as good a mechanism as any, even if
> they just mean "unlink this file on startup".

One idea I had was to trigger the rebuild when we notice that the main
relation fork is missing. Then the startup code can just notice the
init fork, annihilate everything else, and call it good. However, this
appears to require modifying some fairly fundamental assumptions of
the current system. smgr.c/md.c believe that nobody should ever try
to read a nonexistent block, and unconditionally throw an error if the
caller tries to do so. You could provide a mode where they don't do
that, and instead return an error indication to the caller. Then you
could add an additional ReadBuffer mode, say RBM_FAIL, to let the
error percolate back up through that layer to the index AM or heap
code, which could then try to upgrade its lock and recreate the main
fork. However, I really couldn't work up much enthusiasm for
implementing this feature in a way that requires drilling a hole in
the abstraction stack from top to bottom.
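
The control flow described above can be sketched in Python as a toy model.
Nothing here is PostgreSQL code: ToyStorage, ReadMode.FAIL (standing in for
the proposed RBM_FAIL), MissingBlock, and read_or_rebuild are all
hypothetical names, and forks are modelled as plain lists of blocks. The
sketch only shows the shape of the idea: startup removes everything but the
init fork, the storage layer normally raises on a missing block, and an
opt-in mode hands the failure back so the caller can recreate the main fork.

```python
from enum import Enum


class ReadMode(Enum):
    NORMAL = "normal"  # a missing block is an error, as smgr.c/md.c assume today
    FAIL = "fail"      # hypothetical RBM_FAIL-like mode: report it to the caller


class MissingBlock(Exception):
    pass


class ToyStorage:
    """Hypothetical stand-in for the storage manager; forks are lists of blocks."""

    def __init__(self):
        self.forks = {}  # (relation name, fork name) -> list of blocks

    def create_unlogged(self, rel):
        self.forks[(rel, "init")] = []  # template of the empty relation
        self.forks[(rel, "main")] = []

    def has_fork(self, rel, fork):
        return (rel, fork) in self.forks

    def startup_cleanup(self):
        # For every relation that has an init fork, remove all other forks.
        unlogged = {rel for rel, fork in self.forks if fork == "init"}
        doomed = [k for k in self.forks if k[0] in unlogged and k[1] != "init"]
        for key in doomed:
            del self.forks[key]

    def read_block(self, rel, blkno, mode=ReadMode.NORMAL):
        blocks = self.forks.get((rel, "main"))
        if blocks is None or blkno >= len(blocks):
            if mode is ReadMode.FAIL:
                return None  # error indication instead of an exception
            raise MissingBlock(f"block {blkno} of {rel} does not exist")
        return blocks[blkno]


def read_or_rebuild(storage, rel, blkno):
    """Heap/index-AM level caller. Returns (block, rebuilt)."""
    block = storage.read_block(rel, blkno, ReadMode.FAIL)
    if block is not None:
        return block, False
    if storage.has_fork(rel, "main") or not storage.has_fork(rel, "init"):
        # The fork exists (or the relation has no init fork): a genuine error.
        raise MissingBlock(f"block {blkno} of {rel} does not exist")
    # A real implementation would have to upgrade its lock at this point.
    storage.forks[(rel, "main")] = list(storage.forks[(rel, "init")])
    return None, True
```

Even in this toy the failure has to travel from read_block up to a caller
that knows about init forks and locking, which is the layering problem the
paragraph above complains about.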

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
