Re: Unlogged tables, persistent kind

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:39:10
Message-ID: BANLkTi=ysNC0RFTAoWd+oMeZ-kUTSUn2RQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 25, 2011 at 2:21 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Right, but the trick is how you identify which blocks you need to
>> zero.  You used the word "damaged", which to me implied that the block
>> had been modified in some way but ended up with other than the
>> expected contents, so that something like a CRC check might detect the
>> problem.  My point (as perhaps you already understand) is that you
>> could easily have a situation where every block in the table passes a
>> hypothetical block-level CRC check, but the table as a whole is still
>> damaged because update chains aren't coherent.  So you need some kind
>> of mechanism for identifying which portions of the table you need to
>> zero to get back to a guaranteed-coherent state.
>
> That sounds like progress.
>
> The current mechanism is "truncate complete table". There are clearly
> other mechanisms that would not remove all data.

No doubt. Consider a block B. If the system crashes while block B is
dirty in either the OS cache or shared_buffers, then you must zero B,
or truncate it away. If it was clean in both places, however, it's
good data and you can keep it.
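The keep-or-zero rule for a single block fits in a small predicate. This is a hypothetical Python sketch, not PostgreSQL code: `BlockState`, `must_zero`, and `blocks_to_zero` are illustrative names. The per-block dirty flags are an assumption of the sketch. After a real crash the system cannot observe them, which is why some mechanism is needed to identify the blocks to zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockState:
    # Hypothetical snapshot of one block's cache state at the moment of the crash.
    dirty_in_shared_buffers: bool
    dirty_in_os_cache: bool


def must_zero(state: BlockState) -> bool:
    # A block dirty in either place may have an on-disk image that is stale,
    # torn, or inconsistent with other blocks; only a block clean in both
    # places is known-good and can be kept.
    return state.dirty_in_shared_buffers or state.dirty_in_os_cache


def blocks_to_zero(states: dict[int, BlockState]) -> set[int]:
    # Collect the block numbers that must be zeroed (or truncated away).
    return {blkno for blkno, s in states.items() if must_zero(s)}
```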

So you can imagine, for example, a scheme in which the
relation is divided into 8MB chunks, and we WAL-log the first
operation after each checkpoint that touches a chunk. Replay zeroes
the chunk, and we also invalidate all the indexes (the user must
REINDEX to get them working again). I think that would be safe, and
certainly the WAL-logging overhead would be far less than WAL-logging
every change, since we'd need to emit only ~16 bytes of WAL for every
8MB written, rather than ~8MB of WAL for every 8MB written. It
wouldn't allow some of the optimizations that the current unlogged
tables can get away with only because they WAL-log exactly nothing -
and selectively zeroing chunks of a large table might slow down
startup quite a bit - but it might still be useful to someone.
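A rough model of the chunk scheme might look like the following. It is a hypothetical Python sketch, not PostgreSQL code. `ChunkTracker`, `replay`, and `wal_overhead_ratio` are invented names, and the 8kB block size is PostgreSQL's default BLCKSZ. WAL flushing, concurrency, and index internals are not modeled. The tracker logs a chunk number the first time any block in that chunk is modified after a checkpoint. Replay then zeroes every block of each logged chunk and reports that the indexes need a REINDEX.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 8192                             # PostgreSQL's default BLCKSZ
CHUNK_SIZE = 8 * 1024 * 1024                  # 8MB chunks, as in the proposal
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE   # 1024 blocks per chunk
WAL_RECORD_BYTES = 16                         # rough per-chunk record size


@dataclass
class ChunkTracker:
    """Hypothetical model of chunk-level WAL logging for one relation."""
    touched: set[int] = field(default_factory=set)  # chunks logged since last checkpoint
    wal: list[int] = field(default_factory=list)    # chunk numbers logged since last checkpoint

    def modify_block(self, blkno: int) -> None:
        chunk = blkno // BLOCKS_PER_CHUNK
        if chunk not in self.touched:
            # First touch of this chunk since the checkpoint: emit one small record.
            self.wal.append(chunk)
            self.touched.add(chunk)

    def checkpoint(self) -> None:
        # Simplification: a checkpoint flushes all dirty blocks, so records
        # written before it are no longer needed for recovery.
        self.touched.clear()
        self.wal.clear()


def replay(wal: list[int], nblocks: int) -> tuple[set[int], bool]:
    """Return (blocks to zero, whether indexes are still valid) after a crash."""
    to_zero: set[int] = set()
    for chunk in wal:
        first = chunk * BLOCKS_PER_CHUNK
        last = min(first + BLOCKS_PER_CHUNK, nblocks)
        to_zero.update(range(first, last))
    # Zeroing any chunk leaves index entries pointing at vanished tuples,
    # so every index on the relation is treated as invalid (user must REINDEX).
    return to_zero, not wal


def wal_overhead_ratio() -> float:
    # ~16 bytes of WAL per 8MB chunk, versus ~8MB per 8MB for full logging.
    return WAL_RECORD_BYTES / CHUNK_SIZE
```

With 8kB blocks each chunk spans 1024 blocks, so modifying blocks 0, 5, and 1024 emits only two records. The overhead is 16 / 8,388,608 bytes, roughly two millionths of the data written.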

However, I think that the "logged table, unlogged index" idea is
probably the most promising thing to think about doing first. It's
easy to imagine all sorts of uses for that sort of thing even in cases
where people can't afford to have any data get zeroed, and it would
provide a convenient building block for something like the above if we
eventually wanted to go that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
