Re: Proposed WAL changes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed WAL changes
Date: 2001-03-08 17:03:33
Message-ID: 26035.984071013@sss.pgh.pa.us
Lists: pgsql-hackers

"Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM> writes:
>> And how well will that approach work if the last checkpoint record
>> got written near the start of a log segment file, and then the
>> checkpointer discarded all your prior log segments because "you don't
>> need those anymore"? If the checkpoint record gets corrupted,
>> you have no readable log at all.

> The question - why should we have it? It is assumed that data files
> are flushed before checkpoint appears in log. If this assumption
> is wrong due to *bogus* fsync/disk/whatever why should we increase
> disk space requirements which will affect *good* systems too?
> What will we buy with extra logs? Just some data we can't
> guarantee consistency anyway?
> It seems that you want to guarantee more than I do, Tom -:)

No, but I want a system that's not brittle. You seem to be content to
design a system that is reliable as long as the WAL log is OK but loses
the entire database unrecoverably as soon as one bit goes bad in the
log. I'd like a slightly softer failure mode. WAL logs *will* go bad
(even without system crashes; what of unrecoverable disk read errors?)
and we ought to be able to deal with that with some degree of grace.
Yes, we lost our guarantee of consistency. That doesn't mean we should
not do the best we can with what we've got left.

> BTW, in some of my tests the size of the on-line logs was ~200 MB with
> the default checkpoint interval. So it's worth caring about on-line log size.

Okay, but to me that suggests we need a smarter log management strategy,
not a management strategy that throws away data we might wish we still
had (for manual analysis if nothing else). Perhaps the checkpoint
creation rule should be "every M seconds *or* every N megabytes of log,
whichever comes first". It'd be fairly easy to signal the postmaster to
start up a new checkpoint process when XLogWrite rolls over to a new log
segment, if the last checkpoint was further back than some number of
segments. Comments?
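
A minimal sketch of the "every M seconds *or* every N megabytes of log,
whichever comes first" rule, for illustration only. Nothing here is taken
from the PostgreSQL source: the struct names, field names, and
checkpoint_due() are hypothetical, and the log position is assumed to be a
simple monotonically increasing byte offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables: the "M seconds" and "N megabytes" limits. */
typedef struct CheckpointPolicy
{
    uint64_t max_seconds;       /* M: longest allowed time between checkpoints */
    uint64_t max_log_bytes;     /* N: most log allowed between checkpoints, in bytes */
} CheckpointPolicy;

/* Hypothetical record of where and when the last checkpoint was made. */
typedef struct CheckpointState
{
    uint64_t last_ckpt_time;    /* wall-clock seconds at the last checkpoint */
    uint64_t last_ckpt_log_pos; /* log byte offset at the last checkpoint */
} CheckpointState;

/*
 * Return true if a new checkpoint should be started: either enough time
 * has passed or enough log has been written, whichever happens first.
 * Assumes now and log_pos are never behind the values stored in *s.
 */
bool
checkpoint_due(const CheckpointPolicy *p, const CheckpointState *s,
               uint64_t now, uint64_t log_pos)
{
    uint64_t elapsed = now - s->last_ckpt_time;
    uint64_t written = log_pos - s->last_ckpt_log_pos;

    return elapsed >= p->max_seconds || written >= p->max_log_bytes;
}
```

Under this sketch, the size half of the test would only need to be evaluated
at the moment the log rolls over to a new segment, since that is when the
amount of log written since the last checkpoint crosses a segment boundary.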

> Please convince me that NEXTXID is necessary.
> Why add anything that is not useful?

I'm not convinced that it's not necessary. In particular, consider the
case where we are trying to recover from a crash using an on-line
checkpoint as our last readable WAL entry. In the pre-NEXTXID code,
this checkpoint would contain the current XID counter and an
advanced-beyond-current OID counter. I think both of those numbers
should be advanced beyond current, so that there's some safety margin
against reusing XIDs/OIDs that were allocated by now-lost XLOG entries.
The OID code is doing this right, but the XID code wasn't.
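
To make the safety-margin idea concrete, here is a minimal sketch, not the
actual checkpoint code. The Xid type, the XID_SAFETY_MARGIN value, and both
function names are hypothetical, and counter wraparound is ignored. The
checkpoint stores a counter already advanced past the current one, and
recovery resumes from whichever is larger: the stored value, or one past the
highest XID actually seen in readable log records.

```c
#include <stdint.h>

typedef uint32_t Xid;               /* hypothetical stand-in for a transaction ID */

#define XID_SAFETY_MARGIN 1024u     /* hypothetical margin; the value is arbitrary */

/* Value to store in a checkpoint: advanced beyond the current counter. */
Xid
checkpoint_next_xid(Xid current_next_xid)
{
    return current_next_xid + XID_SAFETY_MARGIN;
}

/*
 * Counter to resume from after a crash.  If the log past the checkpoint is
 * unreadable, highest_xid_seen is just whatever was replayed before the
 * damage, and the margin stored in the checkpoint covers XIDs handed out
 * by the lost records (as long as fewer than XID_SAFETY_MARGIN were lost).
 */
Xid
recovery_next_xid(Xid checkpointed_next_xid, Xid highest_xid_seen)
{
    Xid after_seen = highest_xid_seen + 1;

    return after_seen > checkpointed_next_xid ? after_seen : checkpointed_next_xid;
}
```

With a plain "current XID" in the checkpoint (a margin of zero), any XID
allocated after the checkpoint but recorded only in lost log entries could
be handed out a second time.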

Again, it's a question of brittleness. Yes, as long as everything
operates as designed and the WAL log never drops a bit, we don't need
it. But I want a safety margin for when things aren't perfect.

regards, tom lane
