Re: WAL format

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL format
Date: 2009-12-09 12:34:15
Message-ID: 407d949e0912090434q4339ac5fm9bf91fd4939a8de9@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 7, 2009 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Heikki Linnakangas wrote:
>>> - at the end of WAL segment, when there's not enough space to write the
>>> next WAL record, always write an XLOG SWITCH record to fill the rest of
>>> the segment.
>
>> What happens if a record is larger than a WAL segment?  For example,
>> what if I insert a 16 MB+ datum into a varlena field?
>
> That case doesn't pose a problem --- the datum would be toasted into
> individual tuples that are certainly no larger than a page.  However
> we do have cases where a WAL record can get arbitrarily large; in
> particular a commit record with many subtransactions and/or many
> disk files to delete.  These cases do get exercised in the field
> too --- I can recall at least one related bug report.

Sounds like a reason to make the format simpler...

If we raise the maximum segment size, is there a point where it would
be reasonable to impose maximum sizes on these lists? 32MB? 64MB? It's
not like there isn't a limit now -- recovery will just throw an
out-of-memory error during replay if the list doesn't fit in memory.

What if we push the work of handling these lists up to the recovery
manager instead of xlog.c? Instead of putting the whole list in the
commit record, we would write separate records saying "when xid nnnn
commits, the following subtransactions commit as well", and there could
be several such records per transaction. When the recovery manager sees
one, it is responsible for remembering the list somewhere, appending the
new values if it has already seen a list for that xid, and possibly even
spilling it to disk and reloading it when it sees the corresponding commit.
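
To make the bookkeeping concrete, here is a rough sketch of what the
replay side might do. It is illustration only, not PostgreSQL code: the
class and method names (SubxactTracker, on_subxact_list, on_commit,
on_abort) are made up, it is written in Python rather than C for
brevity, and it keeps everything in memory, leaving out the
spill-to-disk part.

```python
from collections import defaultdict


class SubxactTracker:
    """Hypothetical replay-side bookkeeping for chunked subxact lists."""

    def __init__(self):
        # top-level xid -> subtransaction xids seen so far (assumed in-memory)
        self.pending = defaultdict(list)

    def on_subxact_list(self, xid, subxids):
        # A "when xid commits, these subxacts commit too" record:
        # append to whatever we already remember for this xid.
        self.pending[xid].extend(subxids)

    def on_commit(self, xid):
        # The commit record itself is small; the list is recalled here.
        # pop() with a default does not create an entry for unknown xids.
        return [xid] + self.pending.pop(xid, [])

    def on_abort(self, xid):
        # Nothing to apply on abort, so just forget the remembered list.
        self.pending.pop(xid, None)


tracker = SubxactTracker()
tracker.on_subxact_list(100, [101, 102])
tracker.on_subxact_list(100, [103])      # second chunk for the same xid
tracker.on_subxact_list(200, [201])
tracker.on_abort(200)
committed = tracker.on_commit(100)
```

With this shape no single WAL record has to carry the whole list, so
the record size stays bounded no matter how many subtransactions there
are; the cost moves into the state that recovery has to keep around.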

--
greg
