From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WAL format |
Date: | 2009-12-09 12:34:15 |
Message-ID: | 407d949e0912090434q4339ac5fm9bf91fd4939a8de9@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Dec 7, 2009 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Heikki Linnakangas wrote:
>>> - at the end of WAL segment, when there's not enough space to write the
>>> next WAL record, always write an XLOG SWITCH record to fill the rest of
>>> the segment.
>
>> What happens if a record is larger than a WAL segment? For example,
>> what if I insert a 16 MB+ datum into a varlena field?
>
> That case doesn't pose a problem --- the datum would be toasted into
> individual tuples that are certainly no larger than a page. However
> we do have cases where a WAL record can get arbitrarily large; in
> particular a commit record with many subtransactions and/or many
> disk files to delete. These cases do get exercised in the field
> too --- I can recall at least one related bug report.

Sounds like a reason to make the format simpler...

If we raise the maximum segment size, is there a point where it would
be reasonable to impose maximum sizes on these lists? 32MB? 64MB? It's
not as if there's no limit now -- we'll just throw an out-of-memory
error during recovery if the record doesn't fit in memory.

What if we push the work of handling these lists up to the recovery
manager instead of xlog.c? A commit would write a record saying "when
xid nnnn commits, the following subtransactions commit as well", and it
could write several such records. When the recovery manager sees one,
it is responsible for remembering the list somewhere, appending the new
values if it has already seen part of the list, possibly even spilling
the list to disk and reloading it when it sees the corresponding commit.
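
To make the idea concrete, here is a rough sketch in Python rather than
PostgreSQL C. It is only an illustration of the bookkeeping, not a
proposed patch: the names SubxactTracker, split_into_records and
max_per_record are all made up for this example, the per-record cap is
an assumed parameter, and spilling to disk is left out entirely.

```python
from collections import defaultdict


def split_into_records(xid, subxids, max_per_record):
    """Write side (hypothetical): yield (xid, chunk) pairs so that no single
    record carries more than max_per_record subtransaction ids."""
    for i in range(0, len(subxids), max_per_record):
        yield xid, subxids[i:i + max_per_record]


class SubxactTracker:
    """Replay side (hypothetical): remembers partial subtransaction lists
    until the matching commit record shows up."""

    def __init__(self):
        # top-level xid -> subtransaction ids seen so far
        self._pending = defaultdict(list)

    def on_subxact_list(self, xid, subxids):
        """Handle one 'when xid commits, these subxids commit too' record."""
        self._pending[xid].extend(subxids)

    def on_commit(self, xid):
        """Handle the commit record: return every xid that commits with it
        and forget the remembered list."""
        return [xid] + self._pending.pop(xid, [])

    def on_abort(self, xid):
        """Drop any remembered list for a transaction that aborted."""
        self._pending.pop(xid, None)


# Ten subtransactions, at most four per record -> three list records.
tracker = SubxactTracker()
records = list(split_into_records(1000, list(range(1001, 1011)), max_per_record=4))
for xid, chunk in records:
    tracker.on_subxact_list(xid, chunk)
committed = tracker.on_commit(1000)
```

With something along these lines every individual record stays bounded
in size, and the memory cost of a very long list moves to whoever keeps
the pending map during replay, which is where a spill-to-disk fallback
would have to go.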
--
greg