Re: Hard limit on WAL space used (because PANIC sucks)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2014-01-22 00:17:36
Message-ID: 20140122001736.GC32729@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-01-21 18:59:13 -0500, Tom Lane wrote:
> Another thing to think about is whether we couldn't put a hard limit on
> WAL record size somehow. Multi-megabyte WAL records are an abuse of the
> design anyway, when you get right down to it. So for example maybe we
> could split up commit records, with most of the bulky information dumped
> into separate records that appear before the "real commit". This would
> complicate replay --- in particular, if we abort the transaction after
> writing a few such records, how does the replayer realize that it can
> forget about those records? But that sounds probably surmountable.

Which cases of essentially unbounded record sizes do we currently have?
The only ones that I remember right now are commit and abort records
(including when wrapped in a prepared xact).
Containing a) the list of committing/aborting subtransactions and b) the
list of files to drop c) cache invalidations.
Hm. There's also xl_standby_locks, but that'd be easily splittable.

I think removing the list of subtransactions from commit records would
essentially require not truncating pg_subtrans after a restart
anymore. If we'd truncate it in concord with pg_clog, we'd only need to
log the subxids which haven't been explicitly assigned. Unfortunately
pg_subtrans will be bigger than pg_clog, making such a scheme likely to
be painful.

We could relatively easily split of logging the dropped files from
commit records and log them in groups afterwards, we already have
several races allowing to leak files.

We could do something similar for the cache invalidations, but that
seems likely to get rather ugly, as we'd need to hold procarraylock till
the next record is read, or until a shutdown or end-of-recovery record
is read. If one of the latter is found before the corresponding
invalidations, we'd need to invalidate the entire syscache.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-01-22 00:18:36 Re: Hard limit on WAL space used (because PANIC sucks)
Previous Message Harold Giménez 2014-01-22 00:08:20 Re: proposal: hide application_name from other users