Re: Hard limit on WAL space used (because PANIC sucks)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2014-01-21 23:43:29
Message-ID: 20140121234329.GB32729@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-21 18:24:39 -0500, Tom Lane wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> My preference would be that we simply start failing writes with ERRORs
> >> rather than PANICs. I'm not real sure ATM why this has to be a PANIC
> >> condition. Probably the cause is that it's being done inside a critical
> >> section, but could we move that?
>
> > My understanding is that if it runs out of buffer space while in an
> > XLogInsert, it will be holding one or more buffer content locks
> > exclusively, and unless it can complete the xlog (or scrounge up the info
> > to return that buffer to its previous state), it can never release that
> > lock. There might be other paths where it could get by with an ERROR, but
> > if no one can write xlog anymore, all of those paths must quickly converge
> > to the one that cannot simply ERROR.
>
> Well, the point is we'd have to somehow push detection of the problem
> to a point before the critical section that does the buffer changes
> and WAL insertion.

Well, I think that's already hard for the heapam.c stuff, but doing that
in xact.c seems fracking hairy. We can't simply stop in the middle of a
commit and not continue; that'd often grind the system to a halt,
preventing cleanup. Additionally, the size of the inserted record for
commits is essentially unbounded, which makes it an especially fun case.

> The first idea that comes to mind is (1) estimate the XLOG space needed
> (an overestimate is fine here); (2) just before entering the critical
> section, call some function to "reserve" that space, such that we always
> have at least sum(outstanding reservations) available future WAL space;
> (3) release our reservation as part of the actual XLogInsert call.

I think that's not necessarily enough. In a COW filesystem like btrfs or
ZFS you really cannot give many guarantees about writes succeeding or
failing, even if we were able to create (and zero) a new segment.

Even if you disregard that, we'd need to keep up with lots of concurrent
reservations, looking a fair bit into the future. E.g. during a "smart"
shutdown in a workload with lots of subtransactions, trying to reserve
space might actually make the situation worse, because we might end up
trying to reserve the combined size of the records.
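
To make the quoted three-step proposal concrete, here is a minimal toy
model of it in Python. This is only a sketch, not PostgreSQL code: the
class WalSpaceReservations and its methods are hypothetical names, and
"free space" is just a counter, which ignores the COW filesystem problem
mentioned above. The single lock stands in for the exclusive lock that
would become a hot spot.

```python
import threading


class WalSpaceReservations:
    """Toy model of reserve-before-critical-section (hypothetical names)."""

    def __init__(self, free_bytes):
        self._lock = threading.Lock()  # stands in for the contended exclusive lock
        self._free = free_bytes        # WAL space assumed to be still available
        self._reserved = 0             # sum of outstanding reservations

    def reserve(self, estimate):
        """Step 2: called before the critical section, so it may fail cleanly."""
        with self._lock:
            if self._reserved + estimate > self._free:
                return False           # caller would raise ERROR, not PANIC
            self._reserved += estimate
            return True

    def insert(self, estimate, actual):
        """Step 3: consume the space and release the reservation."""
        with self._lock:
            if actual > estimate:
                raise ValueError("estimate must be an overestimate")
            self._reserved -= estimate
            self._free -= actual

    def available(self):
        """Space not yet promised to any outstanding reservation."""
        with self._lock:
            return self._free - self._reserved


wal = WalSpaceReservations(free_bytes=1000)
first = wal.reserve(600)                # fits
second = wal.reserve(600)               # refused: 1200 > 1000
wal.insert(estimate=600, actual=400)    # overestimate returned to the pool
```

Because every reservation is an overestimate, free space never drops
below the outstanding reservations in this model. The cost is that every
writer takes the lock twice, which is the overhead being objected to.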

> The problem here is that the "reserve" function would presumably need an
> exclusive lock, and would be about as much of a hot spot as XLogInsert
> itself is. Plus we'd be paying a lot of extra cycles to solve a corner
> case problem that, with all due respect, comes up pretty darn seldom.
> So probably we need a better idea than that.

Yea, I don't think anything really safe is going to work without
significant penalties.

> Maybe we could get some mileage out of the fact that very approximate
> techniques would be good enough. For instance, I doubt anyone would bleat
> if the system insisted on having 10MB or even 100MB of future WAL space
> always available. But I'm not sure exactly how to make use of that
> flexibility.

If we were more aggressive with preallocating WAL files, and did so in
the WAL writer, we could stop accepting writes in some common codepaths
(e.g. nodeModifyTable.c) as soon as preallocating failed but continue to
accept writes in other locations (e.g. TRUNCATE, DROP TABLE). That'd
still fail if you write a *very* large commit record using up all the
reserve though...
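
A rough Python sketch of that gating idea, purely for illustration. All
names here (WritePath, WalPreallocState, may_write) are hypothetical, and
which codepaths count as exempt is an assumption based on the examples
above. It does not model the reserve being used up by a very large commit
record.

```python
from enum import Enum


class WritePath(Enum):
    COMMON = "common codepath, e.g. nodeModifyTable.c"
    EXEMPT = "other location, e.g. TRUNCATE, DROP TABLE"


class WalPreallocState:
    """Tracks whether the last WAL preallocation attempt failed."""

    def __init__(self):
        self.prealloc_failed = False

    def note_prealloc_result(self, succeeded):
        # Would be updated by whatever does the preallocation
        # (the WAL writer, in the idea above).
        self.prealloc_failed = not succeeded

    def may_write(self, path):
        # While preallocation works, everything may write.
        if not self.prealloc_failed:
            return True
        # After a failure, only the exempt paths keep writing.
        return path is WritePath.EXEMPT


state = WalPreallocState()
state.note_prealloc_result(succeeded=False)
dml_allowed = state.may_write(WritePath.COMMON)       # False
truncate_allowed = state.may_write(WritePath.EXEMPT)  # True
```

The check is a plain flag read before any critical section is entered,
so a refused write can fail with an ERROR instead of a PANIC.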

I personally think this isn't worth complicating the code for.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
