Re: Better handling of archive_command problems

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Better handling of archive_command problems
Date:	2013-05-16 18:16:23
Message-ID:	CA+TgmoazC-pHEJF4Hnkns5pB+A06E_RsaQuWsH1ghVb5X3ngtQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, May 15, 2013 at 6:40 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Wed, May 15, 2013 at 3:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> One possible objection to this line of attack is that, IIUC, waits to
>> acquire a LWLock are non-interruptible. If someone tells PostgreSQL
>> to wait for some period of time before performing each WAL write,
>> other backends that grab the WALWriteLock will not respond to query
>> cancels during that time.
>
> I don't see any reasonable way to make LWLocks care about interrupts
> (using all 3 possible underlying semaphore implementations, no less).

It couldn't be done across the board, but that doesn't mean that
certain cases couldn't get special treatment.

> As it says within LWLockAcquire:
>
> /*
>  * Lock out cancel/die interrupts until we exit the code section protected
>  * by the LWLock. This ensures that interrupts will not interfere with
>  * manipulations of data structures in shared memory.
>  */
> HOLD_INTERRUPTS();
>
> We've been pretty judicious about placing CHECK_FOR_INTERRUPTS() calls
> in the right places, but it's still quite possible to see the server
> take multiple seconds - perhaps even as many as 10 - to respond to an
> interrupt (by psql SIGINT). Now, I didn't have enough of an interest
> at the times I noticed this to figure out exactly why that may have
> been or to somehow characterize it, but I don't accept that it's a
> violation of some Postgres precept that this setting could result in
> interrupts taking multiple seconds, and maybe even as many as 10. I'd
> go so far as to let the user make the throttling sleep take as long as
> they like, even though this admittedly would sort of break such a
> precept.

Well, I think it IS a Postgres precept that interrupts should get a
timely response. You don't have to agree, but I think that's
important.
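
To make the concern concrete, here is a minimal standalone sketch (not PostgreSQL source) of how a hold-off counter interacts with a throttling wait. Every name in it (hold_interrupts, resume_interrupts, request_cancel, check_for_interrupts, throttled_wait) is a hypothetical stand-in, time is simulated rather than slept, and the model assumes only that a pending cancel is serviced when no hold is in effect. It suggests why a wait taken while interrupts are held cannot respond to a cancel until the hold is released, whereas the same wait split into short slices outside a hold can respond within roughly one slice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for interrupt bookkeeping; not PostgreSQL's code. */
static volatile int  hold_count = 0;         /* > 0: interrupts are held off */
static volatile bool cancel_pending = false; /* set when a cancel arrives */

static void hold_interrupts(void)
{
    hold_count++;
}

static void resume_interrupts(void)
{
    assert(hold_count > 0);
    hold_count--;
}

/* Simulates the arrival of a query-cancel request. */
static void request_cancel(void)
{
    cancel_pending = true;
}

/* Services a pending cancel only if no hold is in effect. */
static bool check_for_interrupts(void)
{
    if (cancel_pending && hold_count == 0)
    {
        cancel_pending = false;
        return true;
    }
    return false;
}

/*
 * Simulated throttling wait of total_ms, taken in slices of slice_ms.
 * A cancel "arrives" once the simulated clock reaches cancel_at_ms
 * (pass a negative value for "never"). Returns the simulated time at
 * which the wait ended, either by finishing or by honoring the cancel.
 */
static int throttled_wait(int total_ms, int slice_ms, int cancel_at_ms)
{
    int waited = 0;

    while (waited < total_ms)
    {
        if (cancel_at_ms >= 0 && waited >= cancel_at_ms)
            request_cancel();
        if (check_for_interrupts())
            break;              /* cancel honored between slices */

        int remaining = total_ms - waited;
        waited += (remaining < slice_ms) ? remaining : slice_ms;
        /* a real implementation would actually sleep for the slice here */
    }
    return waited;
}
```

In this model, a 10-second wait in 100 ms slices with a cancel arriving at 250 ms ends at the next slice boundary when no hold is in effect. With a hold in effect, the wait runs to completion and the cancel is serviced only after resume_interrupts() is called.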

> There is a setting called zero_damaged_pages, and enabling it causes
> data loss. I've seen cases where it was enabled within postgresql.conf
> for years.

That is both true and bad, but it is not a reason to do more bad things.

>> Now despite all that, I can see this being useful enough that Heroku
>> might want to insert a very small patch into their version of
>> PostgreSQL to do it this way, and just live with the downsides. But
>> anything that can propagate non-interruptible waits across the entire
>> system does not sound to me like a feature that is sufficiently
>> polished that we want to expose it to users less sophisticated than
>> Heroku (i.e. nearly all of them). If we do this, I think we ought to
>> find a way to make the waits interruptible, and to insert them in
>> places where they really don't interfere with read-only backends.
>
> It would be nice to be able to be sure that CLogControlLock could not
> be held for multiple seconds as a result of this. However, I don't see
> any reasons to let the perfect be the enemy of the good, or at least
> the better. Just how likely is it that the scenario you describe will
> affect reads in the real world? In any case, this is a problem in its
> own right.

That's true, but you're proposing to add a knob which would make it
much easier for users to notice the bad behavior that already exists,
and to prolong it for unbounded periods of time even when the system
is not under load.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
