Re: Changeset Extraction v7.0 (was logical changeset generation)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.0 (was logical changeset generation)
Date: 2014-01-23 17:21:40
Message-ID: 20140123172140.GH7182@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-23 11:50:57 -0500, Robert Haas wrote:
> On Thu, Jan 23, 2014 at 7:05 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I don't think shared buffers fsyncs are the apt comparison. It's more
> > something like UpdateControlFile(). Which PANICs.
> >
> > I really don't get why you fight PANICs in general that much. There are
> > some nasty PANICs in postgres which can happen in legitimate situations,
> > which should be made to fail more gracefully, but this surely isn't one
> > of them. We're doing rename(), unlink() and rmdir(). That's it.
> > We should concentrate on the ones that legitimately can happen, not the
> > ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
> mount -o remount,ro /. We don't increase reliability one bit by adding
> > codepaths that will never get tested.
>
> Sorry, I don't buy it. Lots of people I know have stories that go
> like this "$HORRIBLE happened, and PostgreSQL kept on running, and it
> didn't even lose my data!", where $HORRIBLE may be variously that the
> disk filled up, that disk writes started failing with I/O errors, that
> somebody changed the permissions on the data directory inadvertently,
> that the entire data directory got removed, and so on.

Especially the "not loosing data" imo is because postgres is
conservative with continuing in situations it doesn't know anything
about. Most prominently the cluster wide restart after a segfault.

> I've been
> through some of those scenarios myself, and the care and effort that's
> been put into failure modes has saved my bacon more than a few times,
> too. We *do* increase reliability by worrying about what will happen
> even in code paths that very rarely get exercised.

A part of thinking about them *is* restricting what can happen in those
cases, by keeping the set of possible states to worry about to a minimum.

Just slapping an ERROR on instead of a PANIC can make things much
worse. If we fail to properly release the in-memory slot, the space it
pins isn't released until a restart, with no chance to do anything
about it in the meantime; that just makes the problem bigger, because
now the cleanup might take a week (VACUUM FULLing the entire cluster?).
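
To make that concrete, a rough sketch of the deletion path (not the
actual patch code; the JUST_ERROR define and drop_slot_dir() are made
up purely for illustration):

#include "postgres.h"
#include <unistd.h>

static void
drop_slot_dir(const char *path)
{
	if (rmdir(path) != 0)
	{
#ifdef JUST_ERROR
		/*
		 * ERROR unwinds the current command, but the shared-memory
		 * slot entry stays around and keeps holding back xmin and WAL
		 * removal -- potentially until somebody restarts the server.
		 */
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not remove directory \"%s\": %m", path)));
#else
		/*
		 * PANIC keeps the set of reachable states small: the restart
		 * rebuilds the in-memory slots from what's on disk, so the two
		 * can't silently diverge.
		 */
		ereport(PANIC,
				(errcode_for_file_access(),
				 errmsg("could not remove directory \"%s\": %m", path)));
#endif
	}
}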

> I think it's completely unacceptable for the failure of routine
> filesystem operations to result in a PANIC. I grant you that we have
> some existing cases where that can happen (like UpdateControlFile),
> but that doesn't mean we should add more. Right this very minute
> there is massive bellyaching on a nearby thread caused by the fact
> that a full disk condition while writing WAL can PANIC the server,
> while on this thread at the very same time you're arguing that adding
> more ways for a full disk to cause PANICs won't inconvenience anyone.

A full disk won't cause any of the problems for the case we're
discussing, will it? We're just doing rename(), unlink() and rmdir()
here, all of which should succeed while the FS is full (afair rename()
does on all common FSs because inodes are kept separately from data
blocks).
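
For reference, the whole thing boils down to something like the
following (simplified standalone sketch, not the slot code itself; it
assumes the directory holds a single "state" file):

#include <stdio.h>
#include <unistd.h>

/*
 * None of these steps needs to allocate new data blocks, so ENOSPC
 * isn't the interesting failure mode here.
 */
static int
remove_slot_dir(const char *slotdir)
{
	char		tmpdir[1024];
	char		statefile[1024];

	/*
	 * Rename the directory aside first, so a crash mid-removal leaves
	 * an obviously dead *.tmp directory rather than a half-deleted slot.
	 */
	snprintf(tmpdir, sizeof(tmpdir), "%s.tmp", slotdir);
	if (rename(slotdir, tmpdir) != 0)
		return -1;

	/* unlink the state file inside it */
	snprintf(statefile, sizeof(statefile), "%s/state", tmpdir);
	if (unlink(statefile) != 0)
		return -1;

	/* and remove the now-empty directory */
	return rmdir(tmpdir);
}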

> The other thread is right, and your argument here is wrong. We have
> been able to - and have taken the time to - fix comparable problems in
> other cases, and we should do the same thing here.

I don't think the WAL case is comparable at all. ENOSPC is something
expected that can happen during normal operation, doesn't involve an
ill-intentioned operator, and is reasonably easy to test. unlink() or
fsync() randomly failing is not.
In fact, isn't the consequence of that thread that we need a
significant amount of extra complexity to handle the case? We shouldn't
spend that effort on cases that don't deserve it because they are too
unlikely in practice.

And yes, there aren't too many other places PANICing - that's because
most of them can rely on WAL handling those tricky cases for them...

> Second, I have
> encountered a few situations where customers had production servers
> that repeatedly PANICked due to some bug or other. If I've ever
> encountered angrier customers, I can't remember when. A PANIC is no
> big deal when it happens on your development box, but when it happens
> on a machine with 100 users connected to it, it's a big deal,
> especially if a single underlying cause makes it happen over and over
> again.

Sure. But blindly continuing and then, possibly quite a bit later,
losing data or causing an outage that takes a long while to recover
from isn't any better.

> I think we should be devoting our time to figuring how to improve
> this, not whether to improve it.

If you'd argue that creating a new slot should fail gracefully, ok, I
can relatively easily be convinced of that. But trying to handle
failures in the midst of deletion, in cases that won't happen in
reality, is just inviting trouble imo.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
