From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Changeset Extraction v7.0 (was logical changeset generation) |
Date: | 2014-01-22 15:48:58 |
Message-ID: | 20140122154858.GK21170@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2014-01-22 10:14:27 -0500, Robert Haas wrote:
> On Wed, Jan 22, 2014 at 9:48 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2014-01-18 08:35:47 -0500, Robert Haas wrote:
> >> > I am not sure I understand that point. We can either update the
> >> > in-memory bit before performing the on-disk operations or
> >> > afterwards. Either way, there's a way to be inconsistent if the disk
> >> > operation fails somewhere in between (it might fail but still have
> >> > deleted the file/directory!). The normal way to handle that in other
> >> > places is PANICing when we don't know so we recover from the on-disk
> >> > state.
> >> > I really don't see the problem here? Code doesn't get more robust by
> >> > doing s/PANIC/ERROR/, rather the contrary. It takes extra smarts to only
> >> > ERROR, often that's not warranted.
> >>
> >> People get cranky when the database PANICs because of a filesystem
> >> failure. We should avoid that, especially when it's trivial to do so.
> >> The update to shared memory should be done second and should be set
> >> up to be no-fail.
> >
> > I don't see how that would help. If we fail during unlink/rmdir, we
> > don't really know at which point we failed.
>
> This doesn't make sense to me. unlink/rmdir are atomic operations.

Yes, individual operations should be, but you cannot be sure whether a
rename()/unlink() will survive a crash until the directory is
fsync()ed. So, what is one going to do if the unlink succeeded, but the
fsync didn't?

Deletion currently works like:

    if (rename(path, tmppath) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename \"%s\" to \"%s\": %m",
                        path, tmppath)));

    /* make sure no partial state is visible after a crash */
    fsync_fname(tmppath, false);
    fsync_fname("pg_replslot", true);

    if (!rmtree(tmppath, true))
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not remove directory \"%s\": %m",
                        tmppath)));
    }

If we fail between the rename() and the fsync_fname() we don't really
know which state we are in. We'd also have to add code to handle
incomplete slot directories, which currently only exists for startup, to
other places.
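
To make the ambiguity concrete, here is a minimal standalone sketch in plain POSIX C. It is not the slot code: retire_dir(), fsync_path(), path_exists() and the RetireResult codes are hypothetical names made up for illustration, and it assumes a platform where fsync() on a directory opened O_RDONLY works (Linux does; others may not). The point is only that the caller can distinguish "rename failed, nothing changed" from "rename happened but we do not know whether it is durable".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes: which step failed, and hence what is known. */
typedef enum
{
    RETIRE_OK,              /* rename is durable; tmp dir may now be removed */
    RETIRE_RENAME_FAILED,   /* rename() reported failure; old name still in use */
    RETIRE_FSYNC_FAILED     /* rename visible now, but durability is unknown */
} RetireResult;

/* Open a file or directory read-only and fsync it; 0 on success, -1 on error. */
int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Report whether a path is visible right now (says nothing about durability). */
int
path_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/*
 * Rename path to tmppath, then flush the renamed entry and its parent
 * directory so the rename survives a crash.
 */
RetireResult
retire_dir(const char *path, const char *tmppath, const char *parent)
{
    assert(path != NULL && tmppath != NULL && parent != NULL);

    if (rename(path, tmppath) != 0)
        return RETIRE_RENAME_FAILED;

    /* After a crash here, either name may be the one found on disk. */
    if (fsync_path(tmppath) != 0 || fsync_path(parent) != 0)
        return RETIRE_FSYNC_FAILED;

    return RETIRE_OK;
}
```

A caller that gets RETIRE_FSYNC_FAILED can see tmppath via path_exists(), but that only reflects the in-memory state, which is exactly the case where a plain ERROR leaves shared memory and disk possibly disagreeing.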
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services