Re: pg_archivecleanup bug

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_archivecleanup bug
Date: 2013-12-05 20:39:31
Message-ID: CA+Tgmoa_0h3f01VqzZN2k=jY6S9aHJmXM9o4bGw-JgB18fXizw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 5, 2013 at 3:06 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> An EDB customer reported a problem with pg_archivecleanup which I
> have looked into and found a likely cause. It is, in any event, a
> bug which I think should be fixed. It has to do with our use of
> the readdir() function:
>
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html
>
> These are the relevant bits:
>
> | Applications wishing to check for error situations should set
> | errno to 0 before calling readdir(). If errno is set to non-zero
> | on return, an error occurred.
>
> | Upon successful completion, readdir() returns a pointer to an
> | object of type struct dirent. When an error is encountered, a
> | null pointer is returned and errno is set to indicate the error.
> | When the end of the directory is encountered, a null pointer is
> | returned and errno is not changed.
>
> Here is our current usage:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/pg_archivecleanup/pg_archivecleanup.c;h=8f77998de12f95f41bb95c3e05a14de6cdf18047;hb=7800229b36d0444cf2c61f5c5895108ee5e8ee2a#l110
>
> So an error in scanning the directory will not be reported; the
> cleanup will quietly terminate the WAL deletions without processing
> the remainder of the directory. Attached is the simplest fix,
> which would report the error, stop looking for WAL files, and
> continue with other clean-ups. I'm not sure we should keep the fix
> that simple. We could set a flag so that we would exit with a
> non-zero code, or we could try a new directory scan as long as the
> last scan found and deleted at least one WAL file. Perhaps we want
> to back-patch the simple fix and do something fancier for 9.4?

A directory that you can't read sounds like a pretty bad thing. I'd
be inclined to print an error message and exit forthwith.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-12-05 20:41:44 Re: Why we are going to have to go DirectIO
Previous Message Robert Haas 2013-12-05 20:37:55 Re: Proof of concept: standalone backend with full FE/BE protocol