From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction |
Date: | 2020-03-28 18:39:04 |
Message-ID: | 20200328183904.GI20103@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
> The buildfarm just showed up another instability in the test cases
> we added:
Yea, as you said, this is an issue with the *testcase*. The function behavior
didn't change, we just weren't previously exercising it.
> select (w).size = :segsize as ok
> from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
> - ok
> -----
> - t
> -(1 row)
> -
> +ERROR: could not stat file "pg_wal/000000010000000000000078": No such file or directory
> select count(*) >= 0 as ok from pg_ls_archive_statusdir();
> ok
> ----
>
> It's pretty obvious what happened here: concurrent activity renamed or
> removed the WAL segment between when we saw it in the directory and
> when we tried to stat() it.
>
> This seems like it would be just as much of a hazard for field usage
> as it is for regression testing,
That's clearly true for pg_ls_waldir(), which call pg_ls_dir_files, and
includes some metadata columns.
> so I propose that we fix these directory-scanning functions to silently
> ignore ENOENT failures from stat(). Are there any for which we should not do
> that?
I think it's reasonable to ignore transient ENOENT for tmpdir, logdir, and
probably archive_statusdir. That doesn't currently affect pg_ls_dir(), which
lists file but not metadata for an arbitrary dir, so doesn't call stat().
Note that dangling links in the other functions currently cause (wrong [0])
error. I guess it should be documented if broken link is will be ignored due
to ENOENT.
Maybe we should lstat() the file to determine if it's a dangling link; if
lstat() fails, then skip it. Currently, we use stat(), which shows metdata of
a link's *target*. Maybe we'd change that.
Note that I have a patch which generalizes pg_ls_dir_files and makes
pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
unless I add another flag to do otherwise (but behaving the same has its
merits). It already uses lstat() to show links to dirs as isdir=no, which was
needed to avoid recursing into links-to-dirs in the new helper function
pg_ls_dir_recurse(). https://commitfest.postgresql.org/26/2377/
[0] Which you fixed in 085b6b667 and I previously fixed at:
https://www.postgresql.org/message-id/attachment/106478/v2-0001-BUG-in-errmsg.patch
|$ sudo ln -s foo /var/log/postgresql/bar
|ts=# SELECT * FROM pg_ls_logdir() ORDER BY 3;
|ERROR: could not stat directory "/var/log/postgresql": No such file or directory
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-03-28 18:43:23 | Re: pgsql: Add kqueue(2) support to the WaitEventSet API. |
Previous Message | Andres Freund | 2020-03-28 18:34:34 | Re: pgbench - refactor init functions with buffers |