Re: logical replication: could not create file "state.tmp": File exists

From: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: logical replication: could not create file "state.tmp": File exists
Date: 2019-12-02 12:27:48
Message-ID: 6ea8cf93-ca74-ceb0-18bc-4fc5f8f55df1@postgrespro.ru
Lists: pgsql-bugs


On 12/2/19 7:35 AM, Michael Paquier wrote:
> On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
>> I've dug a bit into this problem, and it turned out that in
>> SaveSlotToPath() the temp file for the replication slot is opened with
>> 'O_CREAT | O_EXCL' flags, which makes this routine not very reentrant.
> What did you see as I/O problem before facing the actual error
> reported here? Was it just ENOSPC, a fsync failure, or just a failure
> in closing the fd? The first pattern is mostly what I guess happened,
> still a fsync failure would not trigger a PANIC here (actually we
> really should do that!), but I am raising a different thread about
> that issue.

Hello!

I didn't see the very first error that left the temp file behind.
I've requested it just now, but it will take some time to get (there
are several terabytes of text logs).
But I assume it was an out-of-space error, which, judging by the
code, should raise an ERROR and leave the temp file behind, just as
happened in the aforementioned case.
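
To illustrate the failure mode, here is a minimal standalone sketch. It is
not the PostgreSQL code: the helper names try_open_temp() and
leave_stale_temp() are made up for this example, and it only assumes POSIX
open() semantics. Once an earlier attempt has left the temp file behind,
O_CREAT | O_EXCL keeps failing with EEXIST, while O_CREAT | O_TRUNC reuses
the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from PostgreSQL): simulate an earlier save
 * attempt that failed midway and left a temp file behind.
 * Returns 0 on success, -1 on failure.
 */
static int
leave_stale_temp(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "stale", 5) != 5)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical helper (not from PostgreSQL): try to open the temp file
 * with O_CREAT | O_WRONLY plus one extra flag (O_EXCL or O_TRUNC).
 * Returns 0 on success, or the errno value set by open() on failure.
 */
static int
try_open_temp(const char *path, int extra_flag)
{
    int fd = open(path, O_CREAT | O_WRONLY | extra_flag, 0600);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

With a stale file in place, try_open_temp(path, O_EXCL) returns EEXIST on
every retry, whereas try_open_temp(path, O_TRUNC) succeeds and empties the
old contents. Truncation is only safe if something else, such as the
exclusive lock mentioned in my previous message, already guarantees a
single writer.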

>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
>> Script to reproduce and patch are attached.
> Agreed. I prefer the O_TRUNC option because that's less code churn.
> Also, as it can still be useful to have a look at the temporary state
> file after a crash or a failure, doing unlink() in the error code
> paths is no good option IMO.

I'm sorry, but it was a production system, so, as I understand it, the
stale temp file was hastily deleted without much deliberation.

Thank you for your interest in this topic.

>
> Have others thoughts or objections to share?
> --
> Michael

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
