From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Online checksums patch - once again |
Date: | 2020-10-05 14:46:17 |
Message-ID: | 8c713a9a-7616-0041-209f-73c52a82ca15@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 05/10/2020 17:25, Álvaro Herrera wrote:
> On 2020-Oct-05, Heikki Linnakangas wrote:
>
>> The code in sendFile() in basebackup.c seems suspicious in that regard. It
>> calls DataChecksumsNeedVerify() once before starting to read the file. Isn't
>> it possible for the checksums flag to change while it's reading the file and
>> sending it to the client? I hope there are CHECK_FOR_INTERRUPTS() calls
>> buried somewhere in the loop, because it could take minutes to send the
>> whole file.
>>
>> I would feel better if the state transition of the "checksums" flag could
>> only happen in a few safe places, or there were some other safeguards for
>> this. I think that's what Andres was trying to say earlier in the thread on
>> ProcSignalBarriers. I'm not sure what the interface to that should be. It
>> could be something like HOLD/RESUME_INTERRUPTS(), where normally all
>> procsignals are handled on CHECK_FOR_INTERRUPTS(), but you could "hold off"
>> some if needed. Or something else. Or maybe we can just use
>> HOLD/RESUME_INTERRUPTS() for this. It's more coarse-grained than necessary,
>> but probably doesn't matter in practice.
>
> I hope you're not suggesting that interrupts would be held for the whole
> transmission of a file, which you say could take minutes. If we do have
> an interrupt holdoff, then it has to be pretty short; users (and
> systemd) despair if service shutdown is delayed more than a few seconds.
I'm not suggesting that, sorry I wasn't clear. That would indeed be
horrible.
sendFile() needs a different solution, but all the other places where
DataChecksumsNeedWrite/Verify() is called need to be inspected to make
sure that they hold interrupts, or ensure some other way that an
interrupt doesn't change the local checksums flag between the
DataChecksumsNeedWrite/Verify() call and the read/write.
I think sendFile() needs to re-check the local checksums state before
each read. It also needs to ensure that an interrupt doesn't occur and
change the local checksums state between read and the
DataChecksumsNeedVerify() calls, but that's a very short period if you
call DataChecksumsNeedVerify() again for each block.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Mario Emmenlauer | 2020-10-05 14:49:27 | Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0 |
Previous Message | Tom Lane | 2020-10-05 14:35:22 | Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0 |