From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Streaming I/O, vectored I/O (WIP) |
Date: | 2023-12-09 09:23:00 |
Message-ID: | 4533e76e-9519-4715-acd0-d4fa552619b0@iki.fi |
Lists: | pgsql-hackers |
On 09/12/2023 02:41, Thomas Munro wrote:
> On Sat, Dec 9, 2023 at 7:25 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2023-11-30 13:01:46 +1300, Thomas Munro wrote:
>>> On Thu, Nov 30, 2023 at 12:16 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>>> Maybe we should bite the bullet and always retry short writes in
>>>> FileWriteV(). Is that what you meant by "handling them"?
>>>> If the total size is expensive to calculate, how about passing it as an
>>>> extra argument? Presumably it is cheap for the callers to calculate at
>>>> the same time that they build the iovec array?
>
>>> There is another problem with pushing it down to fd.c, though.
>>> Suppose you try to write 8192 bytes, and the kernel says "you wrote
>>> 4096 bytes" so your loop goes around again with the second half of the
>>> data and now the kernel says "-1, ENOSPC". What are you going to do?
>>> fd.c doesn't raise errors for I/O failure, it fails with -1 and errno,
>>> so you'd either have to return -1, ENOSPC (converting short writes
>>> into actual errors, a lie because you did write some data), or return
>>> 4096 (and possibly also set errno = ENOSPC as we have always done).
>>> So you can't really handle this problem at this level, can you?
>>> Unless you decide that fd.c should get into the business of raising
>>> errors for I/O failures, which would be a bit of a departure.
>>>
>>> That's why I did the retry higher up in md.c.
>>
>> I think that's the right call. I think for AIO we can't do retry handling
>> purely in fd.c, or at least it'd be quite awkward. It doesn't seem like it'd
>> buy us that much in md.c anyway, we still need to handle the cross segment
>> case and such, from what I can tell?
>
> Heikki, what do you think about this: we could go with the v3 fd.c
> and md.c patches, but move adjust_iovec_for_partial_transfer() into
> src/common/file_utils.c, so that at least that slightly annoying part
> of the job is available for re-use by future code that faces the same
> problem?
Ok, works for me.
--
Heikki Linnakangas
Neon (https://neon.tech)
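
For readers following the thread, here is a minimal sketch of the two pieces under discussion: a helper that adjusts an iovec array after a partial transfer, and a caller-level loop that retries short writes. It is not the code from the v3 patches. The names sketch_adjust_iovec() and sketch_write_all_v() are hypothetical, the use of pwritev() (a non-POSIX call available on Linux and the BSDs) is an assumption, and so is the choice to report a zero-byte write as ENOSPC.

```c
#define _DEFAULT_SOURCE			/* for pwritev() on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: after a vectored transfer moved "transferred" bytes,
 * drop the fully consumed entries of iov[] and trim the partly consumed one.
 * Modifies iov[] in place and returns the number of entries still pending.
 */
static int
sketch_adjust_iovec(struct iovec *iov, int iovcnt, size_t transferred)
{
	int			skip = 0;

	/* Skip entries that were transferred completely. */
	while (skip < iovcnt && transferred >= iov[skip].iov_len)
	{
		transferred -= iov[skip].iov_len;
		skip++;
	}

	/* The caller must not claim more bytes than the array describes. */
	assert(skip < iovcnt || transferred == 0);

	/* Shift the remaining entries to the front of the array. */
	for (int i = skip; i < iovcnt; i++)
		iov[i - skip] = iov[i];
	iovcnt -= skip;

	/* Trim the entry that was only partly transferred, if any. */
	if (transferred > 0)
	{
		iov[0].iov_base = (char *) iov[0].iov_base + transferred;
		iov[0].iov_len -= transferred;
	}
	return iovcnt;
}

/*
 * Hypothetical caller-level retry loop.  Returns 0 once everything has been
 * written.  On failure returns -1 with errno set, and *written holds the
 * number of bytes that did reach the file, so the caller can raise an error
 * without claiming that nothing was written.
 */
static int
sketch_write_all_v(int fd, struct iovec *iov, int iovcnt, off_t offset,
				   size_t *written)
{
	size_t		last = 0;

	*written = 0;
	for (;;)
	{
		ssize_t		n;

		/* Account for the previous write; also skips empty entries. */
		iovcnt = sketch_adjust_iovec(iov, iovcnt, last);
		last = 0;
		if (iovcnt == 0)
			return 0;

		n = pwritev(fd, iov, iovcnt, offset);
		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted before writing anything */
			return -1;			/* e.g. ENOSPC on the second attempt */
		}
		if (n == 0)
		{
			errno = ENOSPC;		/* assumption: no progress means no space */
			return -1;
		}
		last = (size_t) n;
		*written += last;
		offset += n;
	}
}
```

Keeping the loop in the caller rather than in the low-level write wrapper means the "wrote 4096 bytes, then got ENOSPC" case can be reported with both the error and the partial count, which is the difficulty Thomas describes above. The iovec helper has no such policy decision in it, which is why it is a reasonable candidate for a shared location.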