| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> | 
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> | 
| Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? | 
| Date: | 2022-09-26 15:03:53 | 
| Message-ID: | CALj2ACW5Cy2uC_Hu1V-aq5uJx+4i=UDabQr_WvQwOPsPQULEEA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Sat, Sep 24, 2022 at 9:44 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Although WriteFile() with a synchronous file handle and an explicit
> offset doesn't use the current file position, it appears that it still
> changes it.  :-(
>
> You'd think from the documentation[1] that that isn't the case, because it says:
>
> "Considerations for working with synchronous file handles:
>
>  * If lpOverlapped is NULL, the write operation starts at the current
> file position and WriteFile does not return until the operation is
> complete, and the system updates the file pointer before WriteFile
> returns.
>
>  * If lpOverlapped is not NULL, the write operation starts at the
> offset that is specified in the OVERLAPPED structure and WriteFile
> does not return until the write operation is complete. The system
> updates the OVERLAPPED Internal and InternalHigh fields before
> WriteFile returns."
>
> So it's explicitly saying the file pointer is updated in the first
> bullet point and not the second, but src/port/win32pwrite.c is doing
> the second thing.
The WriteFile() and pwrite() implementation in win32pwrite.c are
changing the file pointer on Windows, see the following and [4] for
more details.
# Running: pg_basebackup --no-sync -cfast -D
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/backup2
--no-manifest --waldir
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/xlog2
pg_basebackup: offset_before 0 and offset_after 16777216 aren't the same
Assertion failed: offset_before == offset_after, file
../src/bin/pg_basebackup/walmethods.c, line 254
Irrespective of what Windows does with file pointers in WriteFile(),
should we add lseek(SEEK_SET) in our own pwrite()'s implementation,
something like [5]? This is rather hackish without fully knowing what
Windows does internally in WriteFile(), but this does fix inherent
issues that our pwrite() callers (there are quite a number of places
that use pwrite() and presumes file pointer doesn't change on Windows)
may have on Windows. See the regression tests passing [6] with the fix
[5].
> Yet the following assertion added to Bharath's code
> fails[2]:
That was a very quick patch though, here's another version adding
offset checks and an assertion [3], see the assertion failures here
[4].
I also think that we need to discuss this issue separately.
Thoughts?
> [1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position
> [2] https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[3] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v2
[4] - https://api.cirrus-ci.com/v1/artifact/task/5294264635621376/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[5]
diff --git a/src/port/win32pwrite.c b/src/port/win32pwrite.c
index 7f2e62e8a7..542b548279 100644
--- a/src/port/win32pwrite.c
+++ b/src/port/win32pwrite.c
@@ -37,5 +37,16 @@ pwrite(int fd, const void *buf, size_t size, off_t offset)
                return -1;
        }
+       /*
+        * On Windows, it is found that WriteFile() changes the file
pointer and we
+        * want pwrite() to not change. Hence, we explicitly reset the
file pointer
+        * to beginning of the file.
+        */
+       if (lseek(fd, 0, SEEK_SET) != 0)
+       {
+               _dosmaperr(GetLastError());
+               return -1;
+       }
+
        return result;
 }
[6] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v3
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jonathan S. Katz | 2022-09-26 15:06:51 | PostgreSQL 15 GA release date | 
| Previous Message | Peter Eisentraut | 2022-09-26 14:55:22 | Re: Convert *GetDatum() and DatumGet*() macros to inline functions |