From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Paul Guo <paulguo(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Two patches to speed up pg_rewind. |
Date: | 2021-08-17 07:47:44 |
Message-ID: | YRtpoEkRU6Xl/kIC@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 05, 2021 at 06:18:03PM +0800, Paul Guo wrote:
> I modified the copy_file_range() patch using the below logic:
>
> If the first call of copy_file_range() fails with errno EXDEV or
> ENOTSUP, pg_rewind
> would not use copy_file_range() in rest code, and if copy_file_range() fails we
> fallback to use the previous read()+write() code logic for the file.
I have looked at 0001, and I don't really like it. One argument
against this approach is that if pg_rewind fails in the middle of its
operation then we would have done a set of fsync() for nothing, with
the data folder still unusable. I would be curious to see some
numbers to see how much it matters with many physical files (say cases
with thousands of small relations?).
+/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
+#if defined(HAVE_SYNC_FILE_RANGE)
+#define PG_FLUSH_DATA_WORKS 1
+#elif !defined(WIN32) && defined(MS_ASYNC)
+#define PG_FLUSH_DATA_WORKS 1
+#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+#define PG_FLUSH_DATA_WORKS 1
This is wrong for the code frontend on platforms that may finish by
using MS_ASYNC, no? There is no such implementation in file_utils.c
but there is one in fd.c.
+ fsync_fname("global/pg_control", false);
+ fsync_fname("backup_label", false);
+ if (access("recovery.conf", F_OK) == 0)
+ fsync_fname("recovery.conf", false);
+ if (access("postgresql.auto.conf", F_OK) == 0)
+ fsync_fname("postgresql.auto.conf", false);
This list is wrong on various aspects, no? This would miss custom
configuration files, or included files.
- if (showprogress)
- pg_log_info("syncing target data directory");
- sync_target_dir();
-
/* Also update the standby configuration, if requested. */
if (writerecoveryconf && !dry_run)
WriteRecoveryConfig(conn, datadir_target,
GenerateRecoveryConfig(conn, NULL));
+ if (showprogress)
+ pg_log_info("syncing target data directory");
+ perform_sync(filemap);
Why inverting the order here?
+ * Pre Linux 5.3 does not allow cross-fs copy_file_range() call
+ * (return EXDEV). Some fs do not support copy_file_range() (return
+ * ENOTSUP). Here we explicitly disable copy_file_range() for the
+ * two scenarios. For other failures we still allow subsequent
+ * copy_file_range() try.
+ */
+ if (errno == ENOTSUP || errno == EXDEV)
+ copy_file_range_support = false;
Are you sure that it is right to always cancel the support of
copy_file_range() after it does not work once? Couldn't it be
possible that things work properly depending on the tablespace being
worked on by pg_rewind?
Having the facility for copy_file_range() in pg_rewind is not nice at
the end, and we are going to need a run-time check to fallback
dynamically to an equivalent implementation on errno={EXDEV,ENOTSUP}.
Hmm. What about locating all that in file_utils.c instead, with a
brand new routine name (pg_copy_file_range would be one choice)? We
still need the ./configure check, except that the conditions to use
the fallback implementation is in this routine, aka fallback on EXDEV,
ENOTSUP or !HAVE_COPY_FILE_RANGE. The backend may need to solve this
problem at some point, but logging and fd handling will likely just
locate that in fd.c, so having one routine for the purpose of all
frontends looks like a step in the right direction.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2021-08-17 08:00:06 | Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o |
Previous Message | Dilip Kumar | 2021-08-17 07:37:45 | Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o |