Re: Enable data checksums by default

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Enable data checksums by default
Date: 2024-08-22 11:10:15
Message-ID: CAKZiRmzdH14fD-GxMvH_WmH+P_A3UR0SZsL6opHBy=ZwFNpppg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 22, 2024 at 8:11 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 15.08.24 08:38, Peter Eisentraut wrote:
> > On 08.08.24 19:42, Robert Haas wrote:
> >>> I'm thinking pg_upgrade could have a mode where it adds the
> >>> checksum during the upgrade as it copies the files (essentially a subset
> >>> of pg_checksums). I think that would be useful for that middle tier of
> >>> users who just want a good default experience.
> >> That would be very nice.
> >
> > Here is a demo patch for that. It turned out to be quite simple.
> >
> > I wrote above about a separate mode for that (like
> > --copy-and-make-adjustments), but it was just as easy to stick it into
> > the existing --copy mode.
> >
> > It would be useful to check what the performance overhead of this is
> > versus a copy that does not have to make adjustments. I expect it's
> > very little.
> >
> > A drawback is that as written this does not work on Windows, because
> > Windows uses a different code path in copyFile(). I don't know the
> > reasons for that. But it would need to be figured out.
>
> Here is an updated patch for this. I simplified the logic a bit and
> also handle the case where the read() reads less than a round number of
> blocks. I did some performance testing. The overhead of computing the
> checksums versus a straight --copy without checksum adjustments appears
> to be around 5% wall clock time, which seems ok to me. I also looked
> around the documentation to see if there is anything to update, but
> didn't find anything.
>
> I think if we can work out what to do on Windows, this could be a useful
> little feature for facilitating $subject.
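Here is a minimal C sketch of a copy loop that stamps a checksum into each complete block and carries a partial block over to the next read. It makes the "read() returns less than a round number of blocks" case concrete. It is not the patch. demo_block_checksum() is a hypothetical stand-in for PostgreSQL's pg_checksum_page(). The checksum field is assumed to sit at byte offset 8, where pd_checksum lives in the page header. Block numbers are counted from 0 within the file, whereas a real implementation would also have to account for the segment number. The read/write callbacks and the in-memory mem_read()/mem_write() helpers are hypothetical and exist only so that short reads can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define BLCKSZ 8192          /* PostgreSQL's default block size */
#define CHECKSUM_OFFSET 8    /* assumed: byte offset of the 16-bit checksum field */
#define CHUNK_BLOCKS 4       /* hypothetical: blocks requested per read */

/* Hypothetical I/O callbacks; a read may return fewer bytes than requested. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len);
typedef int (*write_fn)(void *ctx, const char *buf, size_t len);

/* Hypothetical stand-in for pg_checksum_page(): FNV-1a over the block with the
 * checksum field read as zero, mixed with the block number, never returning 0. */
static uint16_t demo_block_checksum(const char *page, uint32_t blkno)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < BLCKSZ; i++) {
        int in_field = (i == CHECKSUM_OFFSET || i == CHECKSUM_OFFSET + 1);
        h = (h ^ (in_field ? 0u : (unsigned char) page[i])) * 16777619u;
    }
    h ^= blkno;
    return (uint16_t) (h % 65535 + 1);
}

/* Copies everything rd() yields to wr(), stamping a checksum into each complete
 * block. Returns the number of blocks copied, or -1 on an I/O error or when the
 * input ends in a partial block. */
static long copy_with_checksums(read_fn rd, void *rctx, write_fn wr, void *wctx)
{
    char buf[CHUNK_BLOCKS * BLCKSZ];
    size_t filled = 0;           /* bytes buffered but not yet written */
    uint32_t blkno = 0;          /* block number within this file */

    for (;;) {
        ssize_t n = rd(rctx, buf + filled, sizeof buf - filled);
        if (n < 0)
            return -1;
        filled += (size_t) n;

        /* Only whole blocks can be checksummed; a short read leaves a tail. */
        size_t whole = filled - filled % BLCKSZ;
        for (size_t off = 0; off < whole; off += BLCKSZ) {
            uint16_t sum = demo_block_checksum(buf + off, blkno++);
            memcpy(buf + off + CHECKSUM_OFFSET, &sum, sizeof sum);
        }
        if (whole > 0 && wr(wctx, buf, whole) != 0)
            return -1;

        /* Move the partial block to the front; the next read completes it. */
        memmove(buf, buf + whole, filled - whole);
        filled -= whole;
        assert(filled < BLCKSZ);

        if (n == 0)              /* EOF: a leftover tail means a torn block */
            return filled == 0 ? (long) blkno : -1;
    }
}

/* Hypothetical in-memory endpoints used to simulate short reads. */
struct mem_src { const char *data; size_t len, pos, max_chunk; };
struct mem_sink { char *data; size_t cap, len; };

static ssize_t mem_read(void *ctx, char *buf, size_t want)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > want)
        n = want;
    if (n > s->max_chunk)
        n = s->max_chunk;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t) n;
}

static int mem_write(void *ctx, const char *buf, size_t len)
{
    struct mem_sink *d = ctx;
    if (d->len + len > d->cap)
        return -1;
    memcpy(d->data + d->len, buf, len);
    d->len += len;
    return 0;
}
```

Suppose mem_src.max_chunk is set to 3000 bytes. A block is then stamped only once 8192 bytes have accumulated. This sketch treats a file whose length is not a multiple of BLCKSZ as an error.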

My take:
1. I wonder whether pg_upgrade --copy from a cluster with
checksums=off should calculate/enable checksums by default. Maybe we
should keep erroring out in that case, as we do now. There might still
be people who want to keep checksums off but who would blindly run
initdb with the proposed new default (checksums on), so this should be
opt-in via a switch, say --copy-and-enable-checksums, so that the user
stays in full control.
2. WIN32's copyFile() could then stay as it is, and the new
--copy-and-enable-checksums mode on WIN32 would have to fall back to
the classic read()/write() loop.
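Here is a minimal sketch of the opt-in behaviour proposed in points 1 and 2, written as a pure decision function. Without the switch, a checksum mismatch between the clusters still fails as it does today. Only with the switch does the upgrade take the portable read()/write() loop that stamps checksums. WIN32 therefore keeps its existing copyFile() path for ordinary copies. The --copy-and-enable-checksums name is only the placeholder floated above. copy_mode and choose_copy_mode() are hypothetical names, not pg_upgrade code.

```c
#include <stdbool.h>

/* Hypothetical outcome of the pg_upgrade --copy decision. */
typedef enum {
    COPY_REFUSE,          /* checksum settings differ and no opt-in: error out, as today */
    COPY_AS_IS,           /* settings match: existing copyFile() path (CopyFile() on WIN32) */
    COPY_ADD_CHECKSUMS    /* opt-in: portable read()/write() loop that stamps checksums */
} copy_mode;

/* enable_checksums_opt models the hypothetical --copy-and-enable-checksums switch. */
static copy_mode choose_copy_mode(bool old_checksums, bool new_checksums,
                                  bool enable_checksums_opt)
{
    if (old_checksums == new_checksums)
        return COPY_AS_IS;
    /* Only the off -> on direction can be fixed up while copying, and only on request. */
    if (!old_checksums && new_checksums && enable_checksums_opt)
        return COPY_ADD_CHECKSUMS;
    return COPY_REFUSE;
}
```

Keeping the decision separate from the copying means the WIN32-specific code path never has to know about checksums.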

-J.
