Re: Add recovery to pg_control and remove backup_label

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-21 20:00:18
Message-ID: 20231121200018.ifkhrclmum3gq2pt@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-11-21 14:48:59 -0400, David Steele wrote:
> > I'd not call 7.06->4.77 or 6.76->4.77 "virtually free".
>
> OK, but how does that look with compression

With compression it's obviously somewhat different - but that part is done in
parallel, potentially on a different machine with client side compression,
whereas I think right now the checksumming is single-threaded, on the server
side.

With parallel server side compression, it's still 20% slower with the default
checksumming than none. With client side it's 15%.

> -- to a remote location?

I think this one unfortunately makes checksums a bigger issue, not a smaller
one. The network interaction piece is single-threaded, adding another
significant use of CPU onto the same thread means that you are hit harder by
using substantial amount of CPU for checksumming in the same thread.

Once you go beyond the small instances, you have plenty network bandwidth in
cloud environments. We top out well before the network on bigger instances.

> Uncompressed backup to local storage doesn't seem very realistic. With gzip
> compression we measure SHA1 checksums at about 5% of total CPU.

IMO using gzip is basically infeasible for non-toy sized databases today. I
think we're using our users a disservice by defaulting to it in a bunch of
places. Even if another default exposes them to difficulty due to potentially
using a different compiled binary with fewer supported compression methods -
that's gona be very rare in practice.

> I can't understate how valuable checksums are in finding corruption,
> especially in long-lived backups.

I agree! But I think we need faster checksum algorithms or a faster
implementation of the existing ones. And probably default to something faster
once we have it.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2023-11-21 20:08:35 Re: Add recovery to pg_control and remove backup_label
Previous Message David Steele 2023-11-21 18:48:59 Re: Add recovery to pg_control and remove backup_label