Re: Online enabling of checksums

From: "Joshua D. Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-08-01 17:34:55
Message-ID: e2987a58-12cf-e32a-833a-820eaf7f3d0d@commandprompt.com
Lists: pgsql-hackers

On 08/01/2018 09:20 AM, Alvaro Herrera wrote:
>
>> my problem is that I think the "restart" approach is just using the
>> entirely wrong hammer to solve the problem at hand. At the very least
>> it's very problematic in respect to replicas, which need to know about
>> the setting too, and can have similar problems the restart on the
>> primary is supposed to prevent.
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

People in mission-critical environments do not "restart all servers".
They fail over to a secondary to do maintenance on a primary. When you
have a system where you literally lose thousands of dollars every minute
the database is down, you can't do what you are proposing. When you have
a system where, if the database is down for longer than X minutes, you
actually lose a whole day because all of the fabricators have to
revalidate before they begin work, you can't do that either. Granted,
that is not the majority (which you mention), but let's not forget them.

The one place where a restart does happen, and will continue to happen
for around 5 more years (3 if you incorporate pglogical and 9.6), is
upgrades. Although we have logical replication for upgrades now, we are
5 years away from the majority of users being on a version of PostgreSQL
that supports logical replication for upgrades. So I can see an
argument for an incremental approach, because people could enable
checksums as part of their upgrade restart.

> For users that cannot do that,
> that's too bad, they'll have to wait to the next release in order to
> enable checksums (assuming they fund the necessary development). But

I have to say, as a proponent of funded development for longer than
most, I like to see this refreshing take on the fact that this all does
take money.

> there are many systems where it *is* possible to take everything down
> for five seconds, then back up. They can definitely take advantage of
> checksummed data.

This is a good point.

> Currently, the only way to enable checksums is to initdb and create a
> new copy of the data from a logical backup, which could take hours or
> even days if data is large, or use logical replication.

Originally, I was going to -1 how this is being implemented. I too wish
we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a
restart. However, being able to just restart is a huge step forward from
what we have now.

Lastly, I think Alvaro has a point about incremental development, and
I also think some others on this thread need to "show me the patch"
instead of being armchair directors of development.

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
*** A fault and talent of mine is to tell it exactly how it is. ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
***** Unless otherwise stated, opinions are my own. *****
