Re: Online enabling of checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Online enabling of checksums
Date: 2018-02-24 21:56:57
Message-ID: CABUevEzMuHn6Hc2GeCrjcefxXTnwdMb0Fg7zPkMCH-EArA5suA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> > Is it really that invisible? Given how much we argue over adding single
> > counters to the stats system, I'm not sure it's quite that low.
>
> That appears to be entirely unrelated. The stats stuff is expensive
> because we currently have to essentially write out the stats for *all*
> tables in a database, once a counter is updated. And those counters are
> obviously constantly updated. Thus the overhead of adding one column is
> essentially multiplied by the number of tables in the system. Whereas
> here it's a single column that can be updated on a per-row basis, which
> is barely ever going to be written to.
>
> Am I missing something?
>

It's probably at least partially unrelated, you are right. I may have
misread our reluctance to add more values there as a general reluctance to
add more values to central columns.

> > We did consider doing it on a per-table basis as well. But this is also an
> > overhead that has to be paid forever, whereas the risk of having to read
> > the database files more than once (because it'd only have to read them on
> > the second pass, not write anything) is a one-off operation. And all
> > those that have initialized with checksums in the first place don't have to
> > pay any overhead at all in the current design.
>
> Why does it have to be paid forever?
>

The size of the pg_class row would be there forever. Granted, it's not that
big an overhead given that there are already plenty of columns there. But
the point being you can never remove that column, and it will be there for
users who never even considered running without checksums. It's certainly
not a large overhead, but it's also not zero.
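
As a rough back-of-the-envelope illustration of that point (not a measurement), the permanent cost scales with the number of pg_class rows. The sketch below assumes a hypothetical 1-byte per-relation flag and ignores alignment padding and any other tuple-level effects; the function name and the default size are illustrative only.

```python
def extra_catalog_bytes(num_relations: int, bytes_per_row: int = 1) -> int:
    """Extra pg_class bytes if every row grows by bytes_per_row.

    bytes_per_row=1 is an assumed size for a hypothetical per-relation
    checksum flag; real on-disk cost also depends on alignment padding.
    """
    return num_relations * bytes_per_row


# Print the estimate for a few catalog sizes (relations = rows in pg_class).
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} relations -> {extra_catalog_bytes(n)} extra bytes")
```

Under those assumptions the total stays small even for very large catalogs, which matches the "not large, but not zero" framing above.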

> > I very strongly doubt it's a "very noticeable operational problem". People
> > don't restart their databases very often... Let's say it takes 2-3 weeks to
> > complete a run in a fairly large database. How many such large databases
> > actually restart that frequently? I'm not sure I know of any. And the only
> > effect of it is you have to start the process over (but read-only for the
> > part you have already done). It's certainly not ideal, but I don't agree
> > it's in any form a "very noticeable problem".
>
> I definitely know large databases that fail over more frequently than
> that.
>

I would argue that they have bigger issues than enabling checksums... By
far.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
