From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se> |
Subject: | Re: Online enabling of checksums |
Date: | 2018-02-25 21:48:38 |
Message-ID: | b7c8b142-0d2e-f262-6e77-9ef744b06fb9@2ndquadrant.com |
Lists: | pgsql-hackers |
On 02/24/2018 10:45 PM, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > I would prefer that yes. But having to re-read 9TB is still significantly
> > better than not being able to turn on checksums at all (state today). And
> > adding a catalog column for it will carry the cost of the migration
> > *forever*, both for clusters that never have checksums and those that had it
> > from the beginning.
> >
> > Accepting that the process will start over (but only read, not re-write, the
> > blocks that have already been processed) in case of a crash does
> > significantly simplify the process, and reduce the long-term cost of it in
> > the form of entries in the catalogs. Since this is a one-time operation (or
> > for many people, a zero-time operation), paying that cost that one time is
> > probably better than paying a much smaller cost but constantly.
>
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way. I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature. The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation. You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.
>
>
> Is it really that invisible? Given how much we argue over adding
> single counters to the stats system, I'm not sure it's quite that
> low.

I'm a bit unsure where the flags would be stored - I initially assumed
pg_database/pg_class, but now I see mentions of the stats system.

But I wonder why this should be stored in a catalog at all? The info is
only needed by the bgworker(s), so they could easily flush the current
status to a file every now and then and fsync it. Then after a restart, if
you find a valid file, use it to resume from the last OK position. If
not, start from scratch.

FWIW this is pretty much what the stats collector does.
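
A minimal sketch of that idea, in Python for brevity (a real bgworker
would be written in C). The function names, the file format (a CRC32
header line followed by a JSON payload) and the fields in the position
record are all assumptions made for illustration; none of them come from
the patch or from the stats collector's actual file format. The directory
fsync assumes a POSIX system.

```python
import json
import os
import zlib


def save_progress(path, position):
    """Durably record the last fully processed position (hypothetical format)."""
    payload = json.dumps(position, sort_keys=True).encode("utf-8")
    # First line is a CRC32 of the payload so a torn write can be detected.
    record = b"%08x\n" % zlib.crc32(payload) + payload

    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())      # make the file contents durable

    os.replace(tmp_path, path)    # atomically swap in the new file

    # fsync the containing directory so the rename itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_progress(path):
    """Return the saved position, or None if the file is missing or invalid."""
    try:
        with open(path, "rb") as f:
            header, _, payload = f.read().partition(b"\n")
    except FileNotFoundError:
        return None
    try:
        if int(header, 16) != zlib.crc32(payload):
            return None           # corrupt or truncated: start from scratch
        return json.loads(payload.decode("utf-8"))
    except ValueError:
        return None               # unparsable header or payload
```

On startup the worker would call load_progress() and resume from the
returned position, or start from the beginning if it gets None. How often
to call save_progress() is a trade-off between fsync cost and how much
work is repeated after a crash.
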
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services