RE: Disable WAL logging to speed up data loading

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Stephen Frost' <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2020-11-19 02:07:54
Message-ID: OSBPR01MB4888B34B81A6E0DD46B5D063EDE00@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hello

In the past discussion of wal_level=none,
there were some ideas to track changes of wal_level,
in other words, a *stronger mechanism* to check wal_level.
I agree with the idea of having a new monitoring item
and would like to implement one or more of those ideas in the next patch.
But I'm not sure where the right place to write the information is.
Is the control file the best place?
On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL. We certainly do this already when running as a
> replica:
>
> ereport(WARNING,
>         (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
>          errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
>
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
>
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
>
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about. In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism... maybe not).
> After all, that's where the real issue here is, isn't it?
The first idea is a warning meaning that the replica could be missing some data.
This is to notify tools that taking a backup from the replica is dangerous.
This can be applied to a change from replica to minimal (or none), and it is useful
because having wal_level lowered to minimal does not mean
an unclean shutdown or unexpected stoppage, but it does mean a replica might be corrupted.

On Tuesday, Nov 3, 2020 12:28 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm not exactly sure what that would look like, but suppose we had a feature
> where every time wal_level drops below replica, a counter gets incremented by
> 1, and that counter is saved in the control file. Or maybe when wal_level drops
> below minimal to none. Or maybe there are two counters. Anyway, the idea is
> that if you have a snapshot of the cluster at one time and a snapshot at
> another time, you can see whether anything scary has happened in the middle
> without needing all of the WAL in between.
The second idea is an incremental counter that indicates a drop of wal_level
from replica to minimal (or, say, from minimal to none).
Its purpose is to compare wal_level changes between snapshots.
When a monitoring tool detects a difference in the counter,
we can tell immediately that something happened, without checking all of the WAL in between.
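A minimal sketch of that counter idea (again, the struct and function names are hypothetical, not actual patch code): every time the server starts with a wal_level below replica when it previously ran at replica or above, a counter persisted in the control file is bumped, and a monitoring tool only has to compare two snapshots of the counter.

```c
#include <assert.h>
#include <stdint.h>

typedef enum WalLevel
{
	WAL_LEVEL_NONE,
	WAL_LEVEL_MINIMAL,
	WAL_LEVEL_REPLICA,
	WAL_LEVEL_LOGICAL
} WalLevel;

/* Hypothetical subset of the control file, for illustration only. */
typedef struct MiniControlFile
{
	WalLevel	wal_level;
	uint64_t	wal_level_drop_count;	/* bumped whenever wal_level drops
										 * below replica */
} MiniControlFile;

static void
record_wal_level_change(MiniControlFile *cf, WalLevel new_level)
{
	if (cf->wal_level >= WAL_LEVEL_REPLICA && new_level < WAL_LEVEL_REPLICA)
		cf->wal_level_drop_count++;
	cf->wal_level = new_level;
}

/*
 * A monitoring tool holding two snapshots of the counter can tell
 * whether anything scary happened in between, without reading any WAL.
 */
static int
something_scary_happened(uint64_t old_count, uint64_t new_count)
{
	return new_count != old_count;
}
```

One counter like this would cover the replica-to-minimal drop; a second counter (or a wider condition) could cover minimal-to-none, as Robert suggested.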

> > This would also be something that should be exposed as monitoring
> > points (which it could be if it's in pg_control). That is, I can
> > imagine a *lot* of installations that would definitely want an alert
> > to fire if the cluster has ever been started up in a wal_level=none or
> > wal_level=minimal, at least up until the point where somebody has run
> > a new full backup.
>
> I agree with the general idea that it'd be good for anything we do here to be
> made available for monitoring tools to be able to see.
The last one is a notification of permanent damage caused by the initial cluster parameter,
although this may be similar to the second one.
This is an indication that some part of the data cannot be recovered
if the cluster was initialized and had data loaded while wal_level=none.

Exposing a monitoring item for tools sounds reasonable to me, of course.
Is the control file the best place to store the information needed to implement those kinds of ideas?
What I'm concerned about is that those three items are not necessarily meaningful for all users.
Also, it seems better to keep the size of the control file small,
because, as a comment in the source says, it is limited to less than 512 bytes so that writes stay atomic.
I'm not saying here that I'll take all of those ideas into the patch,
but this consideration applies whenever we change the content of the control file, I would say.
Another aspect unique to the last idea is that
the value will be of no use after taking a new full backup.
Thus, I'm not sure whether writing that kind of information in the control file is right or not.

Before implementing this,
I would like to reach agreement with the community on this point.

Best,
Takamichi Osumi
