From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Disable WAL logging to speed up data loading |
Date: | 2020-11-02 18:01:59 |
Message-ID: | 20201102180159.GS16415@tamriel.snowman.net |
Lists: | pgsql-hackers |
Greetings,
* Magnus Hagander (magnus(at)hagander(dot)net) wrote:
> On Mon, Nov 2, 2020 at 4:28 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > > Yes. What I meant was such a safe guard needs to be implemented.
> > >
> > > This may mean that if we want to recover the database from that backup,
> > > we need to specify the recovery target so that the archive recovery stops
> > > just before the WAL record indicating wal_level change.
> >
> > Yeah, I think we need these kinds of safeguards, for sure.
> >
> > I'm also concerned about the way that this proposed feature interacts
> > with incremental backup capabilities that already exist in tools like
> > pgBackRest, EDB's BART, pg_probackup, and future things we might want
> > to introduce into core, along the lines of what I have previously
> > proposed. Now, I think pgBackRest uses only timestamps and checksums,
> > so it probably doesn't care, but some of the other solutions rely on
> > WAL-scanning to gather a list of changed blocks. I guess there's no
> > reason that they can't notice the wal_level being changed and do the
> > right thing; they should probably have that kind of capability
> > already. Still, it strikes me that it might be useful if we had a
> > stronger mechanism.
Checking the WAL level certainly seems critical to anything that's
reading the WAL. We already do this when running as a replica:
```c
ereport(WARNING,
        (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
         errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
```
There's definitely a question about whether a WARNING there is really
sufficient, considering that you could end up with 'logged' tables on
the replica that are missing data, but I'm not sure that inventing a
new, independent mechanism for checking WAL level changes makes sense.
While pgbackrest backups of a primary wouldn't be impacted, it does
support backing up from a replica (as do other backup systems, of
course). If data that's supposed to be on the replica isn't there
because someone restarted PG with wal_level=minimal, flipped it back to
replica, and then got the replica past that part of the WAL by turning
off hot standby, replaying, and turning hot standby back on, then the
backup is also going to be missing that data.
Perhaps that's where we need to have a stronger mechanism though: if we
ever hit the above WARNING, set a flag somewhere that tools can check
and which we could also check and throw further warnings about. In
other words, every time this replica is started, we could check for
this flag and emit the same warning about how data may be missing, and
we could have pg_basebackup flat out refuse to back up from a replica
where this flag has been set (maybe with some kind of override
mechanism, maybe not). After all, that's where the real issue here is,
isn't it?
> > I'm not exactly sure what that would look like, but suppose we had a
> > feature where every time wal_level drops below replica, a counter gets
> > incremented by 1, and that counter is saved in the control file. Or
> > maybe when wal_level drops below minimal to none. Or maybe there are
> > two counters. Anyway, the idea is that if you have a snapshot of the
> > cluster at one time and a snapshot at another time, you can see
> > whether anything scary has happened in the middle without needing all
> > of the WAL in between.
> >
> > Maybe this is off-topic for this thread or not really needed, but I'm
> > not sure. I don't think wal_level=none is a bad idea intrinsically,
> > but I think it would be easy to implement it poorly and end up harming
> > a lot of users. I have no problem with giving people a way to do
> > dangerous things, but we should do our best to let people know how
> > much danger they've incurred.
I'm not sure that wal_level=none is really the right way to address this
use-case. We already have unlogged tables and that's pretty clean and
meets the "we want to load data fast without having to pay for WAL" use
case. The argument here seems to be that to take advantage of unlogged
tables requires the tools using PG to know how to issue a 'CREATE
UNLOGGED TABLE' command instead of a 'CREATE TABLE' command. That
doesn't seem like a huge leap, but we could make it easier by just
adding a 'table_log_default' or such GUC that could be set on the data
loading role to have all tables created by it be unlogged.
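As a rough sketch of that idea (note that `table_log_default` is a
hypothetical GUC proposed here, not an existing setting, and the role
and table names are made up for illustration):

```sql
-- Hypothetical GUC: have all tables created by the data-loading role
-- default to UNLOGGED, without changing the loading tool's statements.
ALTER ROLE bulk_loader SET table_log_default = 'unlogged';

-- The loading tool keeps issuing a plain CREATE TABLE ...
CREATE TABLE staging_orders (id bigint, payload text);
-- ... which would then behave as if it had been written as:
CREATE UNLOGGED TABLE staging_orders (id bigint, payload text);
```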
Looking back at the thread, it seems that perhaps the one other area
where this isn't ideal is when the user ultimately wants the data to be
*considered* logged even when it wasn't (ie: change from wal_level none,
or minimal, back to replica). On a quick look, it seems that we're
missing a trick there for UNLOGGED -> LOGGED changes: while I don't
think we'll WAL-log the table during that change if wal_level is set to
minimal, I do think we'll still end up pointlessly copying all of the
data when we don't really need to. Fixing that would be useful in its
own right though.
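For reference, the UNLOGGED -> LOGGED change in question is the
existing ALTER TABLE form (table name is just an example):

```sql
-- Load into an unlogged table, then flip it to logged once done.
CREATE UNLOGGED TABLE staging_orders (id bigint, payload text);
-- ... bulk load here ...
ALTER TABLE staging_orders SET LOGGED;
-- The point above: under wal_level=minimal the WAL for the table can
-- be skipped, yet the data still gets rewritten during SET LOGGED.
```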
> I definitely think this is something that should be thought out and
> included in a patch like this, so it's definitely on-topic for this
> thread.
Considering it's an existing problem we have, I'm not entirely convinced
that it's on this patch to fix it beyond making sure to document the
impact. That said, mucking with WAL level seems like a really big
hammer and it'd be nice if we could support this use case with more
granularity, as I try to outline above.
> Having the ability to turn things off can certainly be very useful.
> Having the risk of having done so without realizing the damage caused
> is a *big* foot-gun, and we need to do our best to protect against it.
I don't know that we really do enough in this regard, today, and maybe
that's something we should discuss changing, but that's not really a new
concern that this patch is introducing.
> This is not entirely unlike the idea that we've discussed before of
> having basically a "tainted" flag in pg_control if the system has ever
> been started up in say fsync=off, just to make sure that we have a
> record of it. This wouldn't be the same flag of course, but it's a
> similar problem, where even temporarily starting the cluster up with a
> certain set of flags can do permanent damage which is not necessarily
> fixed by changing it back and restarting.
While I think I understand the concern, I'm not sure that it entirely
applies to the fsync=off case unless the system isn't cleanly shut down
too, something that is usually caught and detected through other means.
Having the WAL level be lowered to minimal is something that can cause a
replica to end up getting corrupted even without an unclean restart or
similar happening.
> This would also be something that should be exposed as monitoring
> points (which it could be if it's in pg_control). That is, I can
> imagine a *lot* of installations that would definitely want an alert
> to fire if the cluster has ever been started up in a wal_level=none or
> wal_level=minimal, at least up until the point where somebody has run
> a new full backup.
I agree with the general idea that it'd be good for anything we do here
to be made available for monitoring tools to be able to see.
Thanks,
Stephen