From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-04-03 22:11:49
Message-ID: 82296c92-7124-dc5b-4f00-053f302d7233@2ndquadrant.com
Lists: pgsql-hackers
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander
> <magnus(at)hagander(dot)net> wrote:
>
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>         On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>         > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
>         > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > ...
> >
> > I do think just waiting for all running transactions to complete is
> > fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
> > does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
> > too, to some extent). So we have a precedent / working code we can copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want to make the checksumming proceed, knowing
> the oldest running XID is useless for that. If we log waitforxid, it
> can be used to query pg_stat_activity and interrupt the sessions
> somehow.
>
>
> Yeah, makes sense. Updated.
>
>
>
> > > And if you try this with a temporary table (not hidden in a
> > > transaction, so the bgworker can see it), the worker will fail
> > > with this:
> > >
> > > ERROR: cannot access temporary tables of other sessions
> > >
> > > But of course, this is just another way to crash without updating
> > > the result for the launcher, so checksums may end up being enabled
> > > anyway.
> > >
> > > Yeah, there will be plenty of side-effect issues from that
> > > crash-with-wrong-status case. Fixing that will at least make
> > > things safer -- in that checksums won't be enabled when not put
> > > on all pages.
> > >
> >
> > Sure, the outcome with checksums enabled incorrectly is a
> > consequence of the bogus status, and fixing that will prevent it.
> > But that wasn't my main point here - not articulated very clearly,
> > though.
> >
> > The bigger question is how to handle temporary tables gracefully,
> > so that it does not terminate the bgworker like this at all. This
> > might be an even bigger issue than dropped relations, considering
> > that temporary tables are a pretty common part of applications (and
> > it also includes CREATE/DROP).
> >
> > For some clusters it might mean the online checksum enabling would
> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> > Unfortunately, try_relation_open() won't fix this, as the error
> > comes from ReadBufferExtended. And it's not a matter of simply
> > creating a ReadBuffer variant without that error check, because
> > temporary tables use local buffers.
> >
> > I wonder if we could just go and set the checksums anyway, ignoring
> > the local buffers. If the other session does some changes, it'll
> > overwrite our changes, this time with the correct checksums. But it
> > seems pretty dangerous (I mean, what if they're writing stuff while
> > we're updating the checksums? Considering the various short-cuts
> > for temporary tables, I suspect that would be a boon for race
> > conditions.)
> >
> > Another option would be to do something similar to running
> > transactions, i.e. wait until all temporary tables (that we've seen
> > at the beginning) disappear. But we're starting to wait on more and
> > more stuff.
> >
> > If we do this, we should clearly log which backends we're waiting
> > for, so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty
> > simple.
> >
> > Making the worker simply ignore temporary tables would also be
> > easy.
> >
> > One of the bigger issues here is that temporary tables are
> > *session* scope and not transaction scope, so we'd actually need
> > the other session to finish, not just the transaction.
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed when the
> > worker started in this particular database (basically anything that
> > we got in our scan). Once we have processed the complete database,
> > keep re-scanning pg_class until those particular tables are gone
> > (search by oid).
> >
> > That means that any temporary tables that are created *while* we
> > are processing a database are ignored, but they should already be
> > receiving checksums.
> >
> > It definitely leads to a potential issue with long-running temp
> > tables. But as long as we look at the *actual tables* (by oid), we
> > should be able to handle long-running sessions once they have
> > dropped their temp tables.
> >
> > Does that sound workable to you?
> >
>
> Yes, that's pretty much what I meant by 'wait until all temporary
> tables disappear'. Again, we need to make it easy to determine which
> OIDs we are waiting for, and which sessions may need the DBA's
> attention.
>
> I don't think it makes sense to log OIDs of the temporary tables.
> There can be many of them, and in most cases the connection/session
> is managed by the application, so the only thing you can do is kill
> the connection.
>
>
> Yeah, agreed. I think it makes sense to show the *number* of temp
> tables. That's also a predictable amount of information -- logging
> all temp tables may, as you say, lead to an insane amount of data.
>
> PFA a patch that does this. I've also added some docs for it.
>
> And I also noticed pg_verify_checksums wasn't installed, so fixed
> that too.
>
> PFA a rebase on top of the just-committed verify-checksums patch.
>
This seems OK in terms of handling errors in the worker and passing
them to the launcher. I haven't managed to do any crash testing today,
but code-wise it seems sane.
It however still fails to initialize the attempts field after
allocating the db entry in BuildDatabaseList, so if you run with
-DRANDOMIZE_ALLOCATED_MEMORY the field ends up with garbage values
like this:
WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570
I guess those are not the droids we're looking for?
Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:
test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
 pg_disable_data_checksums
---------------------------
 
(1 row)
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled
At which point the only thing you can do is to restart the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?
Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: checksums-tweaks.diff (text/x-patch, 3.1 KB)