From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch> |
Subject: | Re: Proposal for CSN based snapshots |
Date: | 2014-07-10 21:25:33 |
Message-ID: | 20140710212531.GW6390@eldon.alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Heikki Linnakangas wrote:
> Attached is a new patch. It now keeps the current pg_clog unchanged,
> but adds a new pg_csnlog besides it. pg_csnlog is more similar to
> pg_subtrans than pg_clog: it's not WAL-logged, is reset at startup,
> and segments older than GlobalXmin can be truncated.
>
> This addresses the disk space consumption, and simplifies pg_upgrade.
>
> There are no other significant changes in this new version, so it's
> still very much WIP. But please take a look!
Thanks. I've been looking at this. It needs a rebase; I had to apply
to commit 89cf2d52030 (Wed Jul 2 13:11:05 2014 -0400) because of
conflicts with the commit after that; it applies with some fuzz. I read
the discussion in this thread and the README, to try to understand how
this is supposed to work. It looks pretty good.
One thing not quite clear to me is the now-static RecentXmin,
GetRecentXmin(), AdvanceGlobalRecentXmin, and the like. I found the
whole thing pretty confusing. I noticed that ShmemVariableCache->recentXmin
is read without a lock in GetSnapshotData(), but exclusive ProcArrayLock
is taken to write to it in GetRecentXmin(). I think we can write it
without a lock also, since we assume that storing an xid is atomic.
With that change, I think you can make the acquisition of ProcArray Lock
to walk the pgproc list use shared mode, not exclusive.
Another point is that RecentXmin is no longer used directly, except in
TransactionIdIsActive(). But that routine is only used for an Assert()
now, so I think it's fine to just have GetRecentXmin() return the value
and not set RecentXmin as a side effect; my point is that ISTM
RecentXmin can be removed altogether, which makes that business simpler.
As far as GetOldestXmin() is concerned, that routine now ignores its
arguments. Is that okay? I imagine it's just a natural consequence of
how things work now. [ ... reads some more code ... ] Oh, now it's
only used to determine a freeze limit. I think you should just remove
the arguments and make that whole thing simpler. I was going to suggest
that AdvanceRecentGlobalXmin() could receive a possibly-NULL Relation
argument to pass down to GetOldestSnapshotGuts() which can make use of
it for a tigther limit, but since OldestXmin is only for freezing, maybe
there is no point in this extra complication.
Regarding the pendingGlobalXmin stuff, I didn't find any documentation.
I think you just need to write a very long comment on top of
AdvanceRecentGlobalXmin() to explain what it's doing. After going over
the code half a dozen times I think I understand what's idea, and it
seems sound.
Not sure about the advancexmin_counter thing. Maybe in some cases
sessions will only run 9 transactions or less and then finish, in which
case it will only get advanced at checkpoint time, which would be way
too seldom. Maybe it would work to place the counter in shared memory:
acquire(spinlock);
if (ShmemVariableCache->counter++ >= some_number)
{
SI don't understand this:hmemVariableCache->counter = 0;
do_update = true;
}
release(spinlock);
if (do_update)
AdvanceRecentGlobalXmin();
(Maybe we can forgo the spinlock altogether? If the value gets out of
hand once in a while, it shouldn't really matter) Perhaps the "some
number" should be scaled to some fraction of MaxBackends, but not less
than X (32?) and no more than Y (128?). Or maybe just use constant 10,
as the current code does. But it'd be total number of transactions, not
transaction in the current backend.
I think it'd be cleaner to have a distinct typedef to use when
XLogRecPtr is used as a snapshot representation. Right now there is no
clarity on whether we're interested in the xlog position itself or it's
a snapshot value.
HeapTupleSatisfiesVacuum[X]() and various callers needs update to their
comments: when OldestXmin is mentioned, it should be OldestSnapshot.
I noticed that SubTransGetTopmostTransaction() is now only called from
predicate.c, and it's pretty constrained in what it wants. I'm not
implying that we want to do anything in your patch about it, other than
perhaps add a note to it that we may want to examine it later for
possible changes.
I haven't gone over the transam.c, clog.c changes yet.
I attach a couple of minor tweaks to the README, mostly for clarity (but
also update clog -> csnlog in a couple of places).
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
readme.patch | text/x-diff | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2014-07-10 21:29:34 | Re: Minmax indexes |
Previous Message | Andres Freund | 2014-07-10 21:11:19 | Re: Pg_upgrade and toast tables bug discovered |