From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce ProcArrayLock contention
Date: 2015-08-20 10:19:40
Message-ID: 20150820101940.GA5070@alap3.anarazel.de
Lists: pgsql-hackers
On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I spent some time today reviewing the commited patch. So far my only
> > major complaint is that I think the comments only insufficiently
> > document the approach taken:
> > Stuff like avoiding ABA-type problems by clearing the list entirely and
> > it being impossible that entries end up on the list too early absolutely
> > needs to be documented explicitly.
> >
>
> I think more comments can be added to explain such behaviour if it is
> not clear from the current code and comments.
It's not mentioned at all, so yes.
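Something roughly along these lines is what I have in mind (the wording is only illustrative, not a proposed final comment):

/*
 * The leader never pops entries one by one; it detaches the whole wait
 * list from the list head in PROC_HDR while holding ProcArrayLock, so a
 * backend can only re-add itself after its XID has been cleared and it
 * has been woken.  That is what rules out ABA-style problems and entries
 * ending up on the list too early.
 */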
> I think you are right, and here we need to use something like what you
> have suggested below. Originally the code was similar to what you have
> written below, but it used a different (new) variable to achieve what
> you have achieved with lwWaiting, and to avoid the use of a new variable
> the code was refactored into its current form. I think we should make
> this change (I can write a patch) unless Robert feels otherwise.
I think we can just rename lwWaiting to something more generic.
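Roughly like this on the follower side (untested sketch; groupClearWaiting is just a placeholder name for the renamed flag, and extraWaits is the usual lwlock-style counter):

	proc->groupClearWaiting = true;	/* renamed lwWaiting; set before enqueueing */
	/* ... CAS ourselves onto the head of the group-clear list ... */

	for (;;)
	{
		/* sleeping on the semaphore acts as a read barrier */
		PGSemaphoreLock(&proc->sem);
		if (!proc->groupClearWaiting)
			break;
		extraWaits++;
	}

with the leader clearing the flag for each follower before doing PGSemaphoreUnlock() on it.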
> > Consider what happens if such a follower enqueues in another
> > transaction. It is not, as far as I could find out, guaranteed on all
> > types of CPUs that a third backend can actually see nextClearXidElem
> > as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> > sockets. If the write to nextClearXidElem is entered into the local
> > store buffer (leader #1), a hyper-threaded process (leader #2) can
> > possibly see it (store forwarding) while another backend doesn't
> > yet.
> >
> > I think this is very unlikely to be an actual problem due to
> > independent barriers until enqueued again, but I don't want to rely
> > on that without documenting it. It seems safer to replace
> > + wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> > + pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> > with a pg_atomic_exchange_u32().
> >
>
> I didn't follow this point. If we ensure that a follower can never return
> before the leader wakes it up, then why would it be a problem to update
> nextClearXidElem like above?
Because it doesn't generally enforce that *other* backends have seen the
write as there's no memory barrier.
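I.e. roughly (untested):

	/* fetch the old value and store INVALID_PGPROCNO in one full-barrier operation */
	wakeidx = pg_atomic_exchange_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);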
> > +/*
> > + * ProcArrayGroupClearXid -- group XID clearing
> > + *
> > + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
> > + * commit time, add ourselves to a list of processes that need their XIDs
> > + * cleared. The first process to add itself to the list will acquire
> > + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
> > + * on behalf of all group members. This avoids a great deal of context
> > + * switching when many processes are trying to commit at once, since the lock
> > + * only needs to be handed from the last share-locker to one process waiting
> > + * for the exclusive lock, rather than to each one in turn.
> > + */
> > +static void
> > +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
> > +{
> >
> > This comment, in my opinion, is rather misleading. If you have a
> > workload that primarily is slowed down due to transaction commits, this
> > patch doesn't actually change the number of context switches very
> > much. Previously all backends enqueued in the lwlock and got woken up
> > one-by-one. Save for backends 'jumping' the queue while a lock has been
> > released but the woken-up backend isn't yet running, there'll be exactly
> > as many context switches as today.
> >
>
> I think this is debatable, as in the previous mechanism all the backends
> clear their transaction ids one-by-one (which is nothing but context
> switching), which leads to contention not only with read lockers
> but with write lockers as well.
Huh? You can benchmark it; there's barely any change in the number of
context switches.
I am *not* saying that there's no benefit to the patch, I am saying that
context switches are the wrong explanation. Reduced contention (due to
shorter lock holding times, fewer cacheline moves etc.) is the reason
this is beneficial.
> > I don't think it's a good idea to use the same variable name in PROC_HDR
> > and PGPROC; it's confusing.
> What do you mean by this? Are you not happy with the variable name?
Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.
struct PGPROC
{
	...
	/* Support for group XID clearing. */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

typedef struct PROC_HDR
{
	...
	/* First pgproc waiting for group XID clear */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}
PROC_HDR's variable imo isn't well named.
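E.g. something like this would be clearer (the name is only an illustration, not a concrete proposal):

	/* first pgproc on the group XID clearing wait list */
	volatile pg_atomic_uint32	firstClearXidElem;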
Greetings,
Andres Freund