From: | Noah Misch <noah(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: sinval synchronization considered harmful |
Date: | 2011-07-27 03:35:38 |
Message-ID: | 20110727033537.GB18910@tornado.leadboat.com |
Lists: | pgsql-hackers |
On Tue, Jul 26, 2011 at 06:04:16PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)2ndQuadrant(dot)com> writes:
> > On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
> >> Dirty cache line, maybe not, but what if the assembly code commands the
> >> CPU to load those variables into CPU registers before doing the
> >> comparison? If they're loaded with maxMsgNum coming in last (or at
> >> least after resetState), I think you can have the problem without any
> >> assumptions about cache line behavior at all. You just need the process
> >> to lose the CPU at the right time.
>
> > True. If the compiler places the resetState load first, you could hit the
> > anomaly by "merely" setting a breakpoint on the next instruction, waiting for
> > exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
> > I think, though, we should either plug that _and_ the cache incoherency case or
> > worry about neither.
>
> How do you figure that? The poor-assembly-code-order risk is both a lot
> easier to fix and a lot higher probability. Admittedly, it's still way
> way down there, but you only need a precisely-timed sleep, not a
> precisely-timed sleep *and* a cache line that somehow remained stale.
I think both probabilities are too low to usefully distinguish. An sinval
wraparound takes a long time even in a deliberate test setup: almost 30 hours at
10k messages/sec. To get a backend to sleep that long, you'll probably need
something like SIGSTOP or a debugger attach. The sleep has to begin within a
window of no more than a few instructions. Then you'd need to release the
process at the exact moment at which it would observe wrapped equality. In other words,
you get one sub-millisecond opportunity every 30 hours of process sleep time.
If your backends don't have multi-hour sleeps, it can't ever happen.
Even so, all the better if we settle on an approach that has neither hazard.
--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services