| From: | Craig Ringer <craig(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: OK, so culicidae is *still* broken | 
| Date: | 2017-04-24 15:14:40 | 
| Message-ID: | CAMsr+YH9qEJ7U8Fmv+K9BbFxd34Trd64MoaKT8iKMQYobd_nHg@mail.gmail.com | 
| Lists: | pgsql-hackers | 

On 24 April 2017 at 16:55, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> Another thing I have tried is to just start the server by setting
> RandomizedBaseAddress="TRUE".  I have tried about 15-20 times but
> could not reproduce the problem related to shared memory attach.  We
> have tried the same on one of my colleagues (Ashutosh Sharma) machine
> as well, there we could see that error once or twice out of many tries
> but couldn't get it consistently.  I think if the chances of this
> problem to occur are so less, then probably the suggestion made by Tom
> to retry if we get collision doesn't sound bad.

It's pretty uncommon, and honestly, we might well be best off just
trying again if we lose the lottery.
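
A minimal sketch of the retry idea, not the actual PostgreSQL code. The names `attach_with_retry`, `attach_attempt_fn`, and `fail_n_times` are hypothetical, and the callback stands in for whatever step launches a child process and tries to map the shared segment at the parent's address. The attempt limit is an assumption; the thread does not specify one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: make one attempt to start a child and attach
 * shared memory at the required address. Returns true on success.
 */
typedef bool (*attach_attempt_fn) (void *state);

/*
 * Call "attempt" up to max_attempts times. Returns the 1-based number of
 * the attempt that succeeded, or -1 if every attempt failed.
 */
int
attach_with_retry(attach_attempt_fn attempt, void *state, int max_attempts)
{
    assert(attempt != NULL && max_attempts > 0);

    for (int i = 1; i <= max_attempts; i++)
    {
        if (attempt(state))
            return i;
        /* Lost the lottery: a fresh process would get a fresh layout. */
    }
    return -1;
}

/*
 * Stand-in for a real attempt, for illustration only: fails as many times
 * as the int pointed to by "state" says, then succeeds.
 */
bool
fail_n_times(void *state)
{
    int        *remaining = state;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a callback that fails twice before succeeding, `attach_with_retry(fail_n_times, &fails, 5)` returns 3. If every attempt fails it returns -1, and the caller can report the attach error as it does today.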

Most of what I read last time I looked into this essentially assumed
that you'd "fix" your code by reinventing far pointers[1], like the
good old Win16 days. Assume all addresses in shmem are relative to the
shmem base, and offset them when accessing/storing them. Fun and
efficient for everyone! That seems to be what Boost recommends[2].

Given that Pg doesn't make much effort to differentiate between
pointers to shmem and local memory, and has no pointer transformations
between shared and local pointers, adopting that would be a
horrifyingly intrusive change as well as incredibly tedious to
implement. We'd probably land up using size_t or ptrdiff_t for shmem
pointers and some kind of macro that was a noop on !windows. For once
I'd be thoroughly in agreement with Tom's complaints about
Windows-droppings.
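
To make the base-relative approach concrete, here is a rough sketch under stated assumptions. `shm_off`, `shm_to_off`, `shm_from_off`, `sum_list`, and `demo_remap` are hypothetical names, not PostgreSQL or Boost APIs, and two heap buffers stand in for the same segment mapped at different addresses in two processes. Offset 0 is reserved as the null value, so nothing stored at the very start of the segment can be referenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "shared pointer": byte offset from the segment base, 0 = null. */
typedef size_t shm_off;

/* Convert a local pointer into the segment to a base-relative offset. */
static shm_off
shm_to_off(const void *base, const void *p)
{
    if (p == NULL)
        return 0;
    assert((const char *) p > (const char *) base);
    return (shm_off) ((const char *) p - (const char *) base);
}

/* Convert a base-relative offset back to a pointer valid in this mapping. */
static void *
shm_from_off(void *base, shm_off off)
{
    return off == 0 ? NULL : (char *) base + off;
}

/* A list node that stores its link as an offset, never as a raw pointer. */
struct node
{
    int         value;
    shm_off     next;
};

/* Walk the list whose head is at offset "head" in the mapping at "base". */
static int
sum_list(void *base, shm_off head)
{
    int         sum = 0;

    for (struct node *n = shm_from_off(base, head); n != NULL;
         n = shm_from_off(base, n->next))
        sum += n->value;
    return sum;
}

/*
 * Build a two-node list in one buffer, copy the bytes to a second buffer
 * (simulating the same segment at a different address), and read the list
 * through the second base. Returns the sum, or -1 on allocation failure.
 */
static int
demo_remap(void)
{
    enum { SEG_SIZE = 256 };
    char       *seg_a = calloc(1, SEG_SIZE);
    char       *seg_b = calloc(1, SEG_SIZE);
    int         sum = -1;

    if (seg_a != NULL && seg_b != NULL)
    {
        struct node *first = (struct node *) (seg_a + 64);
        struct node *second = (struct node *) (seg_a + 128);

        first->value = 10;
        first->next = shm_to_off(seg_a, second);
        second->value = 20;
        second->next = 0;

        shm_off     head = shm_to_off(seg_a, first);

        memcpy(seg_b, seg_a, SEG_SIZE);
        sum = sum_list(seg_b, head);
    }
    free(seg_a);
    free(seg_b);
    return sum;
}
```

`demo_remap()` returns 30 because the stored offsets stay valid at the new base, where raw pointers into the first buffer would not. The cost is the one described above: every load and store of a shared pointer has to go through a conversion, which is what makes retrofitting this onto existing code so intrusive.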

Other people who've faced and worked around this[3] have come up with
solutions that look way scarier than just retrying if we lose the
random numbers game.

BTW, some Windows users face issues with large contiguous
allocations/mappings even without the process sharing side[4] due to
memory fragmentation created by ASLR, though this may only be a
concern for 32-bit executables. The relevant option /LARGEADDRESSAWARE
is enabled by default for 64-bit builds.

We might want to /DELAYLOAD [5] DLLs where possible to improve our
chances of winning the dice roll, but if we're going to support
retrying at all we don't need to care too much.

I looked at image binding (prelinking), but it's disabled if ASLR is in use.

In the long run we'll probably be forced toward threading or far pointers.

[1] https://en.wikipedia.org/wiki/Far_pointer,
https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer%5Fsizes
[2] http://www.boost.org/doc/libs/1_64_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_address_mapping
[3] http://stackoverflow.com/a/36145019/398670
[4] https://github.com/golang/go/issues/2323
[5]

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services