| From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Mikael Kjellström <mikael(dot)kjellstrom(at)gmail(dot)com>, Pierre-Emmanuel André <pea(at)openbsd(dot)org> | 
| Subject: | Re: OpenBSD versus semaphores | 
| Date: | 2019-01-08 07:05:12 | 
| Message-ID: | CAEepm=2ndy5RSABeaf3L1hFhXoBSg09RvgudfTWfbn=DMUbJ3w@mail.gmail.com | 
| Lists: | pgsql-hackers | 
On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been toying with OpenBSD lately, and soon noticed a seriously
> annoying problem for running Postgres on it: by default, its limits
> for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
> greatly constrain the number of connections for a single installation,
> it means that our TAP tests fail because you can't start two postmasters
> concurrently (cf [1]).
>
> Raising the annoyance factor considerably, AFAICT the only way to
> increase these settings is to build your own custom kernel.
>
> So I looked around for an alternative, and found out that modern
> OpenBSD releases support named POSIX semaphores (though not unnamed
> ones, at least not shared unnamed ones).  What's more, it appears that
> in this implementation, named semaphores don't eat open file descriptors
> as they do on macOS, removing our major objection to using them.
>
> I don't have any OpenBSD installation on hardware that I'd take very
> seriously for performance testing, but some light testing with
> "pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
> has just about the same performance as a build with SysV semaphores.
>
> This all leads to the thought that maybe we should be selecting
> PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD.  At the very least,
> our docs ought to recommend it as a credible alternative for
> people who don't want to get into building custom kernels.
>
> I've checked that this works back to OpenBSD 6.0, and scanning
> their man pages suggests that the feature appeared in 5.5.
> 5.5 isn't that old (2014) so possibly people are still running
> older versions, but we could easily put in version-specific
> default logic similar to what's in src/template/darwin.
>
> Thoughts?
No OpenBSD here, but I was curious enough to peek at their
implementation.  Like others, they create a tiny file under /tmp for
each semaphore, mmap() it and close the fd straight away.  Apparently
they don't support process-shared sem_init() yet (it fails with EPERM).
So your plan seems good to me.
CC'ing Pierre-Emmanuel (OpenBSD PostgreSQL port maintainer) in case he
is interested.
Wild speculation:  I wouldn't be surprised if POSIX named semas
perform better than SysV semas on a large enough system, since they'll
live on different pages.  At a glance, their sys_semget apparently
allocates arrays of struct sem without padding and I think they
probably get about 4 to a cacheline; see our experience with an 8
socket box leading to commit 2d306759 where we added our own padding.
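
To make the cacheline point concrete, here is a small sketch under stated assumptions: a 64-byte cache line, and a made-up DemoSem record that is only a stand-in (it is not OpenBSD's actual struct sem, and PaddedDemoSem is not the exact type PostgreSQL uses). It shows how many unpadded records can land on one line and how a union pads each record out to a full line.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumed; the real value depends on the CPU */

/* Hypothetical per-semaphore record, small enough that several fit per line. */
typedef struct DemoSem
{
    int         value;
    int         last_pid;
    int         nwaiters;
    int         zwaiters;
} DemoSem;

/* Same record padded so that each array element occupies a whole line. */
typedef union PaddedDemoSem
{
    DemoSem     sem;
    char        pad[CACHE_LINE_SIZE];
} PaddedDemoSem;

/*
 * How many array elements of the given size start within one cache line,
 * assuming the array itself begins on a line boundary.
 */
static size_t
slots_per_cache_line(size_t elem_size)
{
    assert(elem_size > 0);
    if (elem_size >= CACHE_LINE_SIZE)
        return 1;
    return CACHE_LINE_SIZE / elem_size;
}
```

With unpadded elements, unrelated backends waiting on neighbouring semaphores touch the same line; with the padded union each element gets its own line, provided the array is itself allocated on a cache-line boundary.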
-- 
Thomas Munro
http://www.enterprisedb.com