From: | Konstantin Belousov <kostikbel(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Maxim Sobolev <sobomax(at)freebsd(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD |
Date: | 2016-06-22 15:00:30 |
Message-ID: | 20160622150030.GT38613@kib.kiev.ua |
Lists: | pgsql-bugs |
On Wed, Jun 22, 2016 at 10:48:50AM -0400, Tom Lane wrote:
> Konstantin Belousov <kostikbel(at)gmail(dot)com> writes:
> > On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> >> If that seems like a competitive alternative for you, it'd be nice to have
> >> a platform where we use unnamed POSIX semaphores by default. I'm a little
> >> worried about whether that code has suffered bit-rot, since it's been
> >> sitting there basically unused for so long.
>
> > On FreeBSD, there is no practical difference in the resource consumption
> > for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> > open file descriptor is not kept in the process fd table. The semaphore
> > is represented by the mmaped page, libc+kernel operate solely on the
> > page content and use umtx(2) to implement counted semaphore.
>
> Is there any kernel-side resource at all? The thing that concerns me
> about the POSIX APIs is that it's not very clear whether anything gets
> left behind if the database crashes. The Linux man page for sem_destroy
> says
>
> An unnamed semaphore should be destroyed with sem_destroy() before the
> memory in which it is located is deallocated. Failure to do this can
> result in resource leaks on some implementations.
>
> and while they don't say that their own implementation has such a problem,
> it's worrisome. We go to some lengths to ensure that we can recycle SysV
> semaphores after a crash, but there's no equivalent logic in the POSIX
> semaphore code, and I don't see how it would even be possible to identify
> leftover "unnamed" semaphores.
On FreeBSD, an unnamed semaphore is only a memory page that is mmaped
into all processes that use it. Of course, if a process is blocked on
the semaphore, there is some bookkeeping done in the kernel so that a
post can find all waiters. But it is lightweight and automatically
released on wakeup. In other words, there is nothing to worry about
WRT cleanup after the consumers of an unnamed semaphore are killed.
The same holds for named semaphores, but there the file is left around.
>
> > That said, the problem with the SysV semaphores is that API allows
> > operations on arbitrary sets of the semaphores. Unless some unordinary
> > and complex measures are taken, implementation has to use global
> > internal lock to synchronize semop(2). This is what I noted in the
> > paper.
>
> It's certainly true that semop(2) is more complicated than we need.
> But in practice, we only call semop(2) when we need to sleep, or to
> awaken a sleeping process, so I'm not sure that performance of it
> matters a lot to us.
The issue is that sleeps and wakeups on SysV semaphores do not scale,
at least on FreeBSD.
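
For reference, a small sketch of the API property I mean: a single
semop(2) call may operate atomically on several semaphores of a set.
That is what pushes an implementation towards a global or per-set lock.
The function name sysv_multi_op_demo is hypothetical and the code is
only an illustration, assuming SysV IPC is enabled on the system.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical demo: increment two semaphores of one set with a single
 * atomic semop(2) call. Returns 0 on success, -1 on error.
 */
int
sysv_multi_op_demo(void)
{
    /* Private set with two semaphores, accessible to the owner only. */
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* Read the starting values instead of assuming they are zero. */
    int before0 = semctl(semid, 0, GETVAL);
    int before1 = semctl(semid, 1, GETVAL);

    /* One call, two semaphores: both are updated or neither is. */
    struct sembuf ops[2] = {
        { .sem_num = 0, .sem_op = 1, .sem_flg = 0 },
        { .sem_num = 1, .sem_op = 1, .sem_flg = 0 },
    };
    int rc = semop(semid, ops, 2);

    int after0 = semctl(semid, 0, GETVAL);
    int after1 = semctl(semid, 1, GETVAL);

    if (rc != 0 || before0 < 0 || before1 < 0 ||
        after0 != before0 + 1 || after1 != before1 + 1)
        rc = -1;

    /* SysV sets outlive their creator, so remove the set explicitly. */
    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

Note the explicit IPC_RMID at the end: unlike the mmaped page of a POSIX
semaphore, a SysV set stays in the kernel until somebody removes it.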