Re:Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication

From: chenhj <chjischj(at)163(dot)com>
To: chenhj <chjischj(at)163(dot)com>
Cc: "Andres Freund" <andres(at)anarazel(dot)de>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re:Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication
Date: 2018-05-19 14:24:17
Message-ID: 3f0cf5ff.777d.16378c971d3.Coremail.chjischj@163.com
Lists: pgsql-bugs


At 2018-05-15 06:16:39, "chenhj" <chjischj(at)163(dot)com> wrote:

At 2018-05-07 02:57:12, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
>On 2018-05-06 23:45:17 +0800, chenhj wrote:
>> >>
>> >>Chen, have you disabled transparent hugepages and zone reclaim?
>> >>
>> >>Greetings,
>> >>
>> >>Andres Freund
>> >c) Depend on huge page
>> >huge_page=on, happen (no matter transparent_hugepage is [always] or [never])
>> >huge_page=off, not happen
>> >
>> >When transparent hugepages are disabled, this problem also occurs.
>> >About zone reclaim, I will check it later.
>> >What puzzles me is that this problem does not occur on PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine).
>> >d) Depend on PostgreSQL Version
>> >PostgreSQL 10.2 happen
>> >PostgreSQL 9.6 not happen
>> >Chen Huajun
>> The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.
>>
>> In addition, I need to correct my earlier statement: even huge_pages=off is problematic.
>>
>> With huge_pages=on, SQL execution is very slow, and there are hung connections in the startup and authentication states.
>>
>
>You'd probably need to provide a few perf profiles to get further
>insight.
>
>Greetings,
>
>Andres Freund

According to my tests, this problem is related to commit "ecb0d20a9d2e09b7112d3b192047f711f9ff7e59", which switched from SysV semaphores to POSIX semaphores on Linux.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecb0d20a9d2e09b7112d3b192047f711f9ff7e59

Use unnamed POSIX semaphores, if available, on Linux and FreeBSD.

We've had support for using unnamed POSIX semaphores instead of System V
semaphores for quite some time, but it was not used by default on any
platform. Since many systems have rather small limits on the number of
SysV semaphores allowed, it seems desirable to switch to POSIX semaphores
where they're available and don't create performance or kernel resource
problems. Experimentation by me shows that unnamed POSIX semaphores
are at least as good as SysV semaphores on Linux, and we previously had
a report from Maksym Sobolyev that FreeBSD is significantly worse with
SysV semaphores than POSIX ones. So adjust those two platforms to use
unnamed POSIX semaphores, if configure can find the necessary library
functions. If this goes well, we may switch other platforms as well,
but it would be advisable to test them individually first.

It's not currently contemplated that we'd encourage users to select
a semaphore API for themselves, but anyone who wants to experiment
can add PREFERRED_SEMAPHORES=UNNAMED_POSIX (or NAMED_POSIX, or SYSV)
to their configure command line to do so.

I also tweaked configure to report which API it's selected, mainly
so that we can tell that from buildfarm reports.

I did not touch the user documentation's discussion about semaphores;
that will need some adjustment once the dust settles.

Discussion: <8536(dot)1475704230(at)sss(dot)pgh(dot)pa(dot)us>

This is why the problem does not occur on 9.6.2 but does occur on 10.2.
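
To make the difference concrete, here is a minimal sketch of an unnamed POSIX semaphore living in anonymous shared memory and being posted by one process and waited on by another. This is only an illustration, not PostgreSQL code: the function name shared_sem_handshake is made up, and the mapping here uses ordinary pages (adding MAP_HUGETLB to the mmap flags would place it on huge pages, assuming huge pages are reserved on the machine). On Linux, waiting and posting on such a semaphore go through futexes when the waiter has to block, whereas SysV semaphores do not.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a process-shared unnamed semaphore in an
 * anonymous shared mapping, let a forked child post it, and wait for it
 * in the parent. Returns 0 if the parent was woken, -1 on any error.
 */
int shared_sem_handshake(void)
{
    /* Shared anonymous memory, visible to both parent and child. */
    sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return -1;

    /* pshared = 1: the semaphore is shared between processes. */
    if (sem_init(sem, 1, 0) != 0) {
        munmap(sem, sizeof(sem_t));
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        sem_destroy(sem);
        munmap(sem, sizeof(sem_t));
        return -1;
    }
    if (pid == 0) {
        /* Child: wake the parent, then exit without running atexit hooks. */
        _exit(sem_post(sem) == 0 ? 0 : 1);
    }

    /* Parent: block until the child posts, retrying if interrupted. */
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc != 0 && errno == EINTR);

    waitpid(pid, NULL, 0);
    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return rc == 0 ? 0 : -1;
}
```

On a healthy kernel a call to shared_sem_handshake() returns 0. The hang reported in this thread would correspond to a waiter in sem_wait that is never woken. On older glibc versions the program may need to be linked with -pthread.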

As to why, perhaps this is a bug in the Linux kernel. However, it is not clear which Linux kernel version fixed it. The problem still occurs after upgrading the CentOS 6.5 kernel from 2.6.32-431 to 2.6.32-504.23.4.
To avoid this problem, the only way may be to upgrade CentOS to a newer version (such as CentOS 7.3).
Regards,
Chen Huajun

We have confirmed that this is a known Linux kernel bug, fixed by the following commit. Thanks for all the help.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=13d60f4b6ab5b702dc8d2ee20999f98a93728aec
futex: Take hugepages into account when generating futex_key
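
As far as I understand that commit, the key for a process-shared futex was built from the index of the huge page's head page, so two futexes in different 4 kB subpages of the same huge page could end up with the same key. The toy model below only illustrates that idea. It is not kernel code: the struct, the function names, and the 2 MB huge page size are assumptions for the example, and the inode part of the real key is left out because both futexes are in the same mapping.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE_PAGE_SIZE 4096UL
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* assumed x86-64 default */

/* Toy stand-in for the shared part of a futex key. */
struct toy_futex_key {
    uint64_t pgoff;    /* page index within the mapping */
    uint32_t offset;   /* byte offset within a base page */
};

/* Before the fix (as modeled here): index of the huge page's head page. */
struct toy_futex_key toy_key_before_fix(uint64_t mapping_off)
{
    struct toy_futex_key k;
    k.pgoff = mapping_off / HUGE_PAGE_SIZE;
    k.offset = (uint32_t)(mapping_off % BASE_PAGE_SIZE);
    return k;
}

/* After the fix (as modeled here): index counted in base pages. */
struct toy_futex_key toy_key_after_fix(uint64_t mapping_off)
{
    struct toy_futex_key k;
    k.pgoff = mapping_off / BASE_PAGE_SIZE;
    k.offset = (uint32_t)(mapping_off % BASE_PAGE_SIZE);
    return k;
}

bool toy_keys_equal(struct toy_futex_key a, struct toy_futex_key b)
{
    return a.pgoff == b.pgoff && a.offset == b.offset;
}
```

In this model, futexes at mapping offsets 0 and BASE_PAGE_SIZE get equal keys from toy_key_before_fix but different keys from toy_key_after_fix. A key collision of this kind can let a wakeup reach the wrong waiter, which would be consistent with the hung backends seen with huge_pages=on.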

Regards,
Chen Huajun
