Re: Is there a significant difference in Memory settings between 9.5 and 12

From: Tory M Blue <tmblue(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Is there a significant difference in Memory settings between 9.5 and 12
Date: 2020-05-12 18:16:44
Message-ID: CAEaSS0YDpb_L1Ve+4aNhNz7nSN5M+EpT9fV8AUuEYnY2tDtuDw@mail.gmail.com
Lists: pgsql-general

On Mon, May 11, 2020 at 10:55 PM Tory M Blue <tmblue(at)gmail(dot)com> wrote:

>
>
> On Mon, May 11, 2020 at 9:01 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> wrote:
>
>> On Tue, May 12, 2020 at 2:52 PM Tory M Blue <tmblue(at)gmail(dot)com> wrote:
>> > It took the change but didn't help. So 10GB of shared_buffers in 12 is
>> still a no go. I'm down to 5GB and it works, but this is the same hardware,
>> the same exact 9.5 configuration. So I'm missing something. WE have not had
>> to mess with kernel memory settings since 9.4, so this is an odd one.
>> >
>> > I'll keep digging, but i'm hesitant to do my multiple TB db's with half
>> of their shared buffer configs, until I understand what 12 is doing
>> differently than 9.5
>>
>> Which exact version of 9.5.x are you coming from? What's the exact
>> error message on 12 (you showed the shared_memory_type=sysv error, but
>> with the default value (mmap) how does it look)? What's your
>> huge_pages setting?
>>
>
> 9.5-20
> postgresql95-9.5.20-2PGDG.rhel7.x86_64
> postgresql95-contrib-9.5.20-2PGDG.rhel7.x86_64
> postgresql95-libs-9.5.20-2PGDG.rhel7.x86_64
> postgresql95-server-9.5.20-2PGDG.rhel7.x86_64
>
> I don't use huge_pages
>
> And this error is actually from the default mmap
>
> May 08 12:33:58 qdb01.prod.ca postmaster[8790]: < 2020-05-08 12:33:58.324
> PDT >HINT: This error usually means that PostgreSQL's request for a
> shared memory segment exceeded available memory, swap space, or huge pages.
> To reduce the request size (currently 11026235392 bytes), reduce
> PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or
> max_connections.
>
> The above error is with 12 trying to start with shared_buffers = 10GB...
>
> 9.5 starts fine with the same configuration file. That kind of started
> me down this path.
>
> And just to repeat. Same exact hardware, same kernel, nothing more than
> installing the latest postgres12, copying my config files from 9.5 to 12
> and running the pg_upgrade.
>
> 9.5 has been running for years with the same configuration file, so
> something changed somewhere along the line that is preventing 12 from
> starting with the same config file. And the allocation error occurs with
> either sysv or mmap on 12 (it will start with 5GB allocated, but not 10GB,
> on a 15GB box that is a dedicated postgres server).
>
>
>> Can you reproduce the problem with a freshly created test cluster? As
>> a regular user, assuming regular RHEL packaging, something like
>> /usr/pgsql-12/bin/initdb -D test_pgdata, and then
>> /usr/pgsql-12/bin/postgres -D test_pgdata -c shared_buffers=10GB (then
>> ^C to stop it). If that fails to start in the same way, it'd be
>> interesting to see the output of the second command with strace in
>> front of it, in the part where it allocates shared memory. And
>> perhaps it'd be interesting to see the same output with
>> /usr/pgsql-9.5/bin/XXX (if you still have the packages). For example,
>> on my random dev laptop that looks like:
>>
>> openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 6
>> fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
>> read(6, "MemTotal: 16178852 kB\nMemF"..., 1024) = 1024
>> read(6, ": 903168 kB\nShmemHugePages: "..., 1024) = 311
>> close(6) = 0
>> mmap(NULL, 11016339456, PROT_READ|PROT_WRITE,
>> MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = -1 ENOMEM (Cannot
>> allocate memory)
>> mmap(NULL, 11016003584, PROT_READ|PROT_WRITE,
>> MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7ff74e579000
>> shmget(0x52e2c1, 56, IPC_CREAT|IPC_EXCL|0600) = 3244038
>> shmat(3244038, NULL, 0) = 0x7ff9df5ad000
>>
>> The output is about the same on REL9_5_STABLE and REL_12_STABLE for
>> me, only slightly different sizes. If that doesn't fail in the same
>> way on your system with 12, perhaps there are some more settings from
>> your real clusters required to make it fail. You could add them one
>> by one with -c foo=bar or in the throw away
>> test_pgdata/postgresql.conf, and perhaps that process might shed some
>> light?
>>
>> I was going to ask if it might be a preloaded extension that is asking
>> for gobs of extra memory in 12, but we can see from your "Failed
>> system call was shmget(key=5432001, size=11026235392, 03600)" that
>> it's in the same ballpark as my total above for shared_buffers=10GB.
>>
>
> Be more than happy to test this out. I'll see what I can pull tomorrow and
> provide some dataz :) I know it's not ideal to use the same config file,
> I know that various things are added or changed (usually added) but the
> defaults are typically safe. But after sometime dialing in the settings for
> our use case, I've just kind of kept moving them forward.
>
> But let me do some more testing tomorrow (since I'm trying to get to the
> bottom of this, before I attempt my big DB upgrades). So I'll spend some
> time testing and see if I can't get similar "failures/challenges"? and go
> from there.
>
> Appreciate the ideas!
>
> Tory
>

Well that is interesting. Built a new system, installed 9.5 and 12, moved
my config file in, added the include line to the standard postgresql.conf
file in each version and location
/pgsql/9.5
/pgsql/12

Edited/created custom systemctl files for each version.

And it starts. There are no errors; I can start both 9.5 and 12 with the
config file that I'm attempting to use in the upgraded system.

So maybe the upgrade is actually doing something funky. I'll do a mock
upgrade now. Loaded my data.

OKAY I see something but don't understand why..

I loaded my data into 9.5, and both 9.5 and 12 started fine using my 9.5
data. I then destroyed the 12 data, ran a clean init and performed a link
upgrade. postgres 12 started with no issues at all, with the same
shared_buffers of 10GB. I started scratching my head, then I remembered that
I force some stuff via sysctl, so I added those settings and boom: postgres
12 will no longer start with shared_buffers of 10GB, but 9.5 starts.

May 12 10:59:49 ip-100-98-136-145.ca. postmaster[9975]: < 2020-05-12
10:59:49.719 PDT >FATAL: could not map anonymous shared memory: Cannot

So this appears not to be directly related to the upgrade, but to something
in my existing sysctl settings combined with postgres 12.

Anyone know why these settings are causing an issue with 12?

vm.overcommit_memory = 2
This is the culprit, I think, but it makes no sense why postgres 9.5 allowed
it and 12 does not.
When vm.overcommit_memory is set to 2, the vm.overcommit_ratio value becomes
relevant. By default, this value is set to 50, which means the system would
only allocate up to 50% of your RAM (plus swap). So on a 15GB system, a 10GB
request is more than 50% (but 9.5 worked). Setting vm.overcommit_memory to 1
allows it again, but I'm a tad confused about why this causes an issue in 12
but not 9.5.
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.swappiness = 0
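
For reference, a rough sketch of the arithmetic behind that limit, in Python.
With vm.overcommit_memory = 2 the kernel documents the commit limit as swap
plus overcommit_ratio percent of RAM (less any huge pages). The swap size on
this box was not stated, so the 0 and 2 GiB values below are assumptions for
illustration only, as is treating "15GB" as exactly 15 GiB. The request size
is the one from the HINT quoted earlier, and the function name is made up.

```python
GIB = 1024 ** 3

def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio=50, hugetlb_bytes=0):
    """Approximate CommitLimit under vm.overcommit_memory = 2.

    Follows the formula in the kernel's overcommit-accounting notes:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (ram_bytes - hugetlb_bytes) * overcommit_ratio // 100 + swap_bytes

request = 11026235392   # size PostgreSQL 12 asked for, from the HINT above
ram = 15 * GIB          # "15GB box"; the exact MemTotal is an assumption

for swap_gib in (0, 2):  # swap size is unknown here, these are assumed values
    limit = commit_limit_bytes(ram, swap_gib * GIB)
    print(f"swap={swap_gib}GiB limit={limit} fits={request <= limit}")
```

The limit covers everything committed on the box, not just this one request,
and the sketch does not by itself explain why 9.5 starts under the same
settings.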

At least I have an answer as to what, I just am not clear why.
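
To compare against what the kernel actually reports, here is a small sketch
that reads CommitLimit and Committed_AS from /proc/meminfo (both are standard
fields on Linux) and shows how much headroom would be left after a request of
a given size. The helper names are hypothetical, and this is only one way to
look at it.

```python
import os

def parse_meminfo(text):
    """Return {field: bytes} for the 'kB' lines of /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Skip fields without a kB unit, such as the HugePages_* counters.
        if len(parts) == 2 and parts[1] == "kB":
            values[name.strip()] = int(parts[0]) * 1024
    return values

def commit_headroom(meminfo, request_bytes):
    """Bytes left under CommitLimit after the request; negative means over."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"] - request_bytes

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # 11026235392 is the request size from the HINT quoted earlier.
    print(commit_headroom(info, 11026235392))
```

CommitLimit is only enforced when vm.overcommit_memory is 2, so a negative
number here matters only in that mode.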

Thanks again for the ideas!

Tory
