Re: Why is infinite_recurse test suddenly failing?

From: Mark Wong <mark(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Why is infinite_recurse test suddenly failing?
Date: 2019-05-14 14:59:01
Message-ID: 20190514145901.GA10216@2ndQuadrant.com

On Fri, May 10, 2019 at 11:27:07AM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> > Core was generated by `postgres: debian regression [local] SELECT '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > 2748 malloc.c: No such file or directory.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > #1 0x00003fff915bedc8 in _int_malloc (av=0x3fff916e0d28 <main_arena>, bytes=8192) at malloc.c:3865
> > #2 0x00003fff915c1064 in __GI___libc_malloc (bytes=8192) at malloc.c:2928
> > #3 0x00000000106acfd8 in AllocSetContextCreateInternal (parent=0x1000babdad0, name=0x1085508c "inline_function", minContextSize=<optimized out>, initBlockSize=<optimized out>, maxBlockSize=8388608) at aset.c:477
> > #4 0x00000000103d5e00 in inline_function (funcid=20170, result_type=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, funcvariadic=<optimized out>, func_tuple=<optimized out>, context=0x3fffe3da15d0, args=<optimized out>) at clauses.c:4459
> > #5 simplify_function (funcid=<optimized out>, result_type=<optimized out>, result_typmod=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, args_p=<optimized out>, funcvariadic=<optimized out>, process_args=<optimized out>, allow_non_const=<optimized out>, context=<optimized out>) at clauses.c:4040
> > #6 0x00000000103d2e74 in eval_const_expressions_mutator (node=0x1000babe968, context=0x3fffe3da15d0) at clauses.c:2474
> > #7 0x00000000103511bc in expression_tree_mutator (node=<optimized out>, mutator=0x103d2b10 <eval_const_expressions_mutator>, context=0x3fffe3da15d0) at nodeFuncs.c:2893
>
>
> > So that lets out any theory that somehow we're getting into a weird
> > control path that misses calling check_stack_depth;
> > expression_tree_mutator does so for one, and it was called just nine
> > stack frames down from the crash.
>
> Right. There's plenty places checking it...
>
>
> > I am wondering if, somehow, the stack depth limit seen by the postmaster
> > sometimes doesn't apply to its children. That would be pretty wacko
> > kernel behavior, especially if it's only intermittently true.
> > But we're running out of other explanations.
>
> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV. The signal has that information, but
> unfortunately the buildfarm code doesn't print it. p $_siginfo would
> show us some of that...
>
> Mark, how tight is the memory on that machine?

There's about 2GB allocated:

debian(at)postgresql-debian:~$ cat /proc/meminfo
MemTotal: 2080704 kB
MemFree: 1344768 kB
MemAvailable: 1824192 kB

At the moment it looks like plenty. :) Maybe I should set something up
to monitor these things.
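
A minimal sketch of what such monitoring could look like, assuming a small Python script run periodically (for example from cron). The function names and the sample text are illustrative, not an existing tool on the buildfarm machine; the sample values are the ones quoted above.

```python
# Hypothetical helper: parse /proc/meminfo-style text and report how much
# memory is still available. All names here are assumptions for illustration.

SAMPLE_MEMINFO = """\
MemTotal: 2080704 kB
MemFree: 1344768 kB
MemAvailable: 1824192 kB
"""


def parse_meminfo(text):
    """Return a dict mapping field name to its integer value (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_fraction(values):
    """Fraction of total memory the kernel reports as available."""
    return values["MemAvailable"] / values["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print("available: {:.1%} of {} kB".format(
        available_fraction(info), info["MemTotal"]))
```

Logging this value with a timestamp would make it possible to see whether memory was tight at the moment a crash occurred, rather than only when someone happens to look.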

> Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It's been up for about 49 days:

debian(at)postgresql-debian:~$ uptime
14:54:30 up 49 days, 14:59, 3 users, load average: 0.00, 0.34, 1.04

I see one line from dmesg that is related to postgres:

[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: 00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c

But only that one time in 49 days of uptime. Otherwise I see a half dozen
hung_task_timeout_secs messages around jbd2 and dhclient.
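
To relate a kernel message like the one above to a particular crash, it helps to pull out the PID and the timestamp. This is a rough sketch, assuming the default dmesg format of seconds since boot in square brackets; the regular expression and function names are illustrative only.

```python
import re

# Assumed line shape: "[<seconds since boot>] <process>[<pid>]: <message>"
DMESG_LINE = re.compile(
    r"^\[\s*(?P<secs>\d+\.\d+)\]\s+"
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<msg>.*)$"
)


def parse_dmesg_line(line):
    """Return a dict of fields, or None if the line has a different shape."""
    m = DMESG_LINE.match(line)
    if m is None:
        return None
    return {
        "secs": float(m.group("secs")),
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "msg": m.group("msg"),
    }


def days_since_boot(secs):
    """Convert a dmesg timestamp (seconds since boot) to days."""
    return secs / 86400.0


if __name__ == "__main__":
    line = ("[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: "
            "00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c")
    entry = parse_dmesg_line(line)
    print(entry["proc"], entry["pid"],
          "at day {:.2f} of uptime".format(days_since_boot(entry["secs"])))
```

Subtracting the timestamp from the current uptime gives roughly how long ago the message was logged, which can then be compared against the buildfarm run times.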

Regards,
Mark

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
