Re: segfault tied to "IS JSON predicate" commit

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: segfault tied to "IS JSON predicate" commit
Date: 2023-04-15 21:46:14
Message-ID: ZDsbJvvoevy0QxCs@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote:
> I find that if I run the following test against a standard debug build
> on HEAD, my local installation reliably segfaults:
>
> $ meson test --setup running --suite test_rls_hooks-running
>
> Attached is a "bt full" run from gdb against a core dump. The query
> "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
> backend segfaults.
>
> The top frame of the back trace is suggestive of a use-after-free:
>
> #0 copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
> 187 switch (nodeTag(from))
> ...
>
> "git bisect" suggests that the problem began at commit 6ee30209,
> "SQL/JSON: support the IS JSON predicate".
>
> It's a bit surprising that the bug reproduces when I run a standard
> test, and yet we appear to have a bug that's about 2 weeks old. There
> may be something unusual about my system that will turn out to be
> relevant -- though there is nothing particularly exotic about this
> machine. My repro doesn't rely on concurrent execution, or timing, or
> anything like that -- it's quite reliable.

I was able to reproduce this yesterday but not today.

I think what happened is that you (and I) are in the habbit of running
"meson test tmp_install" to compile new binaries and install them into
./tmp_install, and then run a server out from there. But nowadays
there's also "meson test install_test_files". I'm not sure what
combination of things are out of sync, but I suspect you forgot one of
0) compile *and* install the new binaries; or 1) restart the running
postmaster; or, 2) install the new shared library ("test files").

I saw the crash again when I did this:

time ninja
time meson test tmp_install install_test_files regress/regress # does not recompile, BTW
./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no&
git checkout HEAD~222
time meson test tmp_install install_test_files
time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress

In this case, I'm not sure if there's anything to blame meson for; the
issue is running server, which surely has different structures since
last month.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-04-15 21:48:58 Re: Direct I/O
Previous Message shaurya jain 2023-04-15 21:10:57 Logical replication failed with SSL SYSCALL error