From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Vladlen Popolitov <v(dot)popolitov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, hukutoc(at)gmail(dot)ru, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nikita Malakhov <hukutoc(at)gmail(dot)com> |
Subject: | Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) |
Date: | 2025-03-26 18:22:14 |
Message-ID: | 3kp64koynvdzepbyddpkel7dugnku7ksfevkovx3rrrsle4dcp@ah7gla44mxjh |
Lists: | pgsql-hackers |
Hi,
On 2025-03-26 21:53:35 +0700, Vladlen Popolitov wrote:
> During a debugging session I found that queries with Parallel Seq Scan hang
> in the current master: the leader process waits indefinitely for the signal
> from the parallel workers. The query cannot be cancelled, because the leader
> does not check the interrupt status in the waiting loop.
>
> 1. How to reproduce:
> a) Create table:
>
> CREATE DATABASE expr;
> \c expr
> CREATE TABLE testexpr(
> id INT,
> val INT
> );
> INSERT INTO testexpr (id, val)
> SELECT serie as id , MOD(serie, 10) as val
> FROM generate_series(1,1000000) as serie;
> EXPLAIN (ANALYZE) SELECT * FROM testexpr
> WHERE val=1 AND id<30;
>
> b) start debugger for this connection
>
> c) Run the command (parallel workers should be enabled, as they are in the default
> configuration):
> EXPLAIN (ANALYZE) SELECT * FROM testexpr
> WHERE val=1 AND id<30;
>
> d) The query above starts parallel worker(s). When the worker(s) finish,
> they send SIGUSR1, which is caught by the debugger. When you dismiss
> the signal message, the query appears to continue running, but actually it
> waits (in latch.c or in waiteventset.c, depending on the commit version).
Isn't that to be expected? If I understand correctly, your gdb is
configured so that it intercepts SIGUSR1 signals *without* passing them on to
the application (i.e. postgres). We rely on the signal being delivered, which
it isn't; thus the hang.
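To make the failure mode concrete: the leader does not poll its workers, it sleeps until a signal wakes it. Below is a minimal sketch of that pattern, assuming a self-pipe wakeup (one of the mechanisms PostgreSQL's latch code supports, though current builds prefer signalfd or kqueue where available). The names `wakeup_init()` and `wait_for_wakeup()` are hypothetical and are not PostgreSQL functions. A SIGUSR1 handler writes a byte into a non-blocking pipe, and the waiter blocks in `poll()` on the read end. If a debugger discards SIGUSR1, the handler never runs, no byte is written, and a wait with `timeout_ms = -1` never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical self-pipe wakeup; a sketch, not PostgreSQL's latch implementation. */
static int selfpipe[2] = {-1, -1};

static void sigusr1_handler(int signo)
{
    int saved_errno = errno;
    (void) signo;
    /* write() is async-signal-safe; EAGAIN just means a byte is already pending */
    ssize_t rc = write(selfpipe[1], "x", 1);
    (void) rc;
    errno = saved_errno;
}

/* Create a non-blocking self-pipe and install the SIGUSR1 handler.
   Returns 0 on success, -1 on error. */
int wakeup_init(void)
{
    struct sigaction sa;

    if (pipe(selfpipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
        if (fcntl(selfpipe[i], F_SETFL, O_NONBLOCK) != 0)
            return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sleep until SIGUSR1 has been delivered or timeout_ms elapses
   (-1 waits forever, like the leader waiting for its workers).
   Returns 1 if woken by the signal, 0 on timeout, -1 on error. */
int wait_for_wakeup(int timeout_ms)
{
    assert(selfpipe[0] >= 0); /* wakeup_init() must have been called */
    struct pollfd pfd = { .fd = selfpipe[0], .events = POLLIN };

    for (;;)
    {
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0 && errno == EINTR)
            continue; /* handler ran mid-poll; poll again (the timeout restarts) */
        if (rc <= 0)
            return rc; /* 0 = timeout, -1 = error */

        /* Drain the pipe so that the next wait blocks again. */
        char buf[16];
        while (read(selfpipe[0], buf, sizeof(buf)) > 0)
            ;
        return 1;
    }
}
```

After `wakeup_init()`, a `raise(SIGUSR1)` makes the next `wait_for_wakeup()` return 1 immediately, while without a delivered signal it only returns on timeout. With `timeout_ms = -1` the only way out is the handler actually running, which is exactly what `handle SIGUSR1 print nopass` prevents.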
At least my gdb doesn't intercept SIGUSR1 by default. It's a newer gdb though,
so that could have been different in the past (although I don't remember a
different behaviour).
    (gdb) handle SIGUSR1
    Signal        Stop   Print   Pass to program   Description
    SIGUSR1       No     No      Yes               User defined signal 1
If I change the configuration to not pass it, but print it, I can reproduce a
hang:
    handle SIGUSR1 print nopass
What does your gdb show for "handle SIGUSR1"? If it isn't what I reported, is
it possible that you set that in your .gdbinit or such?
> 2. Original commit with reproducible behaviour.
> I tracked this behaviour down to commit
> > commit 7202d72787d3b93b692feae62ee963238580c877
> > Date: Fri Feb 21 08:03:33 2025 +0100
> > backend launchers void * arguments for binary data
> > Change backend launcher functions to take void * for binary data
> > instead of char *. This removes the need for numerous casts.
> > Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
I also find it very hard to believe that this commit introduced this problem -
it doesn't sound like a postgres issue to me. I can reproduce it in PG 16,
after doing "handle SIGUSR1 print nopass".
Greetings,
Andres Freund