Re: pgbench error: (setshell) of script 0; execution of meta-command failed

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench error: (setshell) of script 0; execution of meta-command failed
Date: 2025-01-10 14:39:13
Message-ID: b0afe844-84b7-4154-ad2c-6763ff84d876@oss.nttdata.com
Lists: pgsql-hackers

On 2025/01/10 21:41, Fujii Masao wrote:
>
>
> On 2025/01/10 16:09, Andy Fan wrote:
>> Andy Fan <zhihuifan1213(at)163(dot)com> writes:
>>
>>> Hi:
>>>
>>> I run into the {subject} issue with the below setup.
>>>
>>> cat foo.sql
>>>
>>> \setshell txn_mode echo ${TXN_MODE}
>>> \setshell speed echo ${SPEED}
>>> \setshell sleep_ms echo ${SLEEP_MS}
>>> \setshell subtxn_mode echo ${SUBTXN_MODE}
>>>
>>> select 1;
>>>
>>> $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort
>>>
>>> I *randomly*(7/8) get errors like:
>>>
>>> pgbench (18devel)
>>> pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed
>>> pgbench: error: Run was aborted due to an error in thread 0
>
> Interestingly, my git bisect pointed to the following commit
> as the cause of this issue, even though it seems unrelated to
> the pgbench problem at all. It’s possible my git bisect result
> is incorrect, but when I reverted this commit on HEAD,
> the pgbench issue didn’t occur during my tests.
>
> ----------------------
> 06843df4abc5a0c24e4bd154a8a1327e074fa3ae is the first bad commit
> commit 06843df4abc5a0c24e4bd154a8a1327e074fa3ae
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date:   Fri Sep 29 14:07:30 2023 -0400
>
>     Suppress macOS warnings about duplicate libraries in link commands.
> ----------------------

Before this commit, pgbench used pqsignal() from port/pqsignal.c
to set the signal handler for SIGALRM. This version of pqsignal()
sets SA_RESTART for frontend code, so fgets() in runShellCommand()
wouldn't return NULL even if SIGALRM arrived during fgets(),
preventing the reported error.

Currently, however, pgbench seems to use pqsignal()
from legacy-pqsignal.c, which doesn't set SA_RESTART for SIGALRM.
As a result, SIGALRM can interrupt fgets() in runShellCommand()
and make it return NULL, leading to the reported error.
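
For illustration, here is a minimal standalone sketch of that difference.
It is not pgbench code: on_alarm(), read_with_alarm(), the 200 ms timer
and the "sleep 1; echo 42" command are all made up for this example, and
it assumes a POSIX system where an interrupted read() on a pipe fails
with EINTR unless SA_RESTART is set.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700		/* for popen(), sigaction(), setitimer() */
#endif

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Empty handler: its only job is to make SIGALRM interrupt system calls. */
static void
on_alarm(int signo)
{
	(void) signo;
}

/*
 * Hypothetical helper: install on_alarm() for SIGALRM with or without
 * SA_RESTART, start a child that sleeps before printing, and let SIGALRM
 * arrive while fgets() is blocked reading the child's output.
 *
 * Returns 1 if fgets() read the line, 0 if it returned NULL with EINTR,
 * and -1 on any other failure.
 */
static int
read_with_alarm(int use_restart)
{
	struct sigaction act;
	struct itimerval timer;
	char		buf[64];
	FILE	   *fp;
	int			result;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_alarm;
	sigemptyset(&act.sa_mask);
	act.sa_flags = use_restart ? SA_RESTART : 0;
	if (sigaction(SIGALRM, &act, NULL) != 0)
		return -1;

	/* The child prints only after one second, so fgets() has to block. */
	fp = popen("sleep 1; echo 42", "r");
	if (fp == NULL)
		return -1;

	/* One-shot timer: fire after 200 ms, while the child still sleeps. */
	memset(&timer, 0, sizeof(timer));
	timer.it_value.tv_usec = 200000;
	if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
	{
		pclose(fp);
		return -1;
	}

	errno = 0;
	if (fgets(buf, sizeof(buf), fp) != NULL)
		result = 1;				/* read() was restarted, line received */
	else if (errno == EINTR)
		result = 0;				/* read() was interrupted by SIGALRM */
	else
		result = -1;

	pclose(fp);
	return result;
}
```

Under those assumptions, I would expect read_with_alarm(1) to return 1
(the read is restarted and the line arrives) and read_with_alarm(0) to
return 0 (fgets() gives up with EINTR), which is the same pattern as
\setshell failing only when SIGALRM happens to arrive during the read.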

I'm not sure if this change was an intentional result of that commit...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
