Re: conchuela timeouts since 2021-10-09 system upgrade

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: conchuela timeouts since 2021-10-09 system upgrade
Date: 2021-11-14 01:24:54
Message-ID: CA+hUKGL0Wdp9PTtJA-9OE8-vrE=Y=pkiK9raHVNLKJszBguJCw@mail.gmail.com
Lists: pgsql-bugs

On Sun, Nov 14, 2021 at 1:17 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> > I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> > I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
> > But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl. I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl. I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Hmm, so if "IPC::Run got stuck when it should have been reaping that
zombie", what's it stuck in, I guess select() or waitpid()? Maybe
there's a kernel bug, but it seems hard to believe that a Unix system
would have bugs in such fundamental facilities and still be able to
build itself and ship a release... Otherwise I guess Perl, or Perl
scripts, would need to be confusing fds or pids or something? But
that's hard to believe on its own, too, given the lack of problems on
other systems that are pretty similar. If Andrey can still reproduce
this, it'd be interesting to see a gdb backtrace, and also "ps O
wchan" or perhaps kill -INFO $pid, and lsof for the process (or
according to old pages found with Google, perhaps the equivalent tool
is "fstat" on that system).
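
To make the select()/waitpid() question concrete, here is a minimal
sketch of the two places a parent normally blocks while collecting a
child. It is a Python illustration only, not IPC::Run's actual code;
the function name run_child_and_reap and the timeout value are made up
for the example. A parent stuck at either marked point would leave the
child as a zombie.

```python
import os
import select


def run_child_and_reap(timeout=5.0):
    """Fork a child, drain its pipe via select(), then reap it with waitpid().

    Hypothetical helper for illustration; not taken from IPC::Run.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write one line and exit. It remains a zombie until reaped.
        os.close(read_fd)
        os.write(write_fd, b"done\n")
        os._exit(0)

    # Parent: close our copy of the write end, or EOF never arrives.
    os.close(write_fd)
    output = b""
    while True:
        # Blocking point 1: select() waits for data or EOF on the pipe.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            raise TimeoutError("select() timed out waiting on the child's pipe")
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty read means EOF: the child closed its end
            break
        output += chunk
    os.close(read_fd)

    # Blocking point 2: waitpid() collects the status and removes the zombie.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return output, exit_code
```

If the parent held a stray copy of the pipe's write end, or waited on
the wrong pid, it would hang at point 1 or point 2 respectively, which
is the sort of fd or pid confusion a backtrace should reveal.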
