From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date: | 2023-01-26 21:06:45 |
Message-ID: | CA+hUKGLtVM4-qxtXMHYp9hjwPdhJSnvBrVfZtAPyizsuGydkAA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jan 27, 2023 at 9:57 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Jan 27, 2023 at 9:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> > > I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14)
> > > did not report any results for a couple days, and it seems it got into
> > > an infinite loop in REL_11_STABLE when building hash table in a parallel
> > > hashjoin, or something like that.
> >
> > > It seems to be progressing now, probably because I attached gdb to the
> > > workers to get backtraces, which does signals etc.
> >
> > That reminds me of cases that I saw several times on my now-deceased
> > animal florican:
> >
> > https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us
> >
> > There's clearly something rotten somewhere in there, but whether
> > it's our bug or FreeBSD's isn't clear.
>
> And if it's ours, it's possibly in latch code and not anything higher
> (I mean, not in condition variables, barriers, or parallel hash join)
> because I saw a similar hang in the shm_mq stuff which uses the latch
> API directly. Note that 13 switched to kqueue but still used the
> self-pipe, and 14 switched to a signal event, and this hasn't been
> reported in those releases or later, which makes the poll() code path
> a key suspect.
Also, 14 changed the flag/memory barrier dance (maybe_sleeping), but
13 did it the same way as 11 + 12. So between 12 and 13 we have just
the poll -> kqueue change.
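
For readers unfamiliar with the self-pipe approach discussed above, here is a minimal Python sketch of a latch that sleeps in poll() on a self-pipe. It is an illustration under stated assumptions, not PostgreSQL's latch code: the class `SelfPipeLatch` and its method names are hypothetical, the waker is a thread rather than a signal handler, and POSIX `select.poll` is assumed to be available. It shows only the general ordering that this kind of design depends on: the setter publishes the flag before poking the pipe, and the waiter drains the pipe and re-checks the flag before every sleep.

```python
import os
import select
import threading


class SelfPipeLatch:
    """Toy latch (hypothetical): a flag plus a self-pipe that a waiter poll()s on."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        # Non-blocking on both ends so neither set() nor draining can hang.
        os.set_blocking(self._rfd, False)
        os.set_blocking(self._wfd, False)
        self.is_set = False

    def set(self):
        # Publish the flag first, then poke the pipe so a poll() sleeper wakes.
        self.is_set = True
        try:
            os.write(self._wfd, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    def reset(self):
        self.is_set = False

    def _drain(self):
        # Discard stale wakeup bytes left over from earlier set() calls.
        try:
            while os.read(self._rfd, 4096):
                pass
        except BlockingIOError:
            pass

    def wait(self, timeout_ms):
        """Return True if the latch is set, False if poll() timed out."""
        poller = select.poll()
        poller.register(self._rfd, select.POLLIN)
        while True:
            # Drain, then re-check the flag, and only then sleep. A set() that
            # lands after the check still leaves a byte in the pipe, so the
            # poll() below returns immediately instead of missing the wakeup.
            self._drain()
            if self.is_set:
                return True
            if not poller.poll(timeout_ms):
                return False


if __name__ == "__main__":
    latch = SelfPipeLatch()
    threading.Timer(0.05, latch.set).start()
    print(latch.wait(5000))
```

If a wakeup byte were lost or drained at the wrong moment relative to the flag check, the waiter would sleep until something else (for example, the signals from attaching a debugger) interrupted it, which is the shape of hang described in this thread. The sketch does not claim to reproduce that bug.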