Re: Anti-critical-section assertion failure in mcxt.c reached by walsender

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Date:	2021-05-08 16:55:07
Message-ID:	20210508165507.GB3082635@rfd.leadboat.com
Lists:	pgsql-hackers

On Sat, May 08, 2021 at 04:57:54PM +1200, Thomas Munro wrote:
> On Sat, May 8, 2021 at 2:30 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
>
> That's -EAGAIN (assuming errnos match x86) and I guess it indicates
> that VDC_MAX_RETRIES is exceeded here:
>
> https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L451
> https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L526
>
> One theory is that the hypervisor/host is occasionally too swamped to
> service the request queue fast enough over a ~10ms period, given that
> vio_ldc_send() itself retries 1000 times with a 1us sleep, the outer
> loop tries ten times, and ldc.c's write_nonraw() reports -EAGAIN when
> there is no space for the message. (Alternatively, it's trying to
> send a message that's too big for the channel, the channel is
> corrupted by bugs, or my fly-by of this code I'd never heard of before
> now is just way off...)

Nice discovery. From
https://github.com/torvalds/linux/commit/a11f6ca9aef989b56cd31ff4ee2af4fb31a172ec
I see those details are 2.5 years old, somewhat young relative to the driver
as a whole. I don't know which part should change, though.
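
For reference, the ~10ms figure in the quoted theory follows from the retry counts it cites: 1000 inner attempts with a 1us sleep each, repeated by an outer loop ten times. Below is a minimal sketch of that arithmetic in Python, not the driver's actual code. The constant and function names are hypothetical, the mapping of the outer count to VDC_MAX_RETRIES is an assumption, and any extra delay the outer loop may add is ignored.

```python
# Hypothetical model of the nested retry budget described in the quoted text.
# The numbers come from the message; the names are illustrative only and are
# not copied from sunvdc.c or ldc.c.
EAGAIN = 11

INNER_RETRIES = 1000   # attempts per vio_ldc_send() call, per the message
INNER_SLEEP_US = 1     # sleep between inner attempts, in microseconds
OUTER_RETRIES = 10     # outer loop attempts (assumed to be VDC_MAX_RETRIES)


def send_with_retries(channel_has_space):
    """Try to send; return (result, microseconds spent sleeping).

    channel_has_space is a callable standing in for "the channel has room
    for the message". The result is 0 on success or -EAGAIN once every
    retry has been used up.
    """
    slept_us = 0
    for _ in range(OUTER_RETRIES):
        for _ in range(INNER_RETRIES):
            if channel_has_space():
                return 0, slept_us
            slept_us += INNER_SLEEP_US
    return -EAGAIN, slept_us


def worst_case_budget_ms():
    """Total sleep time, in milliseconds, before -EAGAIN is reported."""
    return INNER_RETRIES * INNER_SLEEP_US * OUTER_RETRIES / 1000.0


if __name__ == "__main__":
    print(worst_case_budget_ms())
    print(send_with_retries(lambda: False))
```

Under these assumptions, a channel that never frees up within that window yields -EAGAIN (-11), matching the err=-11 in the kernel log line.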
