BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition
Date: 2024-03-10 19:00:00
Message-ID: 18385-eaccb1d63684c704@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18385
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 16.2
Operating system: Ubuntu 22.04
Description:

With a small shared_buffers value and a short bgwriter_delay:
shared_buffers = '1MB'
bgwriter_delay = 10

running a concurrent write workload such as:
pgbench -i
pgbench -t 10000 -c 40

on a slow machine leads to:
number of transactions actually processed: 103642/400000
...
tps = 187.796284 (without initial connection time)
pgbench: error: Run was aborted; the above results are incomplete.
...
TRAP: failed Assert("strategy_delta >= 0"), File: "bufmgr.c", Line: 2836, PID: 20941
postgres: background writer (ExceptionalCondition+0x52)[0x5581a8dd1677]
postgres: background writer (BgBufferSync+0xb6)[0x5581a8c5b97a]
postgres: background writer (BackgroundWriterMain+0x20b)[0x5581a8bf117a]
postgres: background writer (AuxiliaryProcessMain+0x175)[0x5581a8befa29]
postgres: background writer (+0x423cff)[0x5581a8bf5cff]
postgres: background writer (PostmasterMain+0x1127)[0x5581a8bf916f]
postgres: background writer (main+0x227)[0x5581a8b1d4d5]

To make the failure easier to reproduce, add the following delay:
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -417,2 +417,3 @@ StrategySyncStart(...)
 	}
+	pg_usleep(300);
 	SpinLockRelease(&StrategyControl->buffer_strategy_lock);

(Initially observed while running the 027_stream_regress test, which uses
shared_buffers = '1MB', with the minimum bgwriter_delay on a slow
dual-core machine, where one test run takes around 1000 seconds.)

Reproduced on REL_12_STABLE .. master. The issue reproduces starting
from commit d72731a70.
