Re: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition
Date: 2024-03-15 08:48:34
Message-ID: CAHewXNm0YG1ew5ZW=yEi2B6gAu5kEEDVh2CRoVQX+MObUgC-hA@mail.gmail.com
Lists: pgsql-bugs

Sorry, the second solution still does not fix the problem. The Assert
failed again: strategy_passes is 100964, but prev_strategy_passes is
100965, so the condition of the added check (passes_delta == 0) was not
met. Can the current strategy_passes really be less than
prev_strategy_passes?

On Fri, Mar 15, 2024 at 16:05, Tender Wang <tndrwang(at)gmail(dot)com> wrote:

>
>
> On Wed, Mar 13, 2024 at 16:00, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
>> Hi Tender Wang,
>>
>> 13.03.2024 09:50, Tender Wang wrote:
>> > Hi Alexander,
>> > I haven't been able to reproduce this issue on my machine (2 vCPU,
>> > 2GB memory).
>> >
>> > Can you reproduce this issue reliably on your machine?
>>
>> Thanks for your attention to this issue!
>>
>> On my 12-core workstation, where pgbench shows approximately 2500 tps,
>> `pgbench -t 10000 -c 40` failed on iterations 1, 1, 3:
>> ...
>> number of transactions actually processed: 398639/400000
>> number of failed transactions: 0 (0.000%)
>> latency average = 15.931 ms
>> initial connection time = 56.806 ms
>> tps = 2510.845701 (without initial connection time)
>> pgbench: error: Run was aborted; the above results are incomplete.
>> ...
>>
>> The server was built with gcc:
>> CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...
>>
>> I have also intensified bgwriter as follows:
>>   rc = WaitLatch(MyLatch,
>>                  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
>> -                BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
>> +                1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
>>
>> That is, I have the attached modification applied and the following in my
>> extra.config:
>> fsync = off
>> autovacuum = off
>> shared_buffers = '1MB'
>>
>
> Using your patch and configuration, I can reproduce this issue easily
> on my machine (2 vCPU, 2GB).
> I went through the StrategySyncStart() and ClockSweepTick() code. Since
> commit d72731a70, operations on StrategyControl->nextVictimBuffer no
> longer need to acquire StrategyControl->buffer_strategy_lock, but
> StrategyControl->completePasses is still protected by that lock.
>
> Before d72731a70, once bgwriter held the
> StrategyControl->buffer_strategy_lock spinlock in StrategySyncStart(),
> backends could not advance StrategyControl->nextVictimBuffer, so bgwriter
> always read a consistent pair of nextVictimBuffer and completePasses.
>
> But now, even while bgwriter holds the spinlock, backends can still
> advance StrategyControl->nextVictimBuffer. In corner cases, bgwriter sees
> a smaller StrategyControl->nextVictimBuffer value because it has wrapped
> around, while StrategyControl->completePasses has not been updated yet.
> As a result, Assert(strategy_delta >= 0) fails.
>
> I have two solutions in mind:
> 1. Remove the Assert, but I'm not sure whether bgwriter's write strategy
> should change if strategy_delta < 0.
> 2. Add an extra check in BgBufferSync(), like below:
> /*
>  * Since nextVictimBuffer in StrategyControl is now atomic, operations on
>  * it no longer take buffer_strategy_lock. In extreme circumstances,
>  * StrategySyncStart() can therefore see an inconsistent pair of
>  * nextVictimBuffer and completePasses, so add one to passes_delta to
>  * make strategy_delta >= 0.
>  */
> if (passes_delta == 0 && strategy_delta < 0)
>     passes_delta++;
>
> I tested the second solution twice and the Assert did not trigger again.
> It needs more test runs.
>
> --
> Tender Wang
> OpenPie: https://en.openpie.com/
>

--
Tender Wang
OpenPie: https://en.openpie.com/
