Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition

From: tender wang <tndrwang(at)gmail(dot)com>
To: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date: 2023-12-26 16:55:25
Message-ID: CAHewXNnayN3NM1HfaOCejk=sGfSva6ZDArWxKiTxL7PdDHRtMw@mail.gmail.com
Lists: pgsql-bugs

I tried to analyze the issue, and I think it might be caused by this
commit:
commit dad50f677c42de207168a3f08982ba23c9fc6720
bufmgr: Acquire and clean victim buffer separately

Before commit dad50f677, LocalBufferAlloc() performed the following
operations:
/*
* it's all ours now.
*/
bufHdr->tag = newTag;
buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_IO_ERROR);
buf_state |= BM_TAG_VALID;
buf_state &= ~BUF_USAGECOUNT_MASK;
buf_state += BUF_USAGECOUNT_ONE;

After dad50f677, GetLocalVictimBuffer() no longer performs these operations,
which is why the failure I reported occurs.
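
To make the removed step concrete, here is a minimal standalone sketch of that state transition. The flag values are assumptions taken from the bit layout in src/include/storage/buf_internals.h (usage count in bits 18-21, flags from bit 22 up) and should be checked against your tree; the function name old_alloc_reset_state is hypothetical and not part of PostgreSQL.

```c
#include <stdint.h>

/* Assumed bit layout, mirroring buf_internals.h; verify against your tree. */
#define BUF_USAGECOUNT_MASK 0x003C0000U
#define BUF_USAGECOUNT_ONE  (1U << 18)
#define BM_DIRTY            (1U << 23)
#define BM_VALID            (1U << 24)
#define BM_TAG_VALID        (1U << 25)
#define BM_IO_ERROR         (1U << 27)
#define BM_JUST_DIRTIED     (1U << 28)

/*
 * Hypothetical helper: applies the same bit operations that the old
 * LocalBufferAlloc() applied to buf_state after assigning the new tag.
 */
uint32_t
old_alloc_reset_state(uint32_t buf_state)
{
    /* drop the flags that described the previous page contents */
    buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_IO_ERROR);
    /* the header now carries a valid tag */
    buf_state |= BM_TAG_VALID;
    /* restart the usage count at one */
    buf_state &= ~BUF_USAGECOUNT_MASK;
    buf_state += BUF_USAGECOUNT_ONE;
    return buf_state;
}
```

Under these assumed values, the result always has BM_TAG_VALID set, BM_VALID clear, and a usage count of one, regardless of which of those flags the recycled buffer carried before.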
In the failure I reported:
f 3
(gdb) p /x buf_state
$1 = 0x1000000

In GetLocalVictimBuffer(), with this buf_state there is no chance to execute:
buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
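
The gdb value can be decoded by hand. The sketch below assumes the flag bit positions from src/include/storage/buf_internals.h, and it assumes that the clearing statement in GetLocalVictimBuffer() is only reached when BM_TAG_VALID is set. Both assumptions should be checked against the actual source. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring buf_internals.h. */
#define BM_VALID     (1U << 24)
#define BM_TAG_VALID (1U << 25)

/*
 * Hypothetical model of the assumed guard: the victim's flags are only
 * cleared when the header still has a valid tag.
 */
bool
victim_would_be_cleared(uint32_t buf_state)
{
    return (buf_state & BM_TAG_VALID) != 0;
}

/* True when a buffer is marked valid although its tag is not valid. */
bool
valid_without_tag(uint32_t buf_state)
{
    return (buf_state & BM_VALID) != 0 && (buf_state & BM_TAG_VALID) == 0;
}
```

With these definitions, 0x1000000 is BM_VALID alone, so victim_would_be_cleared() returns false for it and the stale BM_VALID bit would survive into the assertion in ExtendBufferedRelLocal().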

I tried to fix this in the attached patch, following the old
LocalBufferAlloc() logic, but I don't fully understand all the details of
bufmgr. Any thoughts?

On Tue, Dec 26, 2023 at 18:51, tender wang <tndrwang(at)gmail(dot)com> wrote:

> Thanks for the report. I can reproduce the reported bug on master, and I
> found another assertion failure when running the SQL below:
>
> psql (17devel)
> Type "help" for help.
>
> postgres=# CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
> CREATE TABLE
> postgres=# INSERT INTO filler SELECT g, repeat('x', 1000) FROM generate_series(1,
> postgres(# 50000) g;
> INSERT 0 50000
> postgres=# CREATE TEMP TABLE tbl(a int);
> CREATE TABLE
> postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
> ERROR: could not extend file "base/5/t3_16389": No space left on device
> HINT: Check free disk space.
> postgres=# DROP TABLE filler;
> DROP TABLE
> postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
> INSERT 0 200000
> postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> (gdb) bt
> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007f9d3d8b1859 in __GI_abort () at abort.c:79
> #2 0x000055f83501c868 in ExceptionalCondition
> (conditionName=0x55f8351fcb78 "!(buf_state & (BM_VALID | BM_TAG_VALID |
> BM_DIRTY | BM_JUST_DIRTIED))", fileName=0x55f8351fca4b "localbuf.c",
> lineNumber=402) at assert.c:66
> #3 0x000055f834df05ab in ExtendBufferedRelLocal (bmr=...,
> fork=MAIN_FORKNUM, flags=8, extend_by=1, extend_upto=4294967295,
> buffers=0x7ffff3ed1530, extended_by=0x7ffff3ed13fc)
> at localbuf.c:402
> #4 0x000055f834de7a0a in ExtendBufferedRelCommon (bmr=...,
> fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1,
> extend_upto=4294967295, buffers=0x7ffff3ed1530, extended_by=0x7ffff3ed14dc)
> at bufmgr.c:1828
> #5 0x000055f834de6393 in ExtendBufferedRelBy (bmr=..., fork=MAIN_FORKNUM,
> strategy=0x0, flags=8, extend_by=1, buffers=0x7ffff3ed1530,
> extended_by=0x7ffff3ed14dc) at bufmgr.c:889
> #6 0x000055f83492a240 in RelationAddBlocks (relation=0x7f9d325a7648,
> bistate=0x0, num_pages=1, use_fsm=true, did_unlock=0x7ffff3ed168d) at
> hio.c:342
> #7 0x000055f83492ab67 in RelationGetBufferForTuple
> (relation=0x7f9d325a7648, len=32, otherBuffer=0, options=0, bistate=0x0,
> vmbuffer=0x7ffff3ed1714, vmbuffer_other=0x0, num_pages=1)
> at hio.c:768
> #8 0x000055f834910840 in heap_insert (relation=0x7f9d325a7648,
> tup=0x55f83786e898, cid=0, options=0, bistate=0x0) at heapam.c:1853
> #9 0x000055f834920cc0 in heapam_tuple_insert (relation=0x7f9d325a7648,
> slot=0x55f83786e808, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
> #10 0x000055f834bd582a in table_tuple_insert (rel=0x7f9d325a7648,
> slot=0x55f83786e808, cid=0, options=0, bistate=0x0) at
> ../../../src/include/access/tableam.h:1400
> #11 0x000055f834bd7859 in ExecInsert (context=0x7ffff3ed1970,
> resultRelInfo=0x55f836fe5ed0, slot=0x55f83786e808, canSetTag=true,
> inserted_tuple=0x0, insert_destrel=0x0)
> at nodeModifyTable.c:1133
> #12 0x000055f834bdbbae in ExecModifyTable (pstate=0x55f836fe5cc0) at
> nodeModifyTable.c:3806
> #13 0x000055f834b9a6cb in ExecProcNodeFirst (node=0x55f836fe5cc0) at
> execProcnode.c:464
> #14 0x000055f834b8db69 in ExecProcNode (node=0x55f836fe5cc0) at
> ../../../src/include/executor/executor.h:273
> #15 0x000055f834b9096f in ExecutePlan (estate=0x55f836fe5a30,
> planstate=0x55f836fe5cc0, use_parallel_mode=false, operation=CMD_INSERT,
> sendTuples=false, numberTuples=0,
> direction=ForwardScanDirection, dest=0x55f836ff4378,
> execute_once=true) at execMain.c:1670
> #16 0x000055f834b8e20f in standard_ExecutorRun (queryDesc=0x55f836f35a20,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:365
> #17 0x000055f834b8e033 in ExecutorRun (queryDesc=0x55f836f35a20,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:309
> #18 0x000055f834e3f27a in ProcessQuery (plan=0x55f836ff4218,
> sourceText=0x55f836f0b4b0 "INSERT INTO tbl SELECT g FROM generate_series(1,
> 200000) g;", params=0x0, queryEnv=0x0,
> dest=0x55f836ff4378, qc=0x7ffff3ed1dd0) at pquery.c:160
> #19 0x000055f834e40d99 in PortalRunMulti (portal=0x55f836f86a00,
> isTopLevel=true, setHoldSnapshot=false, dest=0x55f836ff4378,
> altdest=0x55f836ff4378, qc=0x7ffff3ed1dd0) at pquery.c:1277
> #20 0x000055f834e402bf in PortalRun (portal=0x55f836f86a00,
> count=9223372036854775807, isTopLevel=true, run_once=true,
> dest=0x55f836ff4378, altdest=0x55f836ff4378, qc=0x7ffff3ed1dd0)
> at pquery.c:791
> #21 0x000055f834e39478 in exec_simple_query (query_string=0x55f836f0b4b0
> "INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;") at
> postgres.c:1273
> #22 0x000055f834e3e105 in PostgresMain (dbname=0x55f836f42870 "postgres",
> username=0x55f836f42858 "gpadmin") at postgres.c:4653
> #23 0x000055f834d63393 in BackendRun (port=0x55f836f39fd0) at
> postmaster.c:4422
> #24 0x000055f834d62a4c in BackendStartup (port=0x55f836f39fd0) at
> postmaster.c:4101
> #25 0x000055f834d5f358 in ServerLoop () at postmaster.c:1769
> #26 0x000055f834d5ec7e in PostmasterMain (argc=3, argv=0x55f836f05b80) at
> postmaster.c:1468
> #27 0x000055f834c1525d in main (argc=3, argv=0x55f836f05b80) at main.c:198
>
> On Tue, Dec 26, 2023 at 17:32, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 18259
>> Logged by: Alexander Lakhin
>> Email address: exclusion(at)gmail(dot)com
>> PostgreSQL version: 16.1
>> Operating system: Ubuntu 22.04
>> Description:
>>
>> The following script:
>> mkdir /tmp/100m
>> sudo mount -t tmpfs -o size=100M tmpfs /tmp/100m
>> export PGDATA=/tmp/100m/tmpdb
>>
>> initdb
>> pg_ctl -l server.log start
>>
>> cat << 'EOF' | psql
>> CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
>> INSERT INTO filler SELECT g, repeat('x', 1000) FROM generate_series(1,
>> 50000) g;
>> CREATE TEMP TABLE tbl(a int);
>> INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
>> INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
>> DROP TABLE filler;
>> INSERT INTO tbl SELECT g from generate_series(1, 200000) g;
>> EOF
>>
>> triggers an assertion failure following "no space left" errors:
>> ...
>> CREATE TABLE
>> ERROR: could not extend file "base/5/t3_16391": No space left on device
>> HINT: Check free disk space.
>> ERROR: could not extend file "base/5/t3_16391": No space left on device
>> HINT: Check free disk space.
>> DROP TABLE
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> connection to server was lost
>> TRAP: failed Assert("buf_state & BM_TAG_VALID"), File: "localbuf.c", Line:
>> 390, PID: 25978
>>
>> The call stack of the failure is:
>> ExtendBufferedRelLocal at localbuf.c:391:4
>> ExtendBufferedRelCommon at bufmgr.c:1801:17
>> ExtendBufferedRelBy at bufmgr.c:862:9
>> RelationAddBlocks at hio.c:342:16
>> RelationGetBufferForTuple at hio.c:768:11
>> heap_insert at heapam.c:1862:11
>> heapam_tuple_insert at heapam_handler.c:253:2
>> table_tuple_insert at tableam.h:1402:1
>> ExecInsert at nodeModifyTable.c:1138:21
>> ExecModifyTable at nodeModifyTable.c:3810:12
>> ExecProcNodeFirst at execProcnode.c:465:1
>> ExecProcNode at executor.h:274:1
>> ExecutePlan at execMain.c:1670:10
>> standard_ExecutorRun at execMain.c:365:3
>> ExecutorRun at execMain.c:310:1
>> ProcessQuery at pquery.c:165:5
>> PortalRunMulti at pquery.c:1277:5
>> PortalRun at pquery.c:795:5
>> exec_simple_query at postgres.c:1274:10
>> PostgresMain at postgres.c:4641:27
>> ExitPostmaster at postmaster.c:5047:1
>> BackendStartup at postmaster.c:4196:5
>> ServerLoop at postmaster.c:1788:6
>> PostmasterMain at postmaster.c:1466:11
>>
>> The first bad commit for this anomaly is 31966b15 (and exactly that commit
>> added the Assert).
>>
>> With debug logging added in this code within ExtendBufferedRelLocal():
>> if (found)
>> {
>> BufferDesc *existing_hdr =
>> GetLocalBufferDescriptor(hresult->id);
>> uint32 buf_state;
>>
>> UnpinLocalBuffer(BufferDescriptorGetBuffer(victim_buf_hdr));
>>
>> existing_hdr = GetLocalBufferDescriptor(hresult->id);
>> PinLocalBuffer(existing_hdr, false);
>> buffers[i] = BufferDescriptorGetBuffer(existing_hdr);
>>
>> buf_state = pg_atomic_read_u32(&existing_hdr->state);
>> Assert(buf_state & BM_TAG_VALID);
>> Assert(!(buf_state & BM_DIRTY));
>> buf_state &= BM_VALID;
>> pg_atomic_unlocked_write_u32(&existing_hdr->state, buf_state);
>> ...
>> I see that it reached for the second INSERT (and NOSPC error) with
>> existing_hdr->state == 0x2040000, but for the third INSERT I observe
>> state == 0x0.
>>
>>
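
The two state values in the quoted debug output can be related by plain bit arithmetic. This is a small sketch assuming the flag layout from src/include/storage/buf_internals.h; the helper name is hypothetical. It only reproduces the arithmetic of the quoted statement `buf_state &= BM_VALID` and does not claim to identify the root cause.

```c
#include <stdint.h>

/* Assumed bit positions, mirroring buf_internals.h. */
#define BUF_USAGECOUNT_ONE (1U << 18)
#define BM_VALID           (1U << 24)
#define BM_TAG_VALID       (1U << 25)

/*
 * Hypothetical helper: the effect of "buf_state &= BM_VALID".
 * AND-ing with a single flag keeps that flag (if set) and clears
 * every other bit, including BM_TAG_VALID and the usage count.
 */
uint32_t
and_with_valid(uint32_t buf_state)
{
    return buf_state & BM_VALID;
}
```

Under the assumed layout, 0x2040000 is BM_TAG_VALID plus a usage count of one, and and_with_valid(0x2040000) is 0x0, the value seen on the third INSERT. A state with both BM_VALID and BM_TAG_VALID set would come out as 0x1000000.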

Attachment Content-Type Size
0001-Fix-local-buf_state-error.patch application/octet-stream 1.6 KB
