Re: Segmentation fault - PostgreSQL 17.0

From: Ľuboslav Špilák <lspilak(at)microstep-hdo(dot)sk>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Segmentation fault - PostgreSQL 17.0
Date: 2024-11-11 15:20:22
Message-ID: VI1PR02MB6333FCC05698FC2FF8D967728A582@VI1PR02MB6333.eurprd02.prod.outlook.com
Lists: pgsql-bugs

Hello.

I had a similarly created table in a different schema, so the query really did return 2 rows (the second table had been created only to test the problem). However, even after removing one of them, the problem persists.

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128187015,test_idxbrin,2200,0,0,10,3580,1128187015,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

So we dropped one of the tables with this index, and now the same query returns only one row:

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

Then we called the problematic function again and it crashed.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
234 ./build/../src/backend/access/common/heaptuple.c: No such file or directory.
(gdb) bt
#0 0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
#1 0x0000562775221e4f in heap_form_minimal_tuple (tupleDescriptor=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:1492
#2 0x00005627756f0e45 in tuplestore_putvalues (state=0x5627a1db6e58, tdesc=<optimized out>, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/utils/sort/tuplestore.c:756
#3 0x00007fc7e2d8e9eb in brin_page_items (fcinfo=<optimized out>) at ./build/../contrib/pageinspect/brinfuncs.c:300
#4 0x00005627753d435c in ExecMakeTableFunctionResult (setexpr=0x5627a1dac480, econtext=0x5627a1dac368, argContext=<optimized out>, expectedDesc=0x5627a1dad5f0, randomAccess=false)
at ./build/../src/backend/executor/execSRF.c:234
#5 0x00005627753e527a in FunctionNext (node=node(at)entry=0x5627a1dac160) at ./build/../src/backend/executor/nodeFunctionscan.c:93
#6 0x00005627753d4df9 in ExecScanFetch (recheckMtd=0x5627753e4f50 <FunctionRecheck>, accessMtd=0x5627753e4f80 <FunctionNext>, node=0x5627a1dac160)
at ./build/../src/backend/executor/execScan.c:131
#7 ExecScan (node=0x5627a1dac160, accessMtd=0x5627753e4f80 <FunctionNext>, recheckMtd=0x5627753e4f50 <FunctionRecheck>) at ./build/../src/backend/executor/execScan.c:180
#8 0x00005627753cb7bb in ExecProcNode (node=0x5627a1dac160) at ./build/../src/include/executor/executor.h:274
#9 ExecutePlan (execute_once=<optimized out>, dest=0x5627a1c89478, direction=<optimized out>, numberTuples=200, sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x5627a1dac160, estate=0x5627a1dabf48) at ./build/../src/backend/executor/execMain.c:1648
#10 standard_ExecutorRun (queryDesc=0x5627a1cdf700, direction=<optimized out>, count=200, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365
#11 0x000056277557966e in PortalRunSelect (portal=0x5627a1d43188, forward=<optimized out>, count=200, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:924
#12 0x000056277557a9b6 in PortalRun (portal=0x5627a1d43188, count=200, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x5627a1c89478, altdest=0x5627a1c89478,
qc=0x7fff4744aa60) at ./build/../src/backend/tcop/pquery.c:768
#13 0x000056277557817e in PostgresMain () at ./build/../src/backend/tcop/postgres.c:2255
#14 0x0000562775573423 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/tcop/backend_startup.c:105
#15 0x00005627754e366e in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff4744ad70 "",
startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7fff4744ad90) at ./build/../src/backend/postmaster/launch_backend.c:277
#16 0x00005627754e7229 in BackendStartup (client_sock=0x7fff4744ad90) at ./build/../src/backend/postmaster/postmaster.c:3593
#17 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1674
#18 0x00005627754e8dbd in PostmasterMain (argc=<optimized out>, argv=0x5627a1c82f10) at ./build/../src/backend/postmaster/postmaster.c:1372
#19 0x0000562775212df0 in main (argc=5, argv=0x5627a1c82f10) at ./build/../src/backend/main/main.c:197
(gdb) print i
$1 = 6
(gdb) print numberOfAttributes
$2 = <optimized out>
(gdb) print *tupleDesc
$3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr = 0x0, attrs = 0x5627a1db6a68}
(gdb) print *atti
$4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58 times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1, atttypmod = -1, attndims = 0,
attbyval = false, attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 100}
(gdb) print val
$5 = 0
(gdb) print values[7]
$6 = 94728219299776
(gdb)

The whole cluster was upgraded with pg_upgrade from PostgreSQL 12 to 17 and contains two databases (postgres and xtimeseries). I tried it again, creating test tables with a unique BRIN index name, and only the xtimeseries database has the problem (SIGSEGV).

Do you have any other idea what might be causing this problem?
Thank you,

Best regards, Lubo

________________________________
From: Tomas Vondra <tomas(at)vondra(dot)me>
Sent: Monday, 11 November 2024 15:40
To: Ľuboslav Špilák <lspilak(at)microstep-hdo(dot)sk>; Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 15:22, Ľuboslav Špilák wrote:
> Hello.
>
>> Could you maybe try on a completely new 17.0 cluster, not one that went
>>through pg_upgrade? I don't think pg_upgrade should cause anything like
>>this, but it'd be good to conclusively rule that out by reproducing the
>>issue on a fresh cluster.
>
> We can't reproduce the problem on a completely new 17.0 cluster.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005627752205d5 in heap_compute_data_size
> (tupleDesc=tupleDesc@entry=0x5627a1e1eea0,
> values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
> at ./build/../src/backend/access/common/heaptuple.c:234
> 234 ./build/../src/backend/access/common/heaptuple.c: No such file
> or directory.
> (gdb) print i
> $1 = 6
> (gdb) print numberOfAttributes
> $2 = <optimized out>
> (gdb) print *tupleDesc
> $3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr
> = 0x0, attrs = 0x5627a1e1eeb8}
> (gdb) print *atti
> $4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58
> times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1,
> atttypmod = -1, attndims = 0, attbyval = false,
> attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000',
> attnotnull = false, atthasdef = false, atthasmissing = false,
> attidentity = 0 '\000', attgenerated = 0 '\000',
> attisdropped = false, attislocal = true, attinhcount = 0,
> attcollation = 100}

OK, this is really weird - the index you created clearly has just 1
attribute, but this descriptor says there are 7, which means it likely
accesses garbage outside the actual BRIN tuple. Not surprising it
crashes on that.
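In the gdb session above, the attribute being processed at the crash (i = 6) is the variable-length text column "value" (attlen = -1), yet its datum is val = 0. For a pass-by-reference attribute the datum is interpreted as a pointer to the value, so when the descriptor disagrees with what the producer actually stored, the size computation presumably dereferences 0 or garbage. The C sketch below is a heavily simplified, hypothetical model of that failure mode: the types SimpleAttr, SimpleDesc, SimpleVarlena and the function compute_data_size are invented for illustration and are not PostgreSQL's real definitions. Unlike heap_compute_data_size, it checks for a NULL pointer and returns -1 instead of crashing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT PostgreSQL's Datum/TupleDesc definitions. */
typedef uintptr_t Datum;            /* holds either a small value or a pointer */

typedef struct {
    int attlen;                     /* > 0: fixed length, by value; -1: variable length */
} SimpleAttr;

typedef struct {
    int natts;
    const SimpleAttr *attrs;
} SimpleDesc;

typedef struct {
    uint32_t total_len;             /* size of the value including this header */
} SimpleVarlena;

/*
 * Sum the storage size of all non-null attributes, trusting the descriptor.
 * For variable-length attributes the Datum is interpreted as a pointer, which
 * is only valid if the producer really stored a pointer in that slot.
 * Returns -1 on a NULL pointer instead of dereferencing it.
 */
long compute_data_size(const SimpleDesc *desc, const Datum *values,
                       const int *isnull)
{
    long size = 0;

    for (int i = 0; i < desc->natts; i++) {
        if (isnull[i])
            continue;
        if (desc->attrs[i].attlen == -1) {
            const SimpleVarlena *v = (const SimpleVarlena *) values[i];
            if (v == NULL)
                return -1;          /* the real code would dereference it here */
            size += (long) v->total_len;
        } else {
            size += desc->attrs[i].attlen;
        }
    }
    return size;
}
```

As a usage example, suppose a hypothetical producer stored a by-value boolean (false, i.e. 0) followed by a text value whose total_len is 9. A descriptor declaring {bool, text} yields 1 + 9 = 10, while a descriptor that declares slot 0 as text makes the sketch treat the stored 0 as a pointer and return -1, which is the point where the real code would fault.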

That tuple descriptor however looks sane, so my guess is you actually
have multiple indexes with the same relname, in different schemas. And
this finds the wrong one first. That would also explain why it only
happens on an upgraded cluster - the new one won't have the other
indexes, of course.

What does

SELECT * FROM pg_class WHERE relname = 'test_idxbrin';

say? My bet is it'll return multiple rows, one of which will have 7
attributes.

If this is the case, it's not a bug - as Peter explained, there are some
basic sanity checks, but there's not enough info to check everything. If
you pass a page as bytea with a mismatching index, segfault is expected
(even if unfortunate). It's a power tool - if you hold it wrong, you may
get injured.

One solution is to use the fully qualified name of the index, including
the schema, or to always set the search_path.
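Tomas's explanation relies on how unqualified names are resolved: the schemas listed in search_path are searched in order and the first matching relation wins, whereas a schema-qualified name bypasses that search. The C sketch below is a simplified, hypothetical model of this rule (RelEntry and resolve_relation are invented names). It ignores the implicitly searched pg_catalog and pg_temp schemas as well as identifier quoting. The OIDs in the usage note are the two test_idxbrin rows from the pg_class output above; namespace 2200 is public, while "other_schema" is a placeholder for namespace 16830, whose real name is not given in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: the schema a relation lives in, and its OID. */
typedef struct {
    const char *schema;
    const char *relname;
    unsigned int oid;
} RelEntry;

/*
 * Simplified name lookup. "schema.rel" is looked up in that schema only;
 * a bare "rel" is looked up in each search_path schema in order, and the
 * first match wins. Returns 0 if nothing matches.
 */
unsigned int resolve_relation(const char *name,
                              const char *const *search_path, size_t npath,
                              const RelEntry *rels, size_t nrels)
{
    const char *dot = strchr(name, '.');

    if (dot != NULL) {
        /* Qualified name: search_path is not consulted at all. */
        size_t schema_len = (size_t) (dot - name);
        const char *relname = dot + 1;

        for (size_t r = 0; r < nrels; r++)
            if (strlen(rels[r].schema) == schema_len &&
                strncmp(rels[r].schema, name, schema_len) == 0 &&
                strcmp(rels[r].relname, relname) == 0)
                return rels[r].oid;
        return 0;
    }

    /* Unqualified name: the order of search_path decides which one is found. */
    for (size_t p = 0; p < npath; p++)
        for (size_t r = 0; r < nrels; r++)
            if (strcmp(rels[r].schema, search_path[p]) == 0 &&
                strcmp(rels[r].relname, name) == 0)
                return rels[r].oid;
    return 0;
}
```

With both test_idxbrin rows present and search_path set to "other_schema, public", the bare name resolves to 1128178819; with only "public" in the path it resolves to 1128187015; and "public.test_idxbrin" resolves to 1128187015 whatever the search_path is.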

regards

--
Tomas Vondra

________________________________

The sender of this e-mail message does not promise nor shall conclude any contract on the behalf of the company MicroStep HDO s.r.o. as our company enters into any contract exclusively in writing. If you have been sent this email in error, please notify the sender and delete this email.