Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade
Date: 2022-02-17 00:16:47
Message-ID: 788a0a51-b1e3-3601-5783-3f6488796ec4@enterprisedb.com
Lists: pgsql-bugs

On 2/16/22 09:14, Victor Yegorov wrote:
> Tue, 15 Feb 2022 at 22:51, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com>:
>
>> Hmm. So I guess there are three options:
>>
>> 1) The index was already broken on 12.9, but for some reason (choice of
>> a different plan, ...) it was not causing any issues.
>
> Nope, index is actively used on 12.9, plan hasn't changed.
>

OK, that's valuable information.

>> How large is the table/index? Are you able to run the query with a
>> custom build (without values optimized out)? Any chance you still
>> have a backup from before the pg_upgrade, on which you might run
>> the query?
>
> Yes, this is a test DB restored from backup in order to test out the 14
> upgrade, production is still running on 12.9.
> v3_region is 2832 kB
> region_ltree_path_idx_gist is 472 kB
>

That means it should be possible to reproduce the issue elsewhere by
copying the files (and schema). Is there any sensitive data that'd
prevent handing over this data?

>
> (gdb) frame 7
> #7  gistScanPage (scan=scan@entry=0x55bbbd4196c8,
>     pageItem=pageItem@entry=0x7ffdac2e3fe0,
>     myDistances=myDistances@entry=0x0, tbm=tbm@entry=0x0,
>     ntids=ntids@entry=0x0) at ./build/../src/backend/access/gist/gistget.c:438
> 438     ./build/../src/backend/access/gist/gistget.c: No such file or directory.
> (gdb) p pageItem->blkno
> $1 = 0
> (gdb) p pageItem->data->heap
> $5 = {heapPtr = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0},
>   recheck = false, recheckDistances = false,
>   recontup = 0x55bbbb96927e <palloc+46>, offnum = 38600}
>
> This query below also crashes with SegFault:
> SELECT * FROM gist_page_items(get_raw_page('region_ltree_path_idx_gist',
> 0), 'region_ltree_path_idx_gist');
>

Interesting! What's the backtrace from the crash?

FWIW when I try that query on the gist index from the ltree example in
our documentation, I get this:

test=# SELECT * FROM gist_page_items(get_raw_page('path_gist_idx', 0), 'path_gist_idx');
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b5ff8
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b6f20
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b7e48
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b8db0
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b9d18
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21bac80
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b1360
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b2288
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b31f0
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b4158
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21aaab0, chunk 0x21adb20
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21aaab0, chunk 0x21ae5c8
 itemoffset |  ctid  | itemlen | dead |   keys
------------+--------+---------+------+-----------
          1 | (0,1)  |      32 | f    | (path)=()
          2 | (0,2)  |      48 | f    | (path)=()
          3 | (0,3)  |      64 | f    | (path)=()
          4 | (0,4)  |      80 | f    | (path)=()
          5 | (0,5)  |      80 | f    | (path)=()
          6 | (0,6)  |      48 | f    | (path)=()
          7 | (0,7)  |      72 | f    | (path)=()
          8 | (0,8)  |      48 | f    | (path)=()
          9 | (0,9)  |      64 | f    | (path)=()
         10 | (0,10) |      80 | f    | (path)=()
         11 | (0,11) |      88 | f    | (path)=()
         12 | (0,12) |      96 | f    | (path)=()
         13 | (0,13) |      96 | f    | (path)=()
(13 rows)

This is on a debug build with asserts. On a non-assert build it crashes in
AllocSetAlloc, but I'd bet the exact place where it crashes just depends
on which memory we corrupt by writing past the end of a chunk in
ExprContext. The empty paths seem strange too, of course.

But the confusing thing is that I get similar issues even on 12.10 (after
backporting the pageinspect gist changes), but only with asserts. Without
asserts it works (though the paths are still empty). Perhaps I
backported that incorrectly, though.
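
For readers wondering why only the assert-enabled build reports
anything: as far as I understand it, assert builds enable memory context
checking, which places a sentinel byte right after the requested size of
each chunk and later verifies it. The sketch below is a toy model of
that idea only, not PostgreSQL's allocator. Every name in it (ToyChunk,
toy_alloc, toy_chunk_is_intact, toy_free, toy_demo_overrun) and the
sentinel value are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
#define TOY_SENTINEL 0x7E       /* arbitrary marker value for this sketch */

typedef struct ToyChunk
{
    size_t         requested;   /* bytes the caller asked for */
    unsigned char *data;        /* 'requested' bytes plus one sentinel byte */
} ToyChunk;

/* Allocate 'size' usable bytes and plant a sentinel right after them. */
bool
toy_alloc(ToyChunk *chunk, size_t size)
{
    assert(chunk != NULL);
    chunk->data = malloc(size + 1);
    if (chunk->data == NULL)
        return false;
    chunk->requested = size;
    chunk->data[size] = TOY_SENTINEL;
    return true;
}

/* True while the byte just past the requested size is untouched. */
bool
toy_chunk_is_intact(const ToyChunk *chunk)
{
    return chunk->data[chunk->requested] == TOY_SENTINEL;
}

void
toy_free(ToyChunk *chunk)
{
    free(chunk->data);
    chunk->data = NULL;
}

/*
 * Write one byte more than was requested and report whether the check
 * notices: 1 = overrun detected, 0 = missed, -1 = allocation failed.
 */
int
toy_demo_overrun(void)
{
    ToyChunk chunk;
    int      detected;

    if (!toy_alloc(&chunk, 16))
        return -1;
    memset(chunk.data, 0, 17);  /* 16 requested, 17 written: hits the sentinel */
    detected = !toy_chunk_is_intact(&chunk);
    toy_free(&chunk);
    return detected;
}
```

Without such a sentinel the same stray byte lands wherever the
allocator put the next piece of data, so whether the process crashes,
and where, depends on what happens to be stored there. That would be
consistent with a non-assert build either crashing in an unrelated
place or appearing to work.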

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
