From: | Jeremy Schneider <schnjere(at)amazon(dot)com> |
---|---|
To: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | logical decoding bug: segfault in ReorderBufferToastReplace() |
Date: | 2019-12-05 01:36:16 |
Message-ID: | b1da9abc-bad7-efd6-2f1c-503d508c3877@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-committers pgsql-hackers |
On 9/8/19 14:01, Tom Lane wrote:
> Fix RelationIdGetRelation calls that weren't bothering with error checks.
>
> ...
>
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
We had two different databases this week (with the same schema) both
independently hit the condition of this recent commit from Tom. It's on
11.5 so we're actually segfaulting and restarting rather than just
causing the walsender process to ERROR, but regardless there's still
some underlying bug here.
We have core files and we're still working to see if we can figure out
what's going on, but I thought I'd report now in case anyone has extra
ideas or suggestions. The segfault is on line 3034 of reorderbuffer.c.
3033 toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
3034 toast_desc = RelationGetDescr(toast_rel);
We'll keep looking; let me know any feedback! Would love to track down
whatever bug is in the logical decoding code, if that's what it is.
==========
backtrace showing the call stack...
Core was generated by `postgres: walsender <NAME-REDACTED>
<DNS-REDACTED>(31712)'.
Program terminated with signal 11, Segmentation fault.
#0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
at reorderbuffer.c:3034
3034 reorderbuffer.c: No such file or directory.
...
(gdb) #0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
at reorderbuffer.c:3034
#1 ReorderBufferCommit (rb=0x3086af0, xid=xid(at)entry=1358809,
commit_lsn=9430473346032, end_lsn=<optimized out>,
commit_time=commit_time(at)entry=628712466364268,
origin_id=origin_id(at)entry=0, origin_lsn=origin_lsn(at)entry=0) at
reorderbuffer.c:1584
#2 0x0000000000716248 in DecodeCommit (xid=1358809,
parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
#3 DecodeXactOp (ctx=0x3068f70, buf=buf(at)entry=0x7ffc4ce125b0) at
decode.c:245
#4 0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
record=0x3069208) at decode.c:117
#5 0x0000000000727150 in XLogSendLogical () at walsender.c:2886
#6 0x0000000000729192 in WalSndLoop (send_data=send_data(at)entry=0x7270f0
<XLogSendLogical>) at walsender.c:2249
#7 0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
walsender.c:1111
#8 exec_replication_command (
cmd_string=cmd_string(at)entry=0x2f968b0 "START_REPLICATION SLOT
\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
#9 0x000000000076e939 in PostgresMain (argc=<optimized out>,
argv=argv(at)entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
username=<optimized out>) at postgres.c:4182
#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
#12 ServerLoop () at postmaster.c:1759
#13 0x00000000007062f9 in PostmasterMain (argc=argc(at)entry=7,
argv=argv(at)entry=0x2f92540) at postmaster.c:1432
#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228
==========
Some additional context...
# select * from pg_publication_rel;
prpubid | prrelid
---------+---------
71417 | 16453
71417 | 54949
(2 rows)
(gdb) print toast_rel
$4 = (struct RelationData *) 0x0
(gdb) print *relation->rd_rel
$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
relnamespace = 16402, reltype = 16430, reloftype = 0,
relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,
relhasindex = true, relisshared = false, relpersistence = 112 'p',
relkind = 114 'r', relnatts = 4, relchecks = 0, relhasoids = false,
relhasrules = false, relhastriggers = false, relhassubclass = false,
relrowsecurity = false, relforcerowsecurity = false,
relispopulated = true, relreplident = 100 'd', relispartition = false,
relrewrite = 0, relfrozenxid = 1808, relminmxid = 1}
(gdb) print *relation
$12 = {rd_node = {spcNode = 1663, dbNode = 16401, relNode = 16428},
rd_smgr = 0x0, rd_refcnt = 1, rd_backend = -1, rd_islocaltemp = false,
rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 0 '\000',
rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0,
rd_rel = 0x2b7917724bd8, rd_att = 0x2b7917724ce8, rd_id = 16428,
rd_lockInfo = {lockRelId = {relId = 16428, dbId = 16401}},
rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0,
rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0,
rd_partkey = 0x0, rd_pdcxt = 0x0, rd_partdesc = 0x0, rd_partcheck = 0x0,
rd_indexlist = 0x0, rd_oidindex = 0, rd_pkindex = 0,
rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0,
rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0,
rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0,
rd_indextuple = 0x0, rd_amhandler = 0, rd_indexcxt = 0x0,
rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support =
0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0,
rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats =
0x0, rd_amcache = 0x0, rd_indcollation = 0x0,
rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x0,
rd_partcheckvalid = false, rd_partcheckcxt = 0x0}
(gdb) print *desc
$13 = {natts = 4, tdtypeid = 16430, tdtypmod = -1, tdhasoid = false,
tdrefcount = 1, constr = 0x2b7917724ef8, attrs = 0x2b7917724d08}
(gdb) print *txn
$2 = {xid = 1358809, has_catalog_changes = true, is_known_as_subxact =
false, toplevel_xid = 0, first_lsn = 9430473113640,
final_lsn = 9430473346032, end_lsn = 9430473350592,
restart_decoding_lsn = 0, origin_id = 0, origin_lsn = 0,
commit_time = 628712466364268, base_snapshot = 0x308cdc0,
base_snapshot_lsn = 9430473113776, base_snapshot_node = {prev = 0x3086b08,
next = 0x3086b08}, nentries = 357, nentries_mem = 357, serialized =
false, changes = {head = {prev = 0x30aca08, next = 0x309aac8}},
tuplecids = {head = {prev = 0x30ac878, next = 0x309ab18}}, ntuplecids
= 151, tuplecid_hash = 0x30b0bf0, toast_hash = 0x30bb460,
subtxns = {head = {prev = 0x3094b30, next = 0x3094b30}}, nsubtxns = 0,
ninvalidations = 278, invalidations = 0x30acb08, node = {
prev = 0x3086af8, next = 0x3086af8}}
(gdb) print *change
$1 = {lsn = 9430473343416, action = REORDER_BUFFER_CHANGE_INSERT,
origin_id = 0, data = {tp = {relnode = {spcNode = 1663, dbNode = 16401,
relNode = 16428}, clear_toast_afterwards = true, oldtuple = 0x0,
newtuple = 0x2b79313f9c68}, truncate = {nrelids = 70441758623359,
cascade = 44, restart_seqs = 64, relids = 0x0}, msg = {prefix =
0x40110000067f <Address 0x40110000067f out of bounds>,
message_size = 4294983724, message = 0x0}, snapshot =
0x40110000067f, command_id = 1663, tuplecid = {node = {spcNode = 1663,
dbNode = 16401, relNode = 16428}, tid = {ip_blkid = {bi_hi = 1,
bi_lo = 0}, ip_posid = 0}, cmin = 0, cmax = 826252392,
combocid = 11129}}, node = {prev = 0x30ac918, next = 0x30ac9b8}}
--
Jeremy Schneider
Database Engineer
Amazon Web Services
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Fedorov | 2019-12-05 07:24:16 | A row-level trigger on a partitioned table is not created on a sub-partition created later |
Previous Message | Tom Lane | 2019-12-05 01:30:08 | Re: BUG #16140: View with INSERT, DO INSTEAD, and ON CONFLICT causes an error |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2019-12-05 13:38:36 | Re: logical decoding bug: segfault in ReorderBufferToastReplace() |
Previous Message | Michael Paquier | 2019-12-05 00:41:15 | Re: pgsql: Blind attempt at fixing ecpg/compatlib's build |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-12-05 01:48:23 | Re: Update minimum SSL version |
Previous Message | Kyotaro Horiguchi | 2019-12-05 01:28:03 | Re: could not stat promote trigger file leads to shutdown |