From: | Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Subject: | Re: segmentation fault when cassert enabled |
Date: | 2019-11-05 16:29:18 |
Message-ID: | 20191105172918.3e32a446@firost |
Lists: | pgsql-hackers |
On Fri, 25 Oct 2019 12:28:38 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> writes:
> > When investigating for the bug reported in thread "logical replication -
> > negative bitmapset member not allowed", I found a way to seg fault
> > postgresql only when cassert is enabled.
> > ...
> > I haven't had time to dig further yet. However, I don't understand why this
> > crash is triggered when cassert is enabled.
>
> Most likely, it's not so much assertions that provoke the crash as
> CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed
> memory.
Thank you. Indeed, enabling CLOBBER_FREED_MEMORY on its own is enough to
trigger the segfault.
In fact, valgrind detects it as an uninitialised value, whether or not
CLOBBER_FREED_MEMORY is defined:
Conditional jump or move depends on uninitialised value(s)
   at 0x43F410: slot_modify_cstrings (worker.c:398)
   by 0x43FBE9: apply_handle_update (worker.c:744)
   by 0x440088: apply_dispatch (worker.c:968)
   by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
   by 0x440CD0: ApplyWorkerMain (worker.c:1733)
   by 0x411C34: StartBackgroundWorker (bgworker.c:834)
   by 0x41EA24: do_start_bgworker (postmaster.c:5763)
   by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
   by 0x41F562: sigusr1_handler (postmaster.c:5161)
   by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
   by 0x4B31FF6: select (select.c:41)
   by 0x41FDDE: ServerLoop (postmaster.c:1668)
 Uninitialised value was created by a heap allocation
   at 0x5C579B: palloc (mcxt.c:949)
   by 0x437116: logicalrep_rel_open (relation.c:270)
   by 0x43FA8F: apply_handle_update (worker.c:684)
   by 0x440088: apply_dispatch (worker.c:968)
   by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
   by 0x440CD0: ApplyWorkerMain (worker.c:1733)
   by 0x411C34: StartBackgroundWorker (bgworker.c:834)
   by 0x41EA24: do_start_bgworker (postmaster.c:5763)
   by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
   by 0x41F562: sigusr1_handler (postmaster.c:5161)
   by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
   by 0x4B31FF6: select (select.c:41)
My best bet so far is that logicalrep_relmap_invalidate_cb is not called after
the DDL on the subscriber, so the relmap cache is not invalidated. We then end up
with slot->tts_tupleDescriptor->natts greater than rel->remoterel->natts in
slot_store_cstrings, leading to the overflow on attrmap and the SIGSEGV.
I haven't followed this path yet.
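To make the suspected failure mode concrete, here is a minimal standalone sketch, not PostgreSQL code. StaleMapEntry, build_entry, map_is_stale and map_lookup are hypothetical names, and malloc stands in for palloc. It only assumes what is described above: a cached map sized when the entry was built, then walked using a column count taken from a newer tuple descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's AttrNumber (typedef'd as int16). */
typedef int16_t AttrNumber;

/* Hypothetical, simplified cache entry; NOT the real LogicalRepRelMapEntry. */
typedef struct
{
    int         map_natts;  /* number of entries allocated in attrmap */
    AttrNumber *attrmap;    /* local column -> remote column, or -1 */
} StaleMapEntry;

/* Build a map for a local relation that has natts columns (all unmapped). */
static StaleMapEntry
build_entry(int natts)
{
    StaleMapEntry e;

    e.map_natts = natts;
    e.attrmap = malloc((size_t) natts * sizeof(AttrNumber));
    for (int i = 0; e.attrmap != NULL && i < natts; i++)
        e.attrmap[i] = -1;
    return e;
}

/*
 * True when a loop over slot_natts columns would read past the end of
 * attrmap, i.e. the table gained columns after the entry was cached.
 */
static bool
map_is_stale(const StaleMapEntry *e, int slot_natts)
{
    return slot_natts > e->map_natts;
}

/* Bounds-checked lookup; an unchecked read here is the suspected overflow. */
static AttrNumber
map_lookup(const StaleMapEntry *e, int attnum)
{
    assert(attnum >= 0 && attnum < e->map_natts);
    return e->attrmap[attnum];
}
```

With an entry built for two columns, map_is_stale(&e, 3) returns true, which is the situation I suspect the apply worker ends up in after the DDL if the invalidation callback never fires.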
By the way, I noticed attrmap is declared as AttrNumber * in struct
LogicalRepRelMapEntry, AttrNumber being typedef'd as an int16. However, attrmap
is allocated based on sizeof(int) in logicalrep_rel_open:
    entry->attrmap = palloc(desc->natts * sizeof(int));
It doesn't look like a major problem, it just allocates more memory than
needed.
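The size difference can be checked with a small standalone sketch. The typedef below only mirrors AttrNumber, and the two helper names are made up for illustration. Sizing the allocation with sizeof(AttrNumber) instead of sizeof(int) would be one way to tighten it, but that is a suggestion, not a tested patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (typedef'd as int16). */
typedef int16_t AttrNumber;

/* The mismatch over-allocates; it can never under-allocate. */
static_assert(sizeof(AttrNumber) <= sizeof(int),
              "attrmap sized with sizeof(int) is at least large enough");

/* Bytes requested by the current call: natts * sizeof(int). */
static size_t
attrmap_bytes_allocated(int natts)
{
    return (size_t) natts * sizeof(int);
}

/* Bytes actually needed to hold natts AttrNumber entries. */
static size_t
attrmap_bytes_needed(int natts)
{
    return (size_t) natts * sizeof(AttrNumber);
}
```

On a platform where int is 4 bytes, the current code requests twice the memory the map needs.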
Regards,