From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: SIGSEGV in BRIN autosummarize |
Date: | 2017-10-15 01:56:56 |
Message-ID: | 20171015015656.GC22678@telsasoft.com |
Lists: | pgsql-hackers |
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > Also notice the vacuum process was interrupted, same as yesterday (thank
> > goodness for full logs). Our INSERT script uses Python's
> > multiprocessing.Pool() with "maxtasksperchild=1", which I think means each
> > subprocess loads one file and then exits, and the Pool creates a new
> > subprocess, which starts a new PG session and transaction. That explains why
> > autovacuum starts processing the table only to be immediately interrupted.
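A minimal Python sketch of the loader pattern described above, assuming the standard-library `multiprocessing.Pool`: with `maxtasksperchild=1` every task runs in a worker process that exits afterwards, so each file is handled by a fresh process. `load_one` and `run_loader` are hypothetical names, and the database work is only indicated by a comment; in the real loader the worker would open its own PostgreSQL connection, which is why each file gets a new session.

```python
import os
import multiprocessing


def load_one(path):
    # Hypothetical stand-in for the real loader: a real worker would open a
    # new database connection here, INSERT the file's rows, commit and exit.
    return path, os.getpid()


def run_loader(paths):
    # "fork" start method (POSIX only), so this sketch does not need the
    # children to re-import the main module.
    ctx = multiprocessing.get_context("fork")
    # maxtasksperchild=1: each worker handles one task, then exits and is
    # replaced by a fresh process (in the real loader, a fresh PG session).
    # chunksize=1 makes each file its own task.
    with ctx.Pool(processes=2, maxtasksperchild=1) as pool:
        return pool.map(load_one, paths, chunksize=1)


if __name__ == "__main__":
    for path, pid in run_loader(["a.csv", "b.csv", "c.csv", "d.csv"]):
        print(path, pid)
```

Passing `chunksize=1` matters here: `Pool.map` may group several items into a single task, and `maxtasksperchild` counts tasks, not items.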
On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> I don't follow. Why does it explain that autovacuum gets canceled? I
> mean, merely opening a new connection/session should not cancel
> autovacuum. That requires a command that takes a table-level lock
> conflicting with autovacuum (e.g. an explicit LOCK command, DDL, ...).
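To check Tomas's point against the lock rules, here is the table-level conflict matrix from the PostgreSQL documentation ("Conflicting Lock Modes") transcribed into Python. Plain VACUUM and autovacuum take SHARE UPDATE EXCLUSIVE on the table, while INSERT takes ROW EXCLUSIVE; `blocks_autovacuum` is a hypothetical helper for illustration, not a PostgreSQL API.

```python
# Table-level lock modes, weakest to strongest, as named in the PostgreSQL docs.
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]

# For each mode, the set of modes it conflicts with (the relation is symmetric).
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                      "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                            "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                            "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": set(MODES) - {"ACCESS SHARE"},
    "ACCESS EXCLUSIVE": set(MODES),
}

# Lock mode held by plain VACUUM / autovacuum on the table.
AUTOVACUUM = "SHARE UPDATE EXCLUSIVE"


def blocks_autovacuum(mode):
    """True if a session requesting `mode` conflicts with a running autovacuum."""
    return mode in CONFLICTS[AUTOVACUUM]
```

Per this table an INSERT (ROW EXCLUSIVE) never conflicts with autovacuum, whereas CREATE INDEX (SHARE) or LOCK TABLE with its default ACCESS EXCLUSIVE mode does.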
I was thinking that INSERT would do it, but I gather you're right about
autovacuum. Let me get back to you about this.
> > Due to a .."behavioral deficiency" in the loader for those tables, the crashed
> > backend causes the loader to get stuck, so the tables should be untouched since
> > the crash, should it be desirable to inspect them.
> >
>
> It's a bit difficult to guess what went wrong from this backtrace. For
> me gdb typically prints a bunch of lines immediately before the frames,
> explaining what went wrong - not sure why it's missing here.
Do you mean this?
...
Loaded symbols for /lib64/libnss_files-2.12.so
Core was generated by `postgres: autovacuum worker process gtt '.
Program terminated with signal 11, Segmentation fault.
#0 pfree (pointer=0x298c740) at mcxt.c:954
954 (*context->methods->free_p) (context, pointer);
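How pfree() reaches the crashing line can be modelled as follows. This is a simplified Python sketch, assuming the PostgreSQL 10 layout in which the pointer-sized word just before each chunk holds the owning MemoryContext (the same `*(MemoryContext *) (((char *) pointer) - sizeof(void *))` expression used with gdb in this message). `Context`, `CONTEXTS`, `get_chunk_context` and the byte offsets are hypothetical stand-ins, not PostgreSQL code.

```python
import struct

PTR = struct.calcsize("P")  # sizeof(void *) on this platform (8 on x86-64)


class Context:
    """Hypothetical stand-in for a MemoryContext and its methods table."""

    def __init__(self, name):
        self.name = name
        self.freed = []

    def free_p(self, pointer):
        # Plays the role of context->methods->free_p(context, pointer).
        self.freed.append(pointer)


# Hypothetical "address space": context address -> live context object.
CONTEXTS = {0x1000: Context("TopMemoryContext")}


def get_chunk_context(memory, pointer):
    # Read the word just before the chunk, like
    # *(MemoryContext *) (((char *) pointer) - sizeof(void *)).
    (addr,) = struct.unpack_from("P", memory, pointer - PTR)
    return addr


def pfree(memory, pointer):
    addr = get_chunk_context(memory, pointer)
    context = CONTEXTS.get(addr)
    if context is None:
        # C has no such check: following a garbage context pointer to
        # reach methods->free_p is what faults.
        raise RuntimeError(f"bogus MemoryContext {addr:#x}")
    context.free_p(pointer)
```

For example, packing `0x1000` into the header word before offset 16 of a `bytearray` and calling `pfree(memory, 16)` records the free in that context; overwriting the header bytes with text makes the lookup fail, which is the Python analogue of the SIGSEGV in frame #0.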
> Perhaps some of those pointers are bogus, the memory was already pfree-d
> or something like that. You'll have to poke around and try dereferencing
> the pointers to find what works and what does not.
>
> For example what do these gdb commands do in the #0 frame?
>
> (gdb) p *(MemoryContext)context
(gdb) p *(MemoryContext)context
Cannot access memory at address 0x7474617261763a20
> (gdb) p *GetMemoryChunkContext(pointer)
(gdb) p *GetMemoryChunkContext(pointer)
No symbol "GetMemoryChunkContext" in current context.
I had to do this instead, since it's apparently inlined or a macro:
(gdb) p *(MemoryContext *) (((char *) pointer) - sizeof(void *))
$8 = (MemoryContext) 0x7474617261763a20
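The context address gdb reports does not look like a plausible heap pointer. One quick way to see this, sketched in Python on the assumption that the core came from a little-endian x86-64 host, is to view its 8 bytes in memory order: every byte is printable ASCII, which suggests the word before the chunk was overwritten by string data, consistent with Tomas's guess that the pointer is bogus. `pointer_bytes` is a hypothetical helper name.

```python
def pointer_bytes(value, byteorder="little"):
    """Return a 64-bit pointer value as its 8 bytes in memory order."""
    return value.to_bytes(8, byteorder)


# The MemoryContext "pointer" that gdb printed above.
bogus = 0x7474617261763a20

raw = pointer_bytes(bogus)  # assumes a little-endian (x86-64) core
print(raw)  # b' :varatt'
print(all(0x20 <= b < 0x7F for b in raw))  # True: every byte is printable ASCII
```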
I uploaded the corefile:
http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz
Justin