| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Ashwin Agrawal <aagrawal(at)pivotal(dot)io> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Race conditions with checkpointer and shutdown |
| Date: | 2019-04-29 17:35:59 |
| Message-ID: | 29929.1556559359@sss.pgh.pa.us |
| Lists: | pgsql-hackers |
Ashwin Agrawal <aagrawal(at)pivotal(dot)io> writes:
> For Greenplum (based on 9.4, but current master code looks the same) we
> recently saw walreceiver deadlocks hit many times in CI, which I believe
> confirms the above finding.
> #0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> #1 0x00007f0637ee72bd in _int_free (av=0x7f063822bb20 <main_arena>, p=0x26bb3b0, have_lock=0) at malloc.c:3962
> #2 0x00007f0637eeb53c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
> #3 0x00007f0636629464 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #4 0x00007f0636630720 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #5 0x00007f063b5cede7 in _dl_fini () at dl-fini.c:235
> #6 0x00007f0637ea0ff8 in __run_exit_handlers (status=1, listp=0x7f063822b5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #7 0x00007f0637ea1045 in __GI_exit (status=<optimized out>) at exit.c:104
> #8 0x00000000008c72c7 in proc_exit ()
> #9 0x0000000000a75867 in errfinish ()
> #10 0x000000000089ea53 in ProcessWalRcvInterrupts ()
> #11 0x000000000089eac5 in WalRcvShutdownHandler ()
> #12 <signal handler called>
> #13 _int_malloc (av=av@entry=0x7f063822bb20 <main_arena>, bytes=bytes@entry=16384) at malloc.c:3802
> #14 0x00007f0637eeb184 in __GI___libc_malloc (bytes=16384) at malloc.c:2913
> #15 0x00000000007754c3 in makeEmptyPGconn ()
> #16 0x0000000000779686 in PQconnectStart ()
> #17 0x0000000000779b8b in PQconnectdb ()
> #18 0x00000000008aae52 in libpqrcv_connect ()
> #19 0x000000000089f735 in WalReceiverMain ()
> #20 0x00000000005c5eab in AuxiliaryProcessMain ()
> #21 0x00000000004cd5f1 in ServerLoop ()
> #22 0x000000000086fb18 in PostmasterMain ()
> #23 0x00000000004d2e28 in main ()
Cool --- that stack trace is *exactly* what you'd expect if this
were the problem. Thanks for sending it along!
Can you try applying a1a789eb5ac894b4ca4b7742f2dc2d9602116e46
to see if it fixes the problem for you?
regards, tom lane
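The trace shows the hazard concretely: frames #12 and #13 show a signal arriving while `_int_malloc` is working on `main_arena`, evidently with the arena lock held, and the handler (frames #8 to #11) goes on to call `exit()`, whose exit handlers (`_dl_fini`, then libgnutls) call `free()` and block on that same lock in frame #0. A common way to avoid this class of bug is to have the signal handler do nothing but set a `volatile sig_atomic_t` flag, and to act on the flag later from ordinary code. The C sketch below illustrates only that general pattern; it is not the patch in a1a789eb5ac894b4ca4b7742f2dc2d9602116e46, and `process_interrupts()` and `run_shutdown_demo()` are hypothetical names, loosely modeled on the role `ProcessWalRcvInterrupts()` plays in the trace.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict C modes */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Written only by the signal handler, read by ordinary code. */
static volatile sig_atomic_t shutdown_requested = 0;

/*
 * Async-signal-safe SIGTERM handler: it only records the request.
 * It deliberately does not call exit(), free(), or any error-reporting
 * routine, because the interrupted code might be holding a libc lock
 * (as _int_malloc apparently was in frame #13 of the trace).
 */
static void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Hypothetical interrupt check: it runs from normal code at a point where
 * no allocator call is in progress, so doing cleanup or I/O here is safe.
 */
static int process_interrupts(void)
{
    if (shutdown_requested)
    {
        fprintf(stderr, "terminating on shutdown request\n");
        return 1;
    }
    return 0;
}

/*
 * Hypothetical driver: does allocation work in a loop, simulates a
 * shutdown signal partway through, and returns 0 if the request was
 * noticed at a safe point, 1 if the loop ran to completion, or -1 on
 * setup or allocation failure.
 */
int run_shutdown_demo(void)
{
    struct sigaction sa;

    shutdown_requested = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    for (int i = 0; i < 5; i++)
    {
        /* Allocator calls are harmless even if SIGTERM lands mid-call,
         * because the handler never touches the heap. */
        char *buf = malloc(16384);
        if (buf == NULL)
            return -1;
        memset(buf, 0, 16384);
        free(buf);

        if (i == 2)
            raise(SIGTERM);     /* simulate an external shutdown signal */

        /* The flag is acted on here, outside the handler. */
        if (process_interrupts())
            return 0;
    }
    return 1;
}
```

A program would just call it, for example `int main(void) { return run_shutdown_demo(); }`. The design point is that the message and the early return happen in ordinary control flow after `raise()` has returned, never inside the handler, so no exit-time `free()` can ever run while an interrupted `malloc()` still holds its lock.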