From: | Bernd Helmle <mailings(at)oopsware(dot)de> |
---|---|
To: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Deadlock in XLogInsert at AIX |
Date: | 2017-01-30 14:26:20 |
Message-ID: | 1485786380.3084.2.camel@oopsware.de |
Lists: | pgsql-hackers |
Hi Konstantin,
We observed exactly the same issue on a customer system with the
same environment and PostgreSQL 9.5.5. Additionally, we tested on
Linux with XL/C 12 and 13 and saw exactly the same deadlock behavior.
So we assumed that this is somehow a compiler issue.
On Tuesday, 2017-01-24 at 19:26 +0300, Konstantin Knizhnik wrote:
> More information about the problem: the Postgres log contains several
> records like this:
>
> 2017-01-24 19:15:20.272 MSK [19270462] LOG: request to flush past end of generated WAL; request 6/AAEBE000, currpos 6/AAEBC2B0
>
> and they correspond to the times when the deadlock happens.
Yeah, the same logs here:
LOG: request to flush past end of generated WAL; request 1/1F4C6000, currpos 1/1F4C40E0
STATEMENT: UPDATE pgbench_accounts SET abalance = abalance + -2653 WHERE aid = 3662494;
> There is the following comment in xlog.c concerning this message:
>
> /*
>  * No-one should request to flush a piece of WAL that hasn't even been
>  * reserved yet. However, it can happen if there is a block with a
>  * bogus LSN on disk, for example. XLogFlush checks for that situation
>  * and complains, but only after the flush. Here we just assume that
>  * to mean that all WAL that has been reserved needs to be finished.
>  * In this corner-case, the return value can be smaller than 'upto'
>  * argument.
>  */
>
> So it looks like this should not happen.
> The first thing to suspect is the spinlock implementation, which is
> different for GCC and XLC.
> But ... if I rebuild Postgres without spinlocks, the problem still
> reproduces.
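For readers following along, the behavior that the quoted xlog.c comment describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual PostgreSQL code: `WalPos` and `clamp_flush_request` are hypothetical names, and the real logic lives inside xlog.c with locking that is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit WAL position (LSN). */
typedef uint64_t WalPos;

/*
 * Clamp a flush request to the end of the WAL reserved so far.
 * If the request points past the reserved end, print a message in the
 * style of the log line quoted above and fall back to the reserved end,
 * so the returned value can be smaller than 'upto'.
 */
static WalPos
clamp_flush_request(WalPos upto, WalPos reserved_upto)
{
    if (upto > reserved_upto)
    {
        fprintf(stderr,
                "request to flush past end of generated WAL; "
                "request %X/%X, currpos %X/%X\n",
                (unsigned int) (upto >> 32), (unsigned int) upto,
                (unsigned int) (reserved_upto >> 32),
                (unsigned int) reserved_upto);
        upto = reserved_upto;
    }

    /* Whatever was asked for, we never return a position beyond the reserved end. */
    assert(upto <= reserved_upto);
    return upto;
}
```

With the values from Konstantin's log line, a request of 6/AAEBE000 against a current position of 6/AAEBC2B0 would be clamped to 6/AAEBC2B0. The sketch only shows the clamping; it says nothing about why a backend asks for a position that was never reserved in the first place.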
Before we got the results from XLC on Linux (where Postgres shows the
same behavior), I had a look into the spinlock implementation. If I got
it right, XLC doesn't use the ppc64-specific ones, but the fallback
implementation (system monitoring on AIX has also shown a massive number
of calls to signal(0)...). So I tried the following patch:
diff --git a/src/include/port/atomics/arch-ppc.h b/src/include/port/atomics/arch-ppc.h
new file mode 100644
index f901a0c..028cced
*** a/src/include/port/atomics/arch-ppc.h
--- b/src/include/port/atomics/arch-ppc.h
***************
*** 23,26 ****
--- 23,33 ----
  #define pg_memory_barrier_impl() __asm__ __volatile__ ("sync" : : : "memory")
  #define pg_read_barrier_impl() __asm__ __volatile__ ("lwsync" : : : "memory")
  #define pg_write_barrier_impl() __asm__ __volatile__ ("lwsync" : : : "memory")
+
+ #elif defined(__IBMC__) || defined(__IBMCPP__)
+
+ #define pg_memory_barrier_impl() __asm__ __volatile__ (" sync \n" ::: "memory")
+ #define pg_read_barrier_impl() __asm__ __volatile__ (" lwsync \n" ::: "memory")
+ #define pg_write_barrier_impl() __asm__ __volatile__ (" lwsync \n" ::: "memory")
+
  #endif
This didn't change the picture, though.
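To make clear what the read and write barriers in the patch are meant to guarantee, here is a minimal sketch of the usual pairing: the writer issues a write barrier between storing data and setting a flag, and the reader issues a read barrier between seeing the flag and loading the data. It uses C11 `<stdatomic.h>` fences as a portable stand-in, which is an assumption on my side (PostgreSQL uses its own `pg_write_barrier()` / `pg_read_barrier()` macros, and I have not checked whether XLC supports `<stdatomic.h>`). The names `payload`, `ready`, `publish` and `try_consume` are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;         /* plain data protected by the flag below */
static atomic_int ready;    /* 0 = nothing published, 1 = payload is valid */

/* Writer side: store the data first, then make the flag visible. */
static void
publish(int value)
{
    payload = value;
    /* Release fence: plays the role of a write barrier here. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out if data was published, else 0. */
static int
try_consume(int *out)
{
    assert(out != NULL);

    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;

    /* Acquire fence: plays the role of a read barrier here. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

If either fence is missing or compiled to nothing, a reader on a weakly ordered CPU such as POWER can see the flag set and still read a stale `payload`. That is the kind of failure the patch was trying to rule out, although, as said above, it did not change the picture here.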