From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: PQexec() hangs on OOM |
Date: | 2015-09-19 05:14:13 |
Message-ID: | CAB7nPqT6gKj6iS9VTPth_h6Sz5Jo-177s6QJN_jrW66wyCjJ=w@mail.gmail.com |
Lists: | pgsql-bugs |
On Fri, Sep 18, 2015 at 11:32 PM, Amit Kapila wrote:
> IIRC, this is required to sanely report "out of memory" error in case
> of replication protocol (master-standby setup). This loop and in-particular
> this check is quite similar to PQexecFinish() functions check and loop
> where we return last result. I think it is better to keep both the places
> in-sync and also I think this is required to report the error appropriately.
> I have tried manual debugging for the out of memory error for this case and
> it works well with this check and without the check it doesn't report
> the error in an appropriate way (don't remember exactly what was
> the difference). If required, I can again try to reproduce the scenario
> and share the exact report.
I just had a look at that again. For example, put a call to pg_usleep in
libpqrcv_identify_system after executing IDENTIFY_SYSTEM and before calling
PQresultStatus, then set a breakpoint in the WAL receiver process when
starting up a standby. This gives plenty of time to emulate the OOM failure
in getCopyStart. When the check on PGRES_FATAL_ERROR is not added and the
OOM is emulated immediately, libpqrcv_PQexec loops twice and thinks it can
start up streaming replication, but fails afterwards. Here is the failure
seen from the standby:
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
FATAL: could not send data to WAL stream: server closed the connection unexpectedly
And from the master:
LOG: unexpected EOF on standby connection
The WAL receiver process ultimately restarts afterwards.
When the check on PGRES_FATAL_ERROR is added, streaming replication fails to
start, as it should, and the error is fetched correctly by the WAL receiver:
FATAL: could not start WAL streaming: out of memory
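To make the difference between the two runs concrete, here is a toy model of the result-collecting loop. It is only a sketch and not the libpq or walreceiver code: MockStatus, MockConn, mock_get_result and mock_exec are hypothetical names, and the result sequence (a fatal error followed by a copy-both result) is an assumption chosen to mimic the "loops twice" behavior described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few result statuses; not libpq's enum. */
typedef enum
{
    RES_NONE,           /* no result was fetched at all */
    RES_COPY_BOTH,      /* streaming (copy-both) mode entered */
    RES_FATAL_ERROR     /* e.g. an emulated out-of-memory failure */
} MockStatus;

/* Hypothetical queue standing in for successive result fetches. */
typedef struct
{
    const MockStatus *items;
    size_t      count;
    size_t      next;
} MockConn;

/* Return the next queued status, or NULL once the queue is drained. */
static const MockStatus *
mock_get_result(MockConn *conn)
{
    if (conn->next >= conn->count)
        return NULL;
    return &conn->items[conn->next++];
}

/*
 * Fetch results until none are left, remembering only the last one.
 * The loop always stops on a copy state; it stops on a fatal error
 * only when stop_on_fatal is set, which models the check under
 * discussion. *iterations reports how many results were consumed.
 */
static MockStatus
mock_exec(MockConn *conn, bool stop_on_fatal, int *iterations)
{
    MockStatus  last = RES_NONE;
    const MockStatus *res;

    *iterations = 0;
    while ((res = mock_get_result(conn)) != NULL)
    {
        (*iterations)++;
        last = *res;            /* a later result replaces an earlier one */

        if (last == RES_COPY_BOTH)
            break;
        if (stop_on_fatal && last == RES_FATAL_ERROR)
            break;
    }
    return last;
}
```

With the assumed sequence { RES_FATAL_ERROR, RES_COPY_BOTH }, calling mock_exec with stop_on_fatal set to false consumes both results and returns RES_COPY_BOTH, so the caller believes streaming has started. With stop_on_fatal set to true it stops after the first result and returns RES_FATAL_ERROR, which is the case where the out-of-memory error reaches the WAL receiver.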
In short: Amit seems right to have added this check.
Note for later: looking at patches during conferences is really a bad habit.
--
Michael