From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru> |
Cc: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: OOM in libpq and infinite loop with getCopyStart() |
Date: | 2016-03-04 05:16:40 |
Message-ID: | CAB7nPqTfCEMObGVPbPmNBvq3mmu+BeV6j-6swN4m_dC3mmFVbg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Mar 4, 2016 at 12:59 AM, Aleksander Alekseev
<a(dot)alekseev(at)postgrespro(dot)ru> wrote:
>> The easiest way to perform tests with this patch is to take a debugger
>> and enforce the malloc'd pointers to NULL in the code paths.
>
> I see. Still I don't think it's an excuse to not provide clear steps to
> reproduce an issue. As I see it anyone should be able to easily check
> your patch locally without having deep understanding of how libpq is
> implemented or reading thread which contains 48 e-mails.
OK, here they are, if that helps. Following these steps with some
knowledge of lldb or gdb is enough. The key point is that getCopyStart()
is the routine to set a breakpoint on and force to fail.
A) psql and COPY.
1) gdb --args psql -c 'COPY (SELECT 1) TO stdout'
2) Then set a breakpoint at getCopyStart()
3) run
4) Once PQmakeEmptyPGresult has returned, force its result to NULL:
p result = 0
5) Enjoy the infinite loop with HEAD, and the error with the patch
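For anyone who wants to see the difference between the two outcomes of
step 5 without a debugger, here is a minimal simulation. It is not libpq
code: every name in it (ParseStatus, alloc_result, parse_copy_start_head,
parse_copy_start_patched, drive_parser) is hypothetical. It also assumes
that the hang comes from the allocation failure being reported to the
caller as "not enough data yet", so that the caller keeps retrying, while
the patched behaviour reports an error instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these exist in libpq. */
typedef enum
{
	PARSE_OK,
	PARSE_NEED_MORE_DATA,
	PARSE_ERROR
} ParseStatus;

/* Simulated allocation: returns NULL when force_oom is set, which plays
 * the role of "p result = 0" in the debugger session. */
static void *
alloc_result(bool force_oom)
{
	return force_oom ? NULL : malloc(16);
}

/* Assumed HEAD-like behaviour: an allocation failure is reported the same
 * way as an incomplete message, so the caller cannot tell them apart. */
static ParseStatus
parse_copy_start_head(bool force_oom)
{
	void	   *result = alloc_result(force_oom);

	if (result == NULL)
		return PARSE_NEED_MORE_DATA;
	free(result);
	return PARSE_OK;
}

/* Assumed patched-like behaviour: an allocation failure is a hard error. */
static ParseStatus
parse_copy_start_patched(bool force_oom)
{
	void	   *result = alloc_result(force_oom);

	if (result == NULL)
		return PARSE_ERROR;
	free(result);
	return PARSE_OK;
}

/* Caller loop: retries for as long as the parser asks for more data.
 * max_attempts caps the loop so that the sketch terminates. Returns the
 * number of attempts made, or -1 if the cap was reached, which stands
 * for "would have looped forever". */
static int
drive_parser(ParseStatus (*parse) (bool), bool force_oom,
			 int max_attempts, ParseStatus *last)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++)
	{
		*last = parse(force_oom);
		if (*last != PARSE_NEED_MORE_DATA)
			return attempt;
	}
	return -1;
}
```

Calling drive_parser(parse_copy_start_head, true, 1000, &st) hits the cap
and returns -1, while the same call with parse_copy_start_patched stops
after the first attempt with st set to PARSE_ERROR.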
B) pg_receivexlog:
1) mkdir data
gdb --args pg_receivexlog --verbose -D data/
2) Set a breakpoint at getCopyStart
3) run
4) Force result = 0 after the call to PQmakeEmptyPGresult
5) And enjoy what has been reported
C) pg_recvlogical, similar to B.
1) Create a replication slot on a server that accepts them
select pg_create_logical_replication_slot('foo', 'test_decoding');
2) Same breakpoint as B) with this start command or similar (depends
on where the logical slot has been created):
pg_recvlogical --start --slot foo -f - -d postgres
D) Triggering the problem on a standby:
1) Patch upstream as follows, adding a sleep in
libpqrcv_startstreaming() of libpqwalreceiver.c. This is a simple way
to get enough time to set a breakpoint on the WAL receiver process
after it has started but before streaming has begun.
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index f670957..c7fccf1 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -188,6 +188,9 @@ libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint, char *slotname)
 	char		cmd[256];
 	PGresult   *res;
 
+	/* ZZZzzz... */
+	pg_usleep(10000L);
+
 	/* Start streaming from the point requested by startup process */
 	if (slotname != NULL)
 		snprintf(cmd, sizeof(cmd),
Designing a test that takes the context of a WAL receiver into account
does not seem worth it to me when a simple method like this one is
available...
2) Start the standby and attach a debugger to the WAL receiver process
(or send SIGSTOP to it, whatever works)
3) Set a breakpoint at getCopyStart
4) Then proceed as in the previous cases, and enjoy the errors.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
sleep-wal-receiver.patch | application/x-patch | 598 bytes |