Re: PostgreSQL client hangs sometimes on 'EXEC SQL PREPARE sid_sisisinst FROM :select_anw;'

From: Matthias Apitz <guru(at)unixarea(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL client hangs sometimes on 'EXEC SQL PREPARE sid_sisisinst FROM :select_anw;'
Date: 2020-05-05 18:12:00
Message-ID: 20200505181200.GA3640@c720-r342378
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

El día lunes, abril 27, 2020 a las 09:40:39a. m. -0400, Tom Lane escribió:

> Matthias Apitz <guru(at)unixarea(dot)de> writes:
> > El día Montag, April 27, 2020 a las 08:40:04 -0400, Tom Lane escribió:
> >> Can you get a stack trace from the connected backend?
>
> > (gdb) bt
> > #0 0x00007fd567776000 in epoll_pwait () from /lib64/libc.so.6
> > #1 0x000000000084476c in WaitEventSetWaitBlock ()
> > #2 0x0000000000844647 in WaitEventSetWait ()
> > #3 0x00000000006f89d2 in secure_read ()
> > #4 0x0000000000707425 in pq_recvbuf ()
> > #5 0x0000000000707709 in pq_discardbytes ()
> > #6 0x0000000000707aba in pq_getmessage ()
> > #7 0x000000000086b478 in SocketBackend ()
> > #8 0x000000000086b4c4 in ReadCommand ()
> > #9 0x000000000086fda9 in PostgresMain ()
>
> Oh, that is *very* interesting, because there is only one caller of
> pq_discardbytes:
>
> /*
> * Allocate space for message. If we run out of room (ridiculously
> * large message), we will elog(ERROR), but we want to discard the
> * message body so as not to lose communication sync.
> */
> PG_TRY();
> {
> enlargeStringInfo(s, len);
> }
> PG_CATCH();
> {
> if (pq_discardbytes(len) == EOF)
> ereport(COMMERROR,
> (errcode(ERRCODE_PROTOCOL_VIOLATION),
> errmsg("incomplete message from client")));
> ...
>
> What this code intends to handle is the case where the client has sent a
> message that is so long that the backend hasn't enough memory to buffer
> it. What's actually happened, more likely, is that the received message
> length is corrupt and just appears to be large, since the client-side
> trace shows that libpq has sent what it has to send and is now waiting for
> a reply. If the received length were correct then the pq_discardbytes
> call would have completed after eating the message.
>
> So what it looks like is that something is corrupting data on its
> way from the client to the server. Flaky firewall maybe? If you're
> using SSL, maybe an SSL library bug? I'm reduced to speculation at
> this point. It's hard even to say what to do to gather more info.
> If you could reproduce it then I'd suggest watching the connection
> with wireshark or the like to see what data is actually going across
> the wire ... but IIUC it's pretty random, so that approach seems
> unlikely to work.
>
> If you're in a position to run a modified server, you could try
> inserting a debug log message:
>
> }
> PG_CATCH();
> {
> + elog(COMMERROR, "bogus received message length: %d", len);
> if (pq_discardbytes(len) == EOF)
> ereport(COMMERROR,
>
> (This is in src/backend/libpq/pqcomm.c, around line 1300 as of HEAD.)
> While this seems unlikely to teach us a huge amount, perhaps the
> value of the incorrect length would be informative.
>
> Are you always seeing this error at the exact same place so far as
> the client side is concerned? It's hard to see why a transport-level
> problem would preferentially affect one particular query ...

I've added the printout of the length in this case and another one, and
see this in the server's log file:

2020-05-04 10:05:49.977 CEST [32092] LOG: invalid length 33554940 of startup packet
2020-05-04 10:05:50.207 CEST [32094] LOG: invalid length 33554940 of startup packet
2020-05-04 12:32:50.781 CEST [20334] LOG: bogus received message length: 1650422894
2020-05-04 12:36:41.293 CEST [20380] LOG: bogus received message length: 1650422894
2020-05-04 12:39:39.461 CEST [20441] LOG: bogus received message length: 1650422894
2020-05-04 13:01:50.566 CEST [24222] LOG: bogus received message length: 1650422894

Any comments? Thanks

matthias

--
Matthias Apitz, ✉ guru(at)unixarea(dot)de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2020-05-05 19:11:00 Re: Lock Postgres account after X number of failed logins?
Previous Message Peter 2020-05-05 17:54:37 Re: 12.2: Howto check memory-leak in worker?