Re: FW: query pg_stat_ssl hang 100%cpu

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "James Pang (chaolpan)" <chaolpan(at)cisco(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: FW: query pg_stat_ssl hang 100%cpu
Date: 2023-09-08 02:48:28
Message-ID: ZPqLfAk8n/HrB6G4@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-performance

On Thu, Sep 7, 2023 at 10:39 PM James Pang (chaolpan)
<chaolpan(at)cisco(dot)com> wrote:
> (gdb) p RecordCacheArray
> $1 = (TupleDesc *) 0x7f5fac365d90
> (gdb) p RecordIdentifierArray
> $2 = (uint64 *) 0x0

Oh, yeah, this stack is broken. You have been really unlucky to hit
that. This can randomly cause any session to get stuck, and no need
for the extended query protocol here.

(I am not sure how, but my email server has somewhat not been able to
get the previous messages from James. Anyway.)

On Fri, Sep 08, 2023 at 11:45:51AM +1200, Thomas Munro wrote:
> I think the lazy fix would be to re-order those allocations. A
> marginally more elegant fix would be to merge the arrays, as in the
> attached. Thoughts?

So, ensure_record_cache_typmod_slot_exists() would allocate the
initial RecordCacheArray and if it fails on the second one it keeps
RecordCacheArrayLen at 0. When coming back again to this code path,
the second part of the routine causes an infinite loop because the
allocation has been done, but the tracked length is 0. Fun.

This is broken since 4b93f57 where the second array has been
introduced. Getting that fixed before 11 is EOL is nice as it was
introduced there, so good timing.

There is a repalloc_extended(), but I cannot get excited to use
MCXT_ALLOC_NO_OOM in this code path if there is a simpler method to
avoid this issue with a single allocation for the all information set.

+static RecordCacheArrayEntry * RecordCacheArray = NULL;

pgindent is annoyed by that.. typedefs.list has been updated in your
patch, so I guess that you missed one extra indentation after this is
refreshed.

Note that RememberToFreeTupleDescAtEOX() does something similar to the
type cache, and it uses the same approach as your patch.

+1 to your proposal of using a struct for the entries in the cache.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-09-08 03:47:49 BUG #18097: Immutable expression not allowed in generated at
Previous Message PG Bug reporting form 2023-09-08 00:26:01 BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.

Browse pgsql-performance by date

  From Date Subject
Next Message Mikhail Balayan 2023-09-08 10:51:16 Planning time is time-consuming
Previous Message Thomas Munro 2023-09-07 23:45:51 Re: FW: query pg_stat_ssl hang 100%cpu