From: | bhargav kamineni <bhargavpostgres(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: PMChildFlags array |
Date: | 2019-10-05 17:26:27 |
Message-ID: | CADCf-WPf2C-c6UNUNK_U-OrFnx-fhcET9XOC58mn31CkQ_QTjw@mail.gmail.com |
Lists: | pgsql-general |
Thanks Tom Lane for detailing the issue.
>So ... how many is "a bunch"?
more than 85
>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit. While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.
We enabled "max_parallel_workers_per_gather = 4" 20 days before we ran into
this issue.
>However, given that max_connections = 500, MaxLivePostmasterChildren()
>would be 1000-plus. This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions. That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.
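For reference, the "1000-plus" figure can be reproduced with a rough sketch of the arithmetic. This assumes the limit is computed the way MaxLivePostmasterChildren() in postmaster.c did around PostgreSQL 12 (older releases may omit the max_wal_senders term), and it assumes default values for every setting except max_connections = 500, which is the only number taken from this thread. The Python function name is made up for illustration.

```python
# Rough estimate of the postmaster's child-slot limit.
# Assumption: this mirrors MaxLivePostmasterChildren() as of roughly
# PostgreSQL 12; the exact terms differ between releases.
# Assumption: every default below is the stock PostgreSQL default,
# not a value confirmed for the server discussed in this thread.

def max_live_postmaster_children(max_connections,
                                 autovacuum_max_workers=3,
                                 max_wal_senders=10,
                                 max_worker_processes=8):
    """Return the estimated number of postmaster child slots."""
    return 2 * (max_connections + autovacuum_max_workers + 1
                + max_wal_senders + max_worker_processes)


limit = max_live_postmaster_children(500)
print(limit)        # 1044
print(limit - 500)  # 544 slots beyond max_connections
```

With these assumed defaults there are several hundred slots beyond max_connections, which is why exhausting them would take at least 500 connection attempts that are in flight and not yet rejected.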
The errors below were observed in the postgres logfile after the crash:
ERROR: xlog flush request is not satisfied -- seen for a couple of tables; we
ran VACUUM FULL on those tables and the error went away after that.
ERROR: right sibling's left-link doesn't match: block 273660 links to
273500 instead of expected 273661 in index -- observed while running
vacuum freeze on the database; we dropped this index and created a
new one.
Observations:
The vacuum freeze analyze job, which is started through a cron job, gets
stuck on the database side. pg_cancel_backend() and pg_terminate_backend()
are not able to terminate the stuck processes; only restarting the database
clears them. I think this is happening due to corruption (if this is true,
how can I detect it? pg_dump?). Is there any way to overcome this problem?
Would migrating the database to a new instance (pg_basebackup and switching
over to the new instance) solve this issue?
>Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
>number of acceptable connections is identical to the number of allowed
>child processes; it needs to be less, by the number of background
>processes we want to support. But it seems like a darn hard-to-hit bug,
>so I'm not quite sure that that explains your observation.
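The point about canAcceptConnections() can be illustrated with a toy model. This is only a sketch of the idea of holding back child slots for background processes, not the actual postmaster code or any committed fix; the function name, the reserved_for_background parameter, and the numbers are all hypothetical.

```python
# Toy model of the connection-acceptance check.
# Assumption: a connection is accepted while the number of live children
# is below the child-slot limit. `reserved_for_background` is a
# hypothetical knob showing the "it needs to be less" idea.

def can_accept_connection(live_children, child_slot_limit,
                          reserved_for_background=0):
    """Accept only if a slot is free after setting aside reserved slots."""
    return live_children < child_slot_limit - reserved_for_background


child_slot_limit = 1044  # assumed value, for illustration only

# No reservation: a connection may take the very last slot, so a later
# parallel worker would find no free slot.
print(can_accept_connection(1043, child_slot_limit))  # True

# Holding back 8 slots (hypothetical) keeps room for background processes.
print(can_accept_connection(1043, child_slot_limit,
                            reserved_for_background=8))  # False
```

In the first call connections can consume every slot, which matches the hypothesis that a parallel worker failed to start because no child slot was left.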
On Fri, 4 Oct 2019 at 03:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> bhargav kamineni <bhargavpostgres(at)gmail(dot)com> writes:
> >> What was the database doing just before the FATAL line?
>
> > Postgres was rejecting a bunch of connections from a user who is having a
> > connection limit set. that was the the FATAL error that i could see in
> > log file.
> > FATAL,53300,"too many connections for role ""user_app"""
>
> So ... how many is "a bunch"?
>
> Looking at the code, it seems like it'd be possible for a sufficiently
> aggressive spawner of incoming connections to reach the
> MaxLivePostmasterChildren limit. While the postmaster would correctly
> reject additional connection attempts after that, what it would not do
> is ensure that any child slots are left for new parallel worker processes.
> So we could hypothesize that the error you're seeing in the log is from
> failure to spawn a parallel worker process, due to being out of child
> slots.
>
> However, given that max_connections = 500, MaxLivePostmasterChildren()
> would be 1000-plus. This would mean that reaching this condition would
> require *at least* 500 concurrent connection-attempts-that-haven't-yet-
> been-rejected, maybe well more than that if you didn't have close to
> 500 legitimately open sessions. That seems like a lot, enough to suggest
> that you've got some pretty serious bug in your client-side logic.
>
> Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
> number of acceptable connections is identical to the number of allowed
> child processes; it needs to be less, by the number of background
> processes we want to support. But it seems like a darn hard-to-hit bug,
> so I'm not quite sure that that explains your observation.
>
> regards, tom lane
>