Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Adam Sjøgren <asjo(at)koldfront(dot)dk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Date: 2020-09-30 20:29:43
Message-ID: f85131fd-6d96-19de-196b-c1e262cd6aa6@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 9/30/20 1:22 PM, Adam Sjøgren wrote:
> Tom writes:
>
>> =?utf-8?Q?Adam_Sj=C3=B8gren?= <asjo(at)koldfront(dot)dk> writes:
>>> Tom writes:
>>>> ... which implies that the problem is unexpectedly high contention for the
>>>> ProcArrayLock.
>>
>>> One thing I should have mentioned, but forgot, is that the database is
>>> configured to do logical replication to another machine - could that
>>> have an effect on the lock contention?
>>> A colleague pointed out that in the pg_locks output, the replication
>>> processes are waiting on the ProcArray lock:
>>> · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz
>>
>> Yeah, that is *mighty* interesting. For the archives' sake, I'll point
>> out that this shows 16 walsender processes, of which 6 are in
>> WalSenderWaitForWAL waits (ie, basically idle) and all of the other 10
>> are waiting for the ProcArrayLock.
>
> Sorry for not returning on this before now.
>
> I have seen the slow login problem - ProcArrayLock contention - happen
> with replication turned off, so I think that can be ruled out as the
> cause.
>
> I just observed something interesting right now, however.
>
> Just to recap the scene: the basic setup is that we have a table with a
> queue of jobs to be processed, and we have a number of programs (say 10)
> on 6 other machines grabbing those jobs, doing calculations and
> reporting back.

I don't have an answer. Not even sure if this is relevant to the
problem, but how are the jobs getting into the queue?

>
> The number of jobs at any given time can fluctuate from none to
> millions. Typically millions of jobs take some days to a week to get
> through.
>
> Now, I happened to have a psql prompt open when the problem of new
> logins being slow appeared - and I ran a query that counted the number
> of jobs by job-type.
>
> Before the slow login problem, that query took around 70-80 ms, but when
> the slow login appeared, it took 20+ seconds to run the query.
>
> EXPLAIN ANALYZE showed that it was running a parallel query.
>
> So I tried turning parallel queries off - with SET
> max_parallel_workers_per_gather = 0; - and now the query took around 150
> ms consistenly, despite logins still being (variably) slow.
>
> So that confirms the ProcArrayLock contention, I think.
>
> My next thought is: I have 6*10 programs making connections and doing
> parallel queries over the job queue, to get new jobs. If the jobs happen
> to be very "easy", those programs will be doing a lot of queries for
> jobs. And when the queue is large enough (I'm guessing), those queries
> will be parallel, so they will be creating new processes that need to be
> added and removed from the ProcArray.
>
> So am I shooting myself in the foot here, by creating lock contention
> when trying to get jobs quickly from a largeish table?
>
> I think my next step will be to modify the programs that grab the jobs
> to avoid parallel queries when looking up the next job to run.
>
> You anticipated this, it just took me until now to get a little closer:
>
>> It's also evident that there's some parallel query activity going on,
>> as some sessions are in BgWorkerShutdown waits; so parallel worker
>> entry/exit might be contributing to the overall load on ProcArrayLock.
>
>
> Best regards,
>
> Adam
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adam Sjøgren 2020-09-30 21:30:04 Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Previous Message Adam Sjøgren 2020-09-30 20:22:01 Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)