Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jim Jarvie <jim(at)talentstack(dot)to>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date: 2020-08-24 17:58:15
Message-ID: CAMkU=1xQOCaedLg-tyrFWRL4THSEmq76zLs7=gtpChogXVmBLA@mail.gmail.com
Lists: pgsql-performance

On Tue, Aug 18, 2020 at 8:22 PM Jim Jarvie <jim(at)talentstack(dot)to> wrote:

> I've tuned the LIMIT value both up and down. As I move the limit up, the
> problem becomes substantially worse; 300 swamps it and the selects take > 1
> hour to complete; at 600 they just all lock everything up and it stops
> processing. I did try 1,000 but it basically resulted in nothing being
> processed.
>
You've only described what happens when you turn the LIMIT up. What
happens when you turn it down? Why did you pick 250 in the first place? I
don't see the rationale for having 250*256 rows locked simultaneously. I
can see reasons you might want a LIMIT as high as 250, or for having 256
processes. I just don't see why you would want to do both in the same
system.

> Fewer processes do not give the throughput required because the queue
> sends data elsewhere which has a long round trip time but does permit over
> 1K concurrent connections as their work-around for throughput. I'm stuck
> having to scale up my concurrent processes in order to compensate for the
> long processing time of an individual queue item.
>

You've tied the database concurrency to the external process concurrency.
While this might be convenient, there is no reason to think it will be
optimal. If you achieve concurrency by having 256 processes, why does each
process need to lock 250 rows at a time? Having 64,000 rows locked to
obtain 256-fold concurrency seems like a poor design.

With modern tools it should not be too hard to have just one process obtain
1000 rows and launch 1000 concurrent external tasks, either with threads
(making sure only one thread deals with the database) or with
asynchronous operations. (Then the problem would be how to harvest the
results: you couldn't unlock the rows until all external tasks have
finished, which would be a problem if some took much longer than others.)
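As a rough illustration of that pattern (my own sketch, not something from this thread): one connection locks a batch with SKIP LOCKED, a thread pool does the slow external work, and the locks are only released when the whole batch commits. The table/column names (job_queue, id, payload), the process_remotely() call, and the DSN are hypothetical placeholders.

```python
# Sketch: one DB connection holds the locks, many threads do the slow
# external work.  All names here are illustrative, not from the thread.
from concurrent.futures import ThreadPoolExecutor

import psycopg2


def process_remotely(payload):
    """Placeholder for the long round-trip external call."""
    ...


def run_batch(conn, batch_size=1000, workers=1000):
    with conn.cursor() as cur:
        # Only this one connection touches the database; it holds the locks.
        cur.execute(
            """
            SELECT id, payload
            FROM job_queue
            ORDER BY id
            LIMIT %s
            FOR UPDATE SKIP LOCKED
            """,
            (batch_size,),
        )
        rows = cur.fetchall()

        # Fan the slow external work out to threads; none of them use conn.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(process_remotely, [payload for _, payload in rows]))

        # Harvest: delete the processed rows, then commit, which releases all
        # the locks at once.  A straggling task delays the whole batch.
        done_ids = [job_id for job_id, _ in rows]
        cur.execute("DELETE FROM job_queue WHERE id = ANY(%s)", (done_ids,))
    conn.commit()


if __name__ == "__main__":
    conn = psycopg2.connect("dbname=queue")  # hypothetical DSN
    while True:
        run_batch(conn)
```

Note the trade-off mentioned above: because one transaction holds all 1000 row locks, nothing is unlocked until the slowest external task finishes.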

It is easy to reproduce scaling problems when you have a large number of
processes trying to do ORDER BY id LIMIT 250 FOR UPDATE SKIP LOCKED,
without all the partitioning and stuff. I don't know if the problems are as severe
as you describe with your very elaborate setup--or even if they have the
same bottleneck. But in the simple case, there seems to be a lot of
spin-lock contention, as every selecting query needs to figure out if every
marked-as-locked row is truly locked, by asking if the apparently-locking
transaction is still valid.
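For what it's worth, a simple reproduction of that kind of contention can be sketched like this (again my own illustration, not the exact test I ran): many sessions against one plain, unpartitioned table, each looping the same SKIP LOCKED select. The table name test_queue and the DSN are hypothetical.

```python
# Sketch: 256 sessions hammering the same SKIP LOCKED query on one plain
# table, to observe how per-query latency degrades under contention.
import threading
import time

import psycopg2

NUM_SESSIONS = 256
DSN = "dbname=test"  # hypothetical connection string


def worker(latencies):
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    while True:
        start = time.monotonic()
        cur.execute(
            "SELECT id FROM test_queue ORDER BY id LIMIT 250 FOR UPDATE SKIP LOCKED"
        )
        cur.fetchall()
        latencies.append(time.monotonic() - start)
        conn.rollback()  # drop the locks so the next iteration contends again


if __name__ == "__main__":
    latencies = []
    for _ in range(NUM_SESSIONS):
        threading.Thread(target=worker, args=(latencies,), daemon=True).start()
    time.sleep(60)
    n = len(latencies)
    print(f"{n} selects, avg {sum(latencies) / max(n, 1):.3f}s each")
```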

Cheers,

Jeff
