Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Jim Jarvie <jim(at)talentstack(dot)to>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date: 2020-08-20 17:30:52
Message-ID: CAHOFxGo1Aq1+c++1-p1oy2J13D2+oQJ2r=_oXH2F3y-wJ3Uc9Q@mail.gmail.com
Lists: pgsql-performance

Great to hear that some of the issues are now mitigated. Though perhaps you
actually do require that ORDER BY, if items are expected to sit in the queue
for quite some time because incoming queue items arrive in bursts and you
have to play catch-up sometimes. If so, I strongly suspect the index on q_id
is becoming very bloated, and REINDEX CONCURRENTLY would help.
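
For reference, something along these lines can be run without blocking
writers on PostgreSQL 12 or later (the index name is a guess; adjust to
your schema):

    -- Rebuild the presumably bloated btree on q_id without taking a
    -- long write lock ("queue_q_id_idx" is a hypothetical name).
    REINDEX INDEX CONCURRENTLY queue_q_id_idx;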

> Partitions are list partitioned as 'incoming', 'processing', 'retry',
> 'ok', 'failed':

I am unclear on what purpose a "processing" status would have. Shouldn't a
row stay in the incoming status, locked by SELECT FOR UPDATE, until it
either gets updated to ok or failed (or left alone if a retry is needed)?
What purpose do the retry and processing statuses serve? I don't understand
your full workflow well enough to venture a guess on how you are hitting
that error about a row being in the wrong partition, but fewer top-level
partitions and removing unneeded updates seem likely to help, or perhaps
resolve the issue entirely.

I don't know if you missed my last message, and the suggestion from Laurenz
to check pgstattuple.
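
For example, something like this (table and index names are guesses):

    -- Requires the pgstattuple extension:
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    SELECT * FROM pgstattuple('queue_incoming');   -- heap bloat per partition
    SELECT * FROM pgstatindex('queue_q_id_idx');   -- btree bloat on q_id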

At a high level, it seems like any needed update to a row would result in it
being removed from the current partition and moved to another partition. If
you are doing this in a transaction block, then you could just as well skip
the SELECT FOR UPDATE and simply DELETE ... RETURNING from the existing
partition, then insert into the new partition later (use a select for update
if you want to order the deletes*). If your transaction fails and gets
rolled back, then the delete won't have happened and the row will get picked
up by the next consumer.
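
A rough sketch of that pattern, with guessed table and column names:

    -- Claim and remove a batch in one statement; if the transaction
    -- rolls back, the delete never happened and the rows reappear
    -- for the next consumer.
    DELETE FROM queue_incoming
    WHERE q_id IN (
        SELECT q_id
        FROM queue_incoming
        ORDER BY q_id           -- keep only if ordering really matters
        LIMIT 250               -- batch size is illustrative
        FOR UPDATE SKIP LOCKED
    )
    RETURNING *;
    -- ...do the work, then INSERT the results into queue_ok /
    -- queue_failed within the same transaction.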

Another thought is that I don't know how performant that hash partitioning
will be for SELECT FOR UPDATE, particularly if it potentially targets many
partitions. Would it be feasible to match the number of partitions to the
number of consumers and actually have each consumer working on one?
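
Sketching that idea, each consumer could address its own partition directly
instead of the parent table (the partition name here is hypothetical):

    -- Consumer 3 only ever touches partition 3, so SKIP LOCKED
    -- contention stays within a single heap and btree.
    SELECT q_id
    FROM queue_incoming_p3
    LIMIT 250
    FOR UPDATE SKIP LOCKED;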

*
https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/
