Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?

From: Gunther Schadow <gunther(at)pragmaticdata(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
Date: 2020-06-30 16:10:17
Message-ID: 4f379a07-c49b-14dd-ddba-e0aaf37235d5@pragmaticdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

long time ago I devised with your help a task queuing system which uses
SELECT ... FOR UPDATE SKIP LOCKED for many parallel workers to find
tasks in the queue, and it used a partitioned table where the hot part
of the queue is short and so the query for a job is quick and the skip
locked locking makes sure that one job is only assigned to one worker.
And this works pretty well for me, except that when we run many workers
we find a lot of these failures occurring:

"tuple to be locked was already moved to another partition due to
concurrent update"

This would not exactly look like a bug, because the message says "to be
locked", so at least it's not allowing two workers to lock the same
tuple. But it seems that the skip-locked mode should not make an error
out of this, but treat it as the tuple was already locked. Why would it
want to lock the tuple (representing the job) if another worker has
already finished his UPDATE of the job to mark it as "done" (which is
what makes the tuple move to the "completed" partition.)

Either the SELECT for jobs to do returned a wrong tuple, which was
already update, or there is some lapse in the locking.

Either way it would seem to be a waste of time throwing all these errors
when the tuple should not even have been selected for update and locking.

I wonder if anybody knows anything about that issue? Of course you'll
want to see the DDL and SQL queries, etc. but you can't really try it
out unless you do some massively parallel magic. So I figured I just ask.

regards,
-Gunther

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2020-06-30 16:21:23 Re: Recommended value for pg_test_fsync
Previous Message Bruce Momjian 2020-06-30 15:24:08 Re: Recommended value for pg_test_fsync