Re: UPDATE grabs multiple rows when it seems like it should only grab one

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Burke <burke(at)shyp(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: UPDATE grabs multiple rows when it seems like it should only grab one
Date: 2016-04-22 23:27:03
Message-ID: CAKFQuwbmJcw0j=qjDjJaLA=97wB-J8fiz_e0-hObsN=s8J+nAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Apr 22, 2016 at 4:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kevin Burke <burke(at)shyp(dot)com> writes:
> > I'm trying to write a job queue that grabs one job at a time from the
> > queue. I expect that the following query should update a maximum of one
> row
> > in the table:
>
> > UPDATE queued_jobs
> > SET status='in-progress',
> > updated_at=now()
> > FROM (
> > SELECT id AS inner_id
> > FROM queued_jobs
> > WHERE status='queued'
> > AND name = $1
> > AND run_after <= now()
> > LIMIT 1
> > FOR UPDATE
> > ) find_job
> > WHERE queued_jobs.id = find_job.inner_id
> > AND status='queued'
>
> I think you're assuming that the sub-query will always select the same
> row, but it doesn't have to.

​Actually, I assumed that the uncorrelated subquery would only be run a
single time...​

​The documentation on update, to me, seems to support this interpretation.

"""
When using FROM you should ensure that the join produces at most one output
row for each row to be modified. In other words, a target row shouldn't
join to more than one row from the other table(s)
"""​

​The understanding of JOIN that I hold is to take two complete relations
and combine them on some predicate. The from relation here, when complete,
only has one row and given it is effectively a self-join on the PK the
result of the join is guaranteed to be a single row. I do not follow how
the sub-select is allowed to be evaluated multiple times.​

LIMIT without an ORDER BY is ill-defined.
> Another problem is that once the outer UPDATE has changed the status
> of whichever row the sub-query selects initially, that row isn't
> a candidate to be returned by later subquery runs, so it'd certainly
> move on to another row. (I'm assuming here that FOR UPDATE allows
> the sub-query to see the effects of the outer update immediately,
> which might be wrong; I lack the time to go check right now.)
>
> You might have better luck by putting the sub-query in a CTE, where
> it will be executed at most once.
>

​Since I presume that is the desired semantics here​

​that seems like this is the best proper solution. Though now I'm curious
what people did before CTEs were available...this problem isn't new.

David J.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kevin Burke 2016-04-23 03:04:35 Re: UPDATE grabs multiple rows when it seems like it should only grab one
Previous Message Tom Lane 2016-04-22 23:09:47 Re: UPDATE grabs multiple rows when it seems like it should only grab one