Re: Serialization, Locking...implement processing Queue with a table

From: "D(dot) Dante Lorenso" <dante(at)lorenso(dot)com>
To: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Serialization, Locking...implement processing Queue with a table
Date: 2003-05-14 08:40:34
Message-ID: 08b801c319f4$7ff58fa0$1564a8c0@ROMULUS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ah. Very good, Manfred. I see what you are saying now.
This will indeed work. And it works without locking.
I like it. Hmmm... So, to convert your SQL back into
pseudo-code of my own ... You are saying it works something
like this:

---------- 8< -------------------- 8< ----------
FUNCTION reserve_job
BEGIN
SELECT and LOCK row_id matching our criteria. (SELECT FOR UPDATE)

IF (row_id was found) THEN
RESERVE row_id (UPDATE)
RETURN (row_id) -- done
ELSE
Ask "Are you Sure?" there are no rows matching our criteria?
IF (certainly no row_id exists) THEN
RETURN (0) -- no row ID exists
ELSE
RETURN reserve_job -- recursive call
END IF
END IF
END

---------- 8< -------------------- 8< ----------

OK, I see this SHOULD INDEED work. So, now my question is
about costs. Assuming both approaches (test/retry VS lock/wait)
yield a functional result. Which is best? Which is most
efficient. Here are some pros and cons:

LOCK/WAIT
--------------------------------
PROS
- fewer queries made: only one lock and one select per call
- easy to implement
- access to function is granted in order requested (per
PostgreSQL handling of locks - expectation is that no
starvation should occur)
CONS
- requires setting locks in application code
which could be forgotten by app developers
- locks are placed outside function blocking entire function
which may have code that might be safely run concurrently

TEST/RETRY
--------------------------------
PROS
- in theory, code SHOULD be entirely contained within a
single stored procedure (less application coding needed)
- no locks needed (faster execution?)
CONS
- requires extra queries to determine empty queue
- may starve if a single process continues to grab the same
row as other processes
- need recursive calls in PL/PGSQL?

I am going to try to implement your suggestion and see
what I get in comparison to what I am seeing now with my
LOCK/WAIT code.

Dante

D. Dante Lorenso
dante(at)direct2prospect(dot)com
972-333-4139

> On Wed, 14 May 2003 01:10:39 -0500, "D. Dante Lorenso"
> <dante(at)lorenso(dot)com> wrote:
> >> How do you distinguish between (i) and (ii)? Just do
> >> SELECT job_id FROM job_table
> >> WHERE processor_id = 0 LIMIT 1;
> >> If this returns 0 rows, you have (i).
> >> If it returns one row, you have (ii).
> >
> >You can't do it this way
>
> Oh.
>
> > because if this select IS successful,
> >you'll still need to LOCK the row for the update.
>
> That's exactly the reason why I told you ...
>
> |(ii) The row has been reserved by another transaction running at the
> |same time. In this case, restart at SELECT FOR UPDATE.
> ^^
> not after!
>
> If there is a row satisfying the WHERE clause but SELECT FOR UPDATE
> does not return it, this is the effect of a race condition:
>
> SELECT xmax,* FROM job;
> xmax | id | pr
> ------+----+----
> 0 | 2 | 0
> 0 | 3 | 1
> 0 | 1 | 2
> 0 | 4 | 0
>
> Session 1 Session 2
>
> BEGIN;
> SELECT xmax,* FROM job
> WHERE pr = 0 FOR UPDATE LIMIT 1;
> xmax | id | pr
> ------+----+----
> 0 | 2 | 0
> (1 row)
>
> select xmax,* FROM job
> WHERE id = 2;
> xmax | id | pr
> -------+----+----
> 58634 | 2 | 0
> (1 row)
> BEGIN;
> select xmax,* FROM job
> WHERE pr = 0 LIMIT 1;
> xmax | id | pr
> -------+----+----
> 58634 | 2 | 0
> (1 row)
>
> SELECT xmax,* FROM job
> WHERE pr = 0 FOR UPDATE LIMIT 1;
> -- waits, because the row with id=2
> -- satisfies the WHERE clause but
> -- is locked by transaction 58634 ...
> UPDATE job SET pr = 1
> WHERE id = 2;
> -- The same would happen, if we did the
> -- SELECT FOR UPDATE here (after the
> -- UPDATE in the other session), because
> -- our *visible* version of the row
> -- still satisfies the WHERE clause.
>
> select xmax,* FROM job
> WHERE id = 2;
> xmax | id | pr
> ------+----+----
> 0 | 2 | 1
> (1 row)
> -- xmax = 0 because we see
> -- the newly inserted tuple
>
> COMMIT;
> -- continues ...
> xmax | id | pr
> ------+----+----
> (0 rows)
> -- because the row this SELECT was about
> -- to return does not satisfy pr = 0
> -- any more
>
> SELECT xmax,* FROM job
> WHERE pr = 0 LIMIT 1;
> xmax | id | pr
> ------+----+----
> 0 | 4 | 0
> (1 row)
> -- but there is another row, so ...
>
> SELECT xmax,* FROM job
> WHERE pr = 0 FOR UPDATE LIMIT 1;
> xmax | id | pr
> ------+----+----
> 0 | 4 | 0
> (1 row)
> -- does not necessarily return the same
> -- id as the previous SELECT
>
> UPDATE ...;
> COMMIT;
>
> HTH.
> Servus
> Manfred
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message James Gregory 2003-05-14 08:58:24 server process segfaulting
Previous Message Manfred Koizar 2003-05-14 07:41:26 Re: Serialization, Locking...implement processing Queue with a table