Re: Design of a reliable task processing queue

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Design of a reliable task processing queue
Date: 2025-01-19 11:37:01
Message-ID: 20250119113701.tnipxsn2gs4247zu@hjp.at
Lists: pgsql-general

On 2025-01-18 19:44:07 +0900, Alex Burkhart wrote:
> REQUIREMENTS
>
> 1. Pending actions are persisted to a database. There's a trace once they are
> done.
> 2. Application workers pick actions one by one. At any given time, each action
> can be assigned to at most one worker (transaction).
> 3. If multiple actions have the same "lock_id", only one of them is processed
> at a time. That has to be the action with the smallest id.
>
> MY ATTEMPT
>
> I got something approximate working with the following setup.
>
> === BEGIN DATABASE QUERY ===
>
> DROP TABLE IF EXISTS actions;
>
> CREATE TABLE actions (
>   id SERIAL PRIMARY KEY,
>   lock_id BIGINT,
>   action VARCHAR(255),
>   done BOOLEAN DEFAULT false
> );
>
[... removed Go code ...]
> This code generates a transaction like this.
>
> === BEGIN TRANSACTION ===
>
> BEGIN;
>
> -- Lock one row in "actions" table.
>
> SELECT id, lock_id, action
> FROM actions
> WHERE done = false
> ORDER BY id
> LIMIT 1
> FOR NO KEY UPDATE
> SKIP LOCKED;
>
> -- Block other transactions that process the same lock_id.
>
> SELECT pg_advisory_xact_lock(%lock_id);
>
> -- Work on the action... mark it done at the end.
>
> UPDATE actions SET done = true WHERE id = %d;
>
> COMMIT;
>
> === END TRANSACTION ===
>
> This almost does the job. When I run the Go client concurrently, it picks
> actions one by one and processes only one transaction with the same "lock_id"
> at a time.
>
> However, I'm worried about the gap between the row lock and the "lock_id"
> lock. This leaves room for the requirement "That has to be the action with
> the smallest id" to be violated.

I'd be worried about that, too.

Let's assume you have two entries in your queue:

123, 26, 'Create instance 26', false
124, 26, 'Update instance 26', false

And also two worker threads. Then this scenario is possible:

Worker A executes the SELECT ... SKIP LOCKED query and gets the
record with ID 123

Worker B executes the SELECT ... SKIP LOCKED query. Since record 123
is already locked it gets the record with ID 124

Worker B executes the SELECT pg_advisory_xact_lock(26) query
successfully and can continue

Worker A executes the SELECT pg_advisory_xact_lock(26) query, but
blocks and has to wait for worker B to commit (or rollback).

So in this case the update would happen before the create, which isn't
good.
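
Spelled out as a sketch (two sessions, using the two rows above), the
interleaving looks roughly like this:

-- Session A (worker A)                 -- Session B (worker B)
BEGIN;
SELECT ... FOR NO KEY UPDATE
    SKIP LOCKED;  -- returns id 123
                                        BEGIN;
                                        SELECT ... FOR NO KEY UPDATE
                                            SKIP LOCKED;  -- returns id 124
                                        SELECT pg_advisory_xact_lock(26);
                                        -- lock acquired, B continues
SELECT pg_advisory_xact_lock(26);
-- blocks, waiting for B ...
                                        UPDATE actions SET done = true
                                            WHERE id = 124;
                                        COMMIT;  -- "Update" finishes first
-- ... unblocked, now processes id 123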

> QUESTION
>
> Is there a way to improve this attempt and close the gap?

I'm not sure if I like this solution (it feels a bit hacky), but the
following should work, IMHO:

1) After the first select, do another select on all records with the same
   lock_id:
   SELECT id, lock_id, action
   FROM actions
   WHERE lock_id = %d AND done = false
   ORDER BY id
   FOR NO KEY UPDATE
2) Check whether the id you got first is the smallest of all.
3) If it isn't, rollback and start over.
4) If it is, you have now locked all the rows with the same lock_id and
can continue.

The advisory lock isn't needed then.
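
For illustration, the whole worker transaction would then look roughly like
this (just a sketch; the %lock_id/%d placeholders and the "start over" on a
mismatch are handled by the client code):

BEGIN;

-- Claim the oldest unprocessed action that isn't locked by another worker.
SELECT id, lock_id, action
FROM actions
WHERE done = false
ORDER BY id
LIMIT 1
FOR NO KEY UPDATE
SKIP LOCKED;

-- Lock all pending actions with the same lock_id, in id order.
SELECT id
FROM actions
WHERE lock_id = %lock_id AND done = false
ORDER BY id
FOR NO KEY UPDATE;

-- If the id from the first select is not the smallest id returned by the
-- second select, ROLLBACK and start over. Otherwise work on the action and
-- mark it done:

UPDATE actions SET done = true WHERE id = %d;

COMMIT;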

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp(at)hjp(dot)at  |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
