PostgreSQL as advanced job queuing system

From: ushi <ushi(at)mailbox(dot)org>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: PostgreSQL as advanced job queuing system
Date: 2024-03-22 11:43:40
Message-ID: 90ee725f-d003-4838-97ec-e7e4149ac75b@mailbox.org
Lists: pgsql-general

Hello List,

I am playing with the idea of implementing a job queuing system using PostgreSQL. To meet the requirements, the system needs to offer some advanced features compared to "classic" queuing systems:

- users can create new queues at any time
- queues have priorities
- priorities of queues can change at any time
- jobs in queues with the highest priority should be processed first

A simple schema could look like this:

create table queues (
    id integer primary key,
    priority integer not null default 0
);

create table jobs (
    id serial primary key,
    queue_id integer not null references queues(id)
);

Right now I am stuck writing an efficient query for dequeuing. This is what I have so far:

insert into queues (id, priority) values (1, 0), (2, 1), (3, 1);
insert into jobs (queue_id) select 1 from generate_series(1, 1000000);
insert into jobs (queue_id) select 2 from generate_series(1, 1000000);
insert into jobs (queue_id) select 3 from generate_series(1, 1000000);

Here is a simple dequeuing query that is super efficient, but ignores priority:

select * from jobs limit 1 for update of jobs skip locked;
--  id | queue_id
-- ----+----------
--   1 |        1
-- (1 row)
--
-- Time: 2.641 ms
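
For context: in the real system each worker would of course wrap the dequeue in a transaction, do its work, and then delete the job. Roughly like this -- only a sketch, payload handling and error handling are left out, and the id in the delete stands for whatever the select returned:

begin;

select id, queue_id
from jobs
limit 1
for update of jobs skip locked;

-- ... worker processes the job ...

delete from jobs where id = 1;  -- the id returned by the select above

commit;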

Here is my naive query that obeys the priority. It is very slow, and I could not get it any faster:

select *
from jobs
join queues on queues.id = jobs.queue_id
order by priority desc
limit 1
for update of jobs skip locked;
--    id    | queue_id | id | priority
-- ---------+----------+----+----------
--  1000001 |        2 |  2 |        1
-- (1 row)
--
-- Time: 1116.631 ms (00:01.117)

Here is the query plan:

-- QUERY PLAN
-- -----------------------------------------------------------------------------------------------------------------------------------------
--  Limit  (cost=66225.61..66225.63 rows=1 width=28) (actual time=1333.139..1333.142 rows=1 loops=1)
--    ->  LockRows  (cost=66225.61..103725.61 rows=3000000 width=28) (actual time=1333.137..1333.139 rows=1 loops=1)
--          ->  Sort  (cost=66225.61..73725.61 rows=3000000 width=28) (actual time=1333.110..1333.112 rows=1 loops=1)
--                Sort Key: queues.priority DESC
--                Sort Method: external merge  Disk: 111568kB
--                ->  Hash Join  (cost=60.85..51225.61 rows=3000000 width=28) (actual time=0.064..660.317 rows=3000000 loops=1)
--                      Hash Cond: (jobs.queue_id = queues.id)
--                      ->  Seq Scan on jobs  (cost=0.00..43275.00 rows=3000000 width=14) (actual time=0.027..253.868 rows=3000000 loops=1)
--                      ->  Hash  (cost=32.60..32.60 rows=2260 width=14) (actual time=0.021..0.022 rows=3 loops=1)
--                            Buckets: 4096  Batches: 1  Memory Usage: 33kB
--                            ->  Seq Scan on queues  (cost=0.00..32.60 rows=2260 width=14) (actual time=0.011..0.013 rows=3 loops=1)
-- Planning Time: 0.430 ms
-- Execution Time: 1347.448 ms
-- (13 rows)

Any ideas for a more efficient implementation?
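
One direction I have been toying with, but have not benchmarked properly, is to probe each queue for a single available job with a LATERAL subquery and then pick the highest-priority hit, so the sort only has to deal with one row per queue instead of all jobs. This would need an extra index on jobs(queue_id) that is not in the schema above, and I am not sure it behaves well under concurrency, so treat it as a sketch:

create index on jobs (queue_id);

select j.id, j.queue_id
from queues q
cross join lateral (
    select id, queue_id
    from jobs
    where jobs.queue_id = q.id
    limit 1
    for update skip locked
) j
order by q.priority desc
limit 1;

As far as I understand, this can lock one job per queue until the transaction ends, which might be a problem in itself, so I would be happy about opinions on that as well.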

Thank you for your time,
ushi
