From: | jonathan vanasco <postgres(at)2xlp(dot)com> |
---|---|
To: | pgsql-general general <pgsql-general(at)postgresql(dot)org> |
Subject: | improvements/feedback sought for a working query that looks a bit ugly and might be inefficient |
Date: | 2017-05-17 01:42:34 |
Message-ID: | 72883A7C-E8CF-409E-BAB9-6F60F0B3BD5D@2xlp.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Everything here works fine - but after a handful of product iterations & production adjustments, a query that handles a "task queue" across a few tables looks a bit ugly.
I'm wondering if anyone can see obvious improvements.
There are 3 tables:
upstream_provider
task
task_queue
Originally we needed to select 50 items off the top of the queue at a time.
Then we needed to set a max of 5 tasks per upstream provider (It's a cheap way to handle throttling).
The table is quite large, so a limit of the last 1000 items drastically improved performance.
The query got ugly when we needed to add a "priority" toggle to the queue -- basically to mark things as "process ASAP".
The only way I could figure out how to do that, was to add a sort -- on "is_priority DESC NULLS LAST".
My concern is that the sort needs to happen 3x --
in the subselect for 1000 items
in the partition for row numbering
in the final sort
If anyone has a moment to look this over and suggest anything, I'd be very thankful. A working reproduction is below.
SQL---
--------------
CREATE TABLE upstream_provider (id SERIAL PRIMARY KEY,
name VARCHAR(32),
is_paused BOOLEAN DEFAULT NULL
);
CREATE TABLE task (id SERIAL PRIMARY KEY,
upstream_provider_id INT NOT NULL REFERENCES upstream_provider(id),
name VARCHAR(32)
);
CREATE TABLE task_queue (id SERIAL PRIMARY KEY,
task_id INT NOT NULL REFERENCES task(id),
upstream_provider_id INT NOT NULL REFERENCES upstream_provider(id), # only here because it eliminates expensive joins elsewhere
processing_status BOOLEAN DEFAULT NULL,
is_priority BOOLEAN DEFAULT NULL
);
SELECT partition1.*
, task.*
FROM (SELECT window1.*
, row_number() OVER (PARTITION BY window1.upstream_provider_id
ORDER BY window1.is_priority DESC NULLS LAST,
window1.task_queue_id
) AS rownum
FROM (SELECT qu.id AS task_queue_id
, qu.upstream_provider_id
, qu.task_id
, qu.is_priority
FROM
task_queue qu
JOIN
upstream_provider ON qu.upstream_provider_id = upstream_provider.id
WHERE (qu.processing_status IS NULL)
AND (upstream_provider.is_paused IS NOT TRUE)
ORDER BY is_priority DESC NULLS LAST,
qu.id DESC
LIMIT 1000
) window1
) partition1
JOIN task ON partition1.task_id = task.id
WHERE partition1.rownum < 5
ORDER BY is_priority DESC NULLS LAST,
task_queue_id DESC
LIMIT
50
;
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2017-05-17 02:20:01 | Re: improvements/feedback sought for a working query that looks a bit ugly and might be inefficient |
Previous Message | Tom Lane | 2017-05-16 22:53:29 | Re: sorry, too many clients already error |