Re: Queuing query

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Queuing query
Date: 2015-09-22 17:45:57
Message-ID: CAEfWYywnAUe85JrVfDujd9_v7Q5XNavfmWyYGGyvZeS4SoR=Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks, Jeff. Like I said, this is the way-stripped-down version of the
core query with things like "the_priority" and "the_work" standing for more
complicated expressions. Lots of other stuff is going on to make sure we
get a response, clear the queue, etc. and we will index appropriately.

I'm really looking for any things like planner ordering nuances that would
make the query operate in unexpected ways.

Cheers,
Steve

On Tue, Sep 22, 2015 at 9:26 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Sep 21, 2015 at 3:51 PM, Steve Crawford <
> scrawford(at)pinpointresearch(dot)com> wrote:
>
>> While awaiting the awesomeness of the upcoming "skip locked" feature in
>> 9.5 I need to handle a work queue.
>>
>> Does anyone see any glaring issues or subtle nuances with the basic
>> method below which combines CTEs with queue-handling methods posted by
>> depesz, on the PG wiki and elsewhere.
>>
>> Note that it appears that there is the slight potential for a
>> race-condition which would cause one worker to occasionally fail to get a
>> record but the application code handles that issue fine.
>>
>> The work is sent to an externally hosted API which will ultimately reply
>> to a callback API at our end so obviously there's a lot of other stuff in
>> the system to update final results, recover from lost work, add to the
>> queue, etc. I'm just asking about the sanity of the queue processing query
>> itself:
>>
>> with next_up as (
>> select
>> the_id
>> from
>> queuetest
>> where
>> not sent_for_processing
>> and pg_try_advisory_xact_lock(12345, the_id)
>> order by
>> the_priority
>> limit 1 for update)
>> update
>> queuetest
>> set
>> sent_for_processing = true
>> where
>> the_id = (select the_id from next_up)
>> returning
>> the_work_to_do;
>>
>
> This will only be sane if the inner query can use an index to do the
> "order by". Otherwise it is going to read every row in order to sort them,
> and get the advisory lock on every row, and you will run out of shared
> memory. Of course, if it were sorting each time it would probably be too
> slow anyway.
>
> And it has to be a partial index:
>
> (the_priority) where not sent_for_processing
>
> Because if you just have an index on the_priority, the sub select will
> start getting inefficient once all the lowest numbered priority items are
> marked as sent.
>
> Also, you probably want to make sent_for_processing be some kind of token
> or time stamp, to make it easier to detect lost work. In which case NULL
> would mean not yet sent, so the partial index would be "where
> sent_for_processing is null".
>
> Cheers,
>
> Jeff
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Juan Pablo L. 2015-09-22 19:48:55 numeric data type
Previous Message Adrian Klaver 2015-09-22 16:51:16 Re: how to show time zone with numerical offset in CSV log?