Re: maintaining a reference to a fetched row

From: Brian Karlak <zenkat(at)metaweb(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: maintaining a reference to a fetched row
Date: 2009-11-04 17:41:54
Message-ID: 82545852-CE9E-4B0D-ABAB-B628C888D086@metaweb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Nov 4, 2009, at 8:47 AM, Jeff Janes wrote:

>> Worker daemons do a bounded, ordered, limited SELECT to grab a row,
>> which
>> they lock by setting a value in the queue.status column.
>
> So you do a select, and then an update?

I do a select for update in a stored proc:

FOR queue_item IN
SELECT * FROM queue
WHERE status IS NULL AND id >= low_bound_id
ORDER BY id LIMIT batch_size
FOR UPDATE
LOOP
UPDATE queue_proc set status = 'proc' where id = queue_item.id ;

The daemons keep track of their last position in the queue with
low_bound_id. Also, as you probably notice, I also fetch a batch of
(100) items at a time. In practice, it's pretty fast. The job I'm
running now is showing an average fetch time of 30ms per 100 actions,
which ain't bad.

>> However, the writing of results back to the row takes ~5ms, which
>> is slower
>> than I'd like.
>
> It seems you have an select, and update, and another update. Where in
> this process do you commit? Are you using fsync=off or
> synchronous_commit=off?

First commit occurs after the stored proc to select/update a batch of
items is complete. Second commit occurs on the writing of results
back for each particular action. Two commits are required because the
time it takes to complete the intervening action can vary wildly:
anywhere between 20ms and 45min.

>> It seems that this is because I need to to do an index scan
>> on the queue table to find the row I just fetched.
>
> Why would the index scan take 1 ms two of the times it is done but 5ms
> the third time? Isn't it the same index scan each time? Or does the
> change in queue.status change the plan?

The final update is a different query -- just a plain old update by ID:

UPDATE queue_proc set status = 'proc' where id = %s ;

This update by ID takes ~2.5ms, which means it's where the framework
is spending most of its overhead.

Brian

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-11-04 20:43:47 Re: maintaining a reference to a fetched row
Previous Message Brian Karlak 2009-11-04 17:25:27 Re: maintaining a reference to a fetched row