From: Andy Colson <andy(at)squeakycode(dot)net>
To: Yngve Nysaeter Pettersen <yngve(at)opera(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Select for update with offset interferes with concurrent transactions
Date: 2011-02-01 19:04:31
Message-ID: 4D48593F.6050107@squeakycode.net
Lists: pgsql-general
On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:
>
> Thanks Andy,
>
> On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy(at)squeakycode(dot)net>
> wrote:
>
>> On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
>>> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy(at)squeakycode(dot)net>
>>>> I think the random limit thing is a race condition itself. Whenever
>>>> you have multiple processes hitting the same rows you're going to run
>>>> into problems. Have you thought of using a sequence instead of a
>>>> random limit? Each process could get the next 100 record_ids via a
>>>> sequence, then there would be much less chance of deadlock.
>>>
>>> How would that work, in case you would like to provide an example?
>>>
>>> I am not really familiar with sequences, as I have only seen them used
>>> for the "id" field in Django generated tables.
>>>
>>> In case it is relevant, the processes do not (currently, at least)
>>> have a unique ID; though they have a local sequence number for the
>>> machine they are running on.
>>>
>>>
>>
>> I have a really simple q table I use.
>>
>> create table q (id integer not null, msg integer, primary key(id));
>> create sequence q_add;
>> create sequence q_read;
>>
>> I insert via q_add:
>>
>> andy=# insert into q(id, msg) values(nextval('q_add'), 20);
>> INSERT 0 1
>> andy=# insert into q(id, msg) values(nextval('q_add'), 4);
>> INSERT 0 1
>> andy=# select * from q;
>> id | msg
>> ----+-----
>> 1 | 20
>> 2 | 4
>> (2 rows)
>>
>>
>> Then I run multiple batch procs, which get their next job like:
>>
>> andy=# select msg from q where id = (select nextval('q_read'));
>> msg
>> -----
>> 20
>> (1 row)
>>
>> andy=# select msg from q where id = (select nextval('q_read'));
>> msg
>> -----
>> 4
>> (1 row)
>>
>>
>> It works for me because I can empty the q table, reset the q_add and
>> q_read sequences and start over clean. Not sure if it would work for
>> your setup.
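>>
>> For what it's worth, if you wanted each process to claim a block of 100
>> ids at a time (the "next 100 record_ids via a sequence" idea above), a
>> sequence that steps by 100 should do it. Untested sketch, and it assumes
>> the ids handed out by q_add are gapless:
>>
>> create sequence q_read_block increment by 100;
>>
>> -- each nextval() reserves ids [start_id .. start_id + 99] for one worker
>> with block as (select nextval('q_read_block') as start_id)
>> select q.id, q.msg
>>   from q, block
>>  where q.id >= block.start_id
>>    and q.id <  block.start_id + 100;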
>
> I see how that would work (it is essentially how Django assigns row ids).
>
> My current setup can have multiple runs configured at a time (I have had
> several dozen queued in one case), each with its own priority, and they
> might, at least theoretically, be configured in parallel (even the
> individual runs are set up in parallel). That means the ids would not be
> sequential (a sequence is used for the id field of each row in the table),
> unless they could somehow be allocated per individual run/project
> (multiple sequence objects, one for each run, might be an option, but I
> don't like that possibility). And as I mentioned elsewhere in the thread,
> I might make the queuing a bit more complex, which would make this system
> even more complicated.
>
> So, AFAICT I am afraid it would not work in the general case for my
> project :( .
> However, it might be useful in somebody else's project :) .
>
No, I didn't think it would work for you; yours looks much more
complicated than mine. Just out of curiosity, have you looked at PgQ?
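
Roughly, the flow there is (untested, going from memory of the PgQ SQL
functions, so treat the exact calls as approximate, and 'job_queue' /
'worker_1' are just example names; PgQ also needs its ticker daemon
running before batches show up):

-- one-time setup
select pgq.create_queue('job_queue');
select pgq.register_consumer('job_queue', 'worker_1');

-- producer side: push a job onto the queue
select pgq.insert_event('job_queue', 'job', '42');

-- consumer loop: claim a batch, process its events, then mark it done
select pgq.next_batch('job_queue', 'worker_1');  -- batch_id, or NULL if nothing is ready
select ev_id, ev_data from pgq.get_batch_events(<batch_id>);
select pgq.finish_batch(<batch_id>);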
-Andy