Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2

From: "Yngve N(dot) Pettersen" <yngve(at)spec-work(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2
Date: 2014-01-04 19:06:19
Message-ID: op.w86fctav3dfyax@damia
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, 04 Jan 2014 19:40:31 +0100, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
wrote:

> On Sat, Jan 04, 2014 at 07:07:08PM +0100, Yngve N. Pettersen wrote:
>> I tried that before, but ran into some issues, IIRC a similar looping
>> problem as this where queries never ended. I split it up in an attempt
>> to
>> solve that problem.
>
> Pulling the data out into the application and sending it back in won't
> improve things. Exactly the same number of rows need to be visited,
> but the way you have it now you have to marshall all the data and ship
> it to the application too. So it's automatically slower. Indeed,
> making it slower might have masked your problem.

Could be

>
>> In the select/update case there is no sorting in the query; there is an
>> offset/limit clause though, number of records retrieved are currently
>> restricted to <10000 per query (out of 20 million in the active subset).
>>
>> SELECT id from queue where state = E'I' and job_id = <integer>
>> offset
>> <random 200..150000> limit <1-6000>
>
> This could be part of problem. Are the different threads working on
> different job_ids, or is that the same job_id? If you don't SORT that

Same job_id, at least in the current system.

> query before the OFFSET, then the rows will come back in whatever
> order the system likes.

I suspect that a sort operation on (currently) 20+ million rows for every
query for just 6000 (previous version was 1500 entries) would cause quite
a bit more slowness than breaking up the query in two operations, or the
risk of collisions would, because each process would have to load all that
information (even if it is cached).

>> However, in the UPDATE case, the looping processes are all UPDATE
>> queries,
>> no SELECTs involved.
>
> But you said it's all in the same transaction scope. The lock is a
> the transaction scope.

But the statement it locks/loops on are only UPDATE statements, also in
the processes that are waiting.

> Anyway, what I'd do is try to cause the condition and post the
> pg_locks information. When I've done this in the past, usually the
> best thing to do is also to have query logs on for everything (don't
> forget to log the pid!) so you can see what the other transaction
> you're waiting on touched already. You can usually find the inversion
> that way. Once you see it, it's always obvious what you've done, in
> my experience (and completely mystifying before that, unfortunately).

Will take a look in a few days, probably midweek.

--
Using Opera's mail client: http://www.opera.com/mail/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-01-04 19:08:22 Re: Suddenly all tables were gone
Previous Message Andrew Sullivan 2014-01-04 18:40:31 Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2