Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2

From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2
Date: 2014-01-04 18:40:31
Message-ID: 20140104184030.GA98212@crankycanuck.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, Jan 04, 2014 at 07:07:08PM +0100, Yngve N. Pettersen wrote:
> I tried that before, but ran into some issues, IIRC a similar looping
> problem as this where queries never ended. I split it up in an attempt to
> solve that problem.

Pulling the data out into the application and sending it back in won't
improve things. Exactly the same number of rows need to be visited,
but the way you have it now you have to marshall all the data and ship
it to the application too. So it's automatically slower. Indeed,
making it slower might have masked your problem.

> In the select/update case there is no sorting in the query; there is an
> offset/limit clause though, number of records retrieved are currently
> restricted to <10000 per query (out of 20 million in the active subset).
>
> SELECT id from queue where state = E'I' and job_id = <integer> offset
> <random 200..150000> limit <1-6000>

This could be part of problem. Are the different threads working on
different job_ids, or is that the same job_id? If you don't SORT that
query before the OFFSET, then the rows will come back in whatever
order the system likes. If your random function isn't very good, you
could well be selecting the same rows for updating. As you increase
the number of workers, the chances for collisions go up --
particularly if you're pulling 6000 things at a go. Remember that the
other updating workers are affecting the order in which rows are going
to come.

> However, in the UPDATE case, the looping processes are all UPDATE queries,
> no SELECTs involved.

But you said it's all in the same transaction scope. The lock is a
the transaction scope.

Anyway, what I'd do is try to cause the condition and post the
pg_locks information. When I've done this in the past, usually the
best thing to do is also to have query logs on for everything (don't
forget to log the pid!) so you can see what the other transaction
you're waiting on touched already. You can usually find the inversion
that way. Once you see it, it's always obvious what you've done, in
my experience (and completely mystifying before that, unfortunately).

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Yngve N. Pettersen 2014-01-04 19:06:19 Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2
Previous Message Moshe Jacobson 2014-01-04 18:15:15 Re: Suddenly all tables were gone