Re: How do I track down a possible locking problem?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do I track down a possible locking problem?
Date: 2014-02-19 17:59:20
Message-ID: CAMkU=1z8CDuSt+fo0pH1S2Txuvn81Ps0km5kM4b7AAL8k15NUQ@mail.gmail.com
Lists: pgsql-general

On Wed, Feb 19, 2014 at 2:40 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:

>
> On 18/02/2014, at 19:02, Jeff Janes wrote:
>
> On Mon, Feb 17, 2014 at 8:45 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il>wrote:
>
>> I have a production system using Postgresql 9.1.2.
>>
>> The system basically receives messages, puts them in a queue, and then
>> several parallel modules, each in its own thread, read from that queue, and
>> perform two inserts, then release the message to the next queue for
>> non-database-related processing.
>>
>> Today, after we received complaints from a customer about delays, I
>> noticed odd behavior in the system. About 2 percent of the messages were
>> inserted into the tables more than an hour after they got into the system.
>>
>
> How do you know that?
>
>
> The message gets a time stamp (from Java) as it goes into the system. This
> time stamp is written into the first table, in a field named
> "time_arrived". The second table's insert (that follows immediately after
> the first) has a date_inserted field, which also gets a Java time stamp. So
> basically date_inserted - time_arrived is the interval that passed between
> the time the message came into the system and the time of the second
> insertion. That is - the time that it spent in the queue, plus the time
> that it spent in the first insert.
>

So the steps are these?:

1. A message is received by your Java system.
2. Java calls gettimeofday, and that time is saved locally and eventually put
   into the first inserted tuple's time_arrived.
3. Java does some substantial, undescribed work.
4. Java inserts the first tuple and commits.
5. Java calls gettimeofday and uses it for the second tuple's date_inserted,
   doing no other meaningful work in between.
6. Java inserts the second tuple and commits.
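
If so, the delay can be measured straight from the two tables — a hedged sketch, where the table names (first_table, second_table) and the join key (message_id) are assumptions, since the thread doesn't name them:

```sql
-- Hypothetical names: first_table(time_arrived), second_table(date_inserted),
-- joined on an assumed message_id column. Finds the ~2% of messages that
-- took more than an hour between arrival and the second insert.
SELECT s.message_id,
       s.date_inserted - f.time_arrived AS queue_delay
FROM   second_table s
JOIN   first_table  f USING (message_id)
WHERE  s.date_inserted - f.time_arrived > interval '1 hour'
ORDER  BY queue_delay DESC;
```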

>> How can I track such locks down? Does anybody have any ideas other than
>> starvation? The system lets me view statistics of how many messages were
>> processed in each module and the average latency. None of the four modules
>> running has a long average latency or a low number of messages processed, so I
>> don't think the issue is related to any particular thread in my (Java)
>> system being slow or locked away by the others.
>>
>
> If the insert into PostgreSQL was freezing, wouldn't that time get
> reflected in your latency monitoring?
>
>
> Not sure what you mean. I think the insert is freezing, and indeed it gets
> reflected in the time monitored.
>

Wouldn't the average latency then be high, if the average was incorporating
even just a few latencies of 3600 seconds?

>
> It sounds to me like your application has a bug in its queue, where it
> forgets about items on the queue for a while.
>
>
> Not likely. This application has been running with the same queue
> implementation since 2001, even before we started using PostgreSQL... All
> bugs in that particular code would have been eliminated long ago. The
> system consists of dozens of queues, and we have monitoring that alerts us
> to any queue where messages are left unattended. If there was such a bug,
> we'd see queues that never become empty.
>
> But anyway, I'll put up some cron job that monitors the locks or the query
> lengths, and see if I can come up with anything. Theoretically, I have a
> way of logging when each message is enqueued or dequeued from each queue in
> the system, but this would slow down the production system, so I'll only do
> that as a last resort.
>
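
For the lock-monitoring cron job mentioned above, polling pg_stat_activity is usually enough — a minimal sketch for 9.1, which still uses the procpid and current_query column names (they became pid and query in 9.2):

```sql
-- PostgreSQL 9.1: "waiting" is true while a backend is blocked on a lock.
SELECT procpid,
       now() - query_start AS waited,
       current_query
FROM   pg_stat_activity
WHERE  waiting
ORDER  BY waited DESC;
```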

You can also turn on "log_lock_waits" and set something like
"log_min_duration_statement = 10000". If there is a PostgreSQL problem,
these settings should capture it in the PostgreSQL log file. If there is
no problem, they should not generate much log traffic on most systems.
(If you have some reporting processes that knowingly run slow queries, you
can prevent them from polluting the log by using a ROLE for which
log_min_duration_statement is disabled--it is a little convoluted because
only a superuser can change that setting.)
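
Concretely, that amounts to something like the following; the "reporting" role name is hypothetical:

```sql
-- In postgresql.conf (then reload):
--   log_lock_waits = on                  -- log waits longer than deadlock_timeout (1s by default)
--   log_min_duration_statement = 10000   -- log statements running 10 seconds or more

-- As superuser, exempt a known-slow reporting role from duration logging:
ALTER ROLE reporting SET log_min_duration_statement = -1;
```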

Cheers,

Jeff
