Re: Infinite waitOnLock

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Leonard Meyer <lmeyer(at)excilys(dot)com>
Cc: List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Infinite waitOnLock
Date: 2014-12-10 13:28:59
Message-ID: CADK3HH+v9i6-0BW0tj3AZafSqN8mpNA3BaiKL1SMVEpZBbnUOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

So I'm curious why doesn't your app send a SIGTERM which would interrupt
the threads.
Other than that I can think of two strategies

1) make isValid smarter
2) put a timeout in the wait, but of course we don't know how long to wait?
However I am thinking that very large copies can't be that common.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 10 December 2014 at 08:21, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:

> Same behavior with jar from master. All threads are logging
> "org.postgresql.util.PSQLException: Database connection failed when
> canceling copy operation" and go into waiting state.
>
> 2014-12-10 12:34 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:
>
>> So sure enough we do not timeout. The question is how long to wait. I
>> haven't looked at this in detail, but copy could take a very long time
>> depending on what people are copying in/out
>>
>> Before we go chasing this can you build a driver from master and try it?
>>
>> Dave Cramer
>>
>> dave.cramer(at)credativ(dot)ca
>> http://www.credativ.ca
>>
>> On 10 December 2014 at 03:47, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>>
>>> Hi,
>>>
>>> I was advised to come here for this instead of the pgsql-bugs ML. As I
>>> already explained it, I'll just copy paste the problem :
>>>
>>> We're doing an ETL Akka application using PostgreSQL 9.3.5 using
>>> driver 9.3-1102-jdbc41. We're currently testing our application resilience
>>> by shutting down the database in the middle of our processing. Basically we
>>> have 8 threads working and when shutting down, some threads just get stuck
>>> forever waiting on a lock.
>>> We think it's coming from a bug inside the driver because we would
>>> expect the lock acquisition to timeout at some point. Here's the stacktrace
>>> of a frozen thread :
>>>
>>> "xxxxxx-akka.actor.pinned-dispatcher-26" prio=10 tid=0x00007f3a1c007800
>>>> nid=0x3125 in Object.wait() [0x00007f3acaae8000]
>>>> java.lang.Thread.State: WAITING (on object monitor)
>>>> at java.lang.Object.wait(Native Method)
>>>> - waiting on <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>>> QueryExecutorImpl)
>>>> at java.lang.Object.wait(Object.java:503)
>>>> at org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(
>>>> QueryExecutorImpl.java:91)
>>>> at org.postgresql.core.v3.QueryExecutorImpl.execute(
>>>> QueryExecutorImpl.java:228)
>>>> - locked <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>>> QueryExecutorImpl)
>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(
>>>> AbstractJdbc2Statement.java:561)
>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(
>>>> AbstractJdbc2Statement.java:405)
>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(
>>>> AbstractJdbc2Statement.java:285)
>>>> at org.postgresql.jdbc4.AbstractJdbc4Connection.isValid(
>>>> AbstractJdbc4Connection.java:130)
>>>> at org.postgresql.jdbc4.Jdbc4Connection.isValid(
>>>> Jdbc4Connection.java:21)
>>>> at com.zaxxer.hikari.proxy.ConnectionProxy.isValid(
>>>> ConnectionProxy.java:357)
>>>> at xx.xxxx.actors.flux.mclu.McluProcessor$$anonfun$
>>>> receive$1.applyOrElse(McluProcessor.scala:65)
>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> at xx.xxxx.actors.flux.mclu.McluProcessor.aroundReceive(
>>>> McluProcessor.scala:30)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>> ThreadPoolExecutor.java:1145)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>> ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> Locked ownable synchronizers:
>>>> - <0x000000074482e6d0> (a java.util.concurrent.
>>>> ThreadPoolExecutor$Worker)
>>>
>>>
>>> We've first dropped by the Akka group, thinking it was a problem with
>>> our code, but nothing came up. HikariCP's main developper showed up with an
>>> explanation though :
>>>
>>> It appears that you are using the PostgreSQL CopyManager, correct?
>>>> Looking at QueryExecutorImpl it appears that rollback() is trying to
>>>> obtain a lock that was not released by the CopyManager. I recommend using
>>>> the CopyManager.copyIn() method that returns a CopyIn object, rather
>>>> than using the convenience method that takes a reader. Use the
>>>> writeToCopy() to pump the data in, and be sure to catch SQLException.
>>>> If you get an SQLException, call cancelCopy() and retry or whatever
>>>> your recovery scenario is, otherwise call endCopy(). I would have
>>>> expected PostgreSQL to handle the severing of a Connection in the middle of
>>>> a bulk copy better, but that is probably a question for the PostgreSQL
>>>> group.
>>>
>>>
>>> Tried it, no luck. Here the link for reference :
>>> https://groups.google.com/forum/#!topic/akka-user/Ehzioy3jVoU
>>>
>>> Here's what we can show you of our (scala) code :
>>>
>>>
>>>> val cpManager = connection.unwrap(classOf[PGConnection]).getCopyAPI
>>>> val stringBytes: Array[Byte] = batchStrings.map(_.toByte).toArray
>>>> val copy = cpManager.copyIn(s"COPY temp FROM STDIN WITH CSV")
>>>> try {
>>>> copy.writeToCopy(stringBytes, 0, stringBytes.size)
>>>> copy.endCopy()
>>>> } finally {
>>>> if (copy.isActive) {
>>>> copy.cancelCopy()
>>>> }
>>>> }
>>>
>>>
>>> Connection used is closed further down in a finally block.
>>>
>>> We're rather sure the lock is coming from here since we have another
>>> exactly similar processing but with regular inserts instead of this, and no
>>> issues so far. Thanks for any help.
>>>
>>
>>
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Leonard Meyer 2014-12-10 14:09:46 Re: Infinite waitOnLock
Previous Message Leonard Meyer 2014-12-10 13:21:57 Re: Infinite waitOnLock