Re: Infinite waitOnLock

From: Leonard Meyer <lmeyer(at)excilys(dot)com>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Infinite waitOnLock
Date: 2014-12-10 14:09:46
Message-ID: CADGbXSQsmONuFjDjLKSJ1ySkkMXp1gZHQiJYrRT70yc=LKPZKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Actually I already tried forking and putting a one minute timeout in the
wait. But absolutely nothing changed. But maybe I forgot something I don't
know how all this works.

2014-12-10 14:28 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:

> So I'm curious why doesn't your app send a SIGTERM which would interrupt
> the threads.
> Other than that I can think of two strategies
>
> 1) make isValid smarter
> 2) put a timeout in the wait, but of course we don't know how long to
> wait? However I am thinking that very large copies can't be that common.
>
> Dave Cramer
>
> dave.cramer(at)credativ(dot)ca
> http://www.credativ.ca
>
> On 10 December 2014 at 08:21, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>
>> Same behavior with jar from master. All threads are logging
>> "org.postgresql.util.PSQLException: Database connection failed when
>> canceling copy operation" and go into waiting state.
>>
>> 2014-12-10 12:34 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:
>>
>>> So sure enough we do not timeout. The question is how long to wait. I
>>> haven't looked at this in detail, but copy could take a very long time
>>> depending on what people are copying in/out
>>>
>>> Before we go chasing this can you build a driver from master and try it?
>>>
>>> Dave Cramer
>>>
>>> dave.cramer(at)credativ(dot)ca
>>> http://www.credativ.ca
>>>
>>> On 10 December 2014 at 03:47, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was advised to come here for this instead of the pgsql-bugs ML. As I
>>>> already explained it, I'll just copy paste the problem :
>>>>
>>>> We're doing an ETL Akka application using PostgreSQL 9.3.5 using
>>>> driver 9.3-1102-jdbc41. We're currently testing our application resilience
>>>> by shutting down the database in the middle of our processing. Basically we
>>>> have 8 threads working and when shutting down, some threads just get stuck
>>>> forever waiting on a lock.
>>>> We think it's coming from a bug inside the driver because we would
>>>> expect the lock acquisition to timeout at some point. Here's the stacktrace
>>>> of a frozen thread :
>>>>
>>>> "xxxxxx-akka.actor.pinned-dispatcher-26" prio=10
>>>>> tid=0x00007f3a1c007800 nid=0x3125 in Object.wait() [0x00007f3acaae8000]
>>>>> java.lang.Thread.State: WAITING (on object monitor)
>>>>> at java.lang.Object.wait(Native Method)
>>>>> - waiting on <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>>>> QueryExecutorImpl)
>>>>> at java.lang.Object.wait(Object.java:503)
>>>>> at org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(
>>>>> QueryExecutorImpl.java:91)
>>>>> at org.postgresql.core.v3.QueryExecutorImpl.execute(
>>>>> QueryExecutorImpl.java:228)
>>>>> - locked <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>>>> QueryExecutorImpl)
>>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(
>>>>> AbstractJdbc2Statement.java:561)
>>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(
>>>>> AbstractJdbc2Statement.java:405)
>>>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(
>>>>> AbstractJdbc2Statement.java:285)
>>>>> at org.postgresql.jdbc4.AbstractJdbc4Connection.isValid(
>>>>> AbstractJdbc4Connection.java:130)
>>>>> at org.postgresql.jdbc4.Jdbc4Connection.isValid(
>>>>> Jdbc4Connection.java:21)
>>>>> at com.zaxxer.hikari.proxy.ConnectionProxy.isValid(
>>>>> ConnectionProxy.java:357)
>>>>> at xx.xxxx.actors.flux.mclu.McluProcessor$$anonfun$
>>>>> receive$1.applyOrElse(McluProcessor.scala:65)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> at xx.xxxx.actors.flux.mclu.McluProcessor.aroundReceive(
>>>>> McluProcessor.scala:30)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>>> ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>>> ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> Locked ownable synchronizers:
>>>>> - <0x000000074482e6d0> (a java.util.concurrent.
>>>>> ThreadPoolExecutor$Worker)
>>>>
>>>>
>>>> We've first dropped by the Akka group, thinking it was a problem with
>>>> our code, but nothing came up. HikariCP's main developper showed up with an
>>>> explanation though :
>>>>
>>>> It appears that you are using the PostgreSQL CopyManager, correct?
>>>>> Looking at QueryExecutorImpl it appears that rollback() is trying to
>>>>> obtain a lock that was not released by the CopyManager. I recommend using
>>>>> the CopyManager.copyIn() method that returns a CopyIn object, rather
>>>>> than using the convenience method that takes a reader. Use the
>>>>> writeToCopy() to pump the data in, and be sure to catch
>>>>> SQLException. If you get an SQLException, call cancelCopy() and
>>>>> retry or whatever your recovery scenario is, otherwise call endCopy().
>>>>> I would have expected PostgreSQL to handle the severing of a Connection in
>>>>> the middle of a bulk copy better, but that is probably a question for the
>>>>> PostgreSQL group.
>>>>
>>>>
>>>> Tried it, no luck. Here the link for reference :
>>>> https://groups.google.com/forum/#!topic/akka-user/Ehzioy3jVoU
>>>>
>>>> Here's what we can show you of our (scala) code :
>>>>
>>>>
>>>>> val cpManager = connection.unwrap(classOf[PGConnection]).getCopyAPI
>>>>> val stringBytes: Array[Byte] = batchStrings.map(_.toByte).toArray
>>>>> val copy = cpManager.copyIn(s"COPY temp FROM STDIN WITH CSV")
>>>>> try {
>>>>> copy.writeToCopy(stringBytes, 0, stringBytes.size)
>>>>> copy.endCopy()
>>>>> } finally {
>>>>> if (copy.isActive) {
>>>>> copy.cancelCopy()
>>>>> }
>>>>> }
>>>>
>>>>
>>>> Connection used is closed further down in a finally block.
>>>>
>>>> We're rather sure the lock is coming from here since we have another
>>>> exactly similar processing but with regular inserts instead of this, and no
>>>> issues so far. Thanks for any help.
>>>>
>>>
>>>
>>
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Dave Cramer 2014-12-10 14:21:26 Re: Infinite waitOnLock
Previous Message Dave Cramer 2014-12-10 13:28:59 Re: Infinite waitOnLock