Re: Infinite waitOnLock

From: Leonard Meyer <lmeyer(at)excilys(dot)com>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Infinite waitOnLock
Date: 2014-12-10 13:21:57
Message-ID: CADGbXSSo2A0fc35pyOHj8EWXX-UaH=WcEKHb4dhOek=cU6e6aA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Same behavior with jar from master. All threads are logging
"org.postgresql.util.PSQLException: Database connection failed when
canceling copy operation" and go into waiting state.

2014-12-10 12:34 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:

> So sure enough we do not timeout. The question is how long to wait. I
> haven't looked at this in detail, but copy could take a very long time
> depending on what people are copying in/out
>
> Before we go chasing this can you build a driver from master and try it?
>
> Dave Cramer
>
> dave.cramer(at)credativ(dot)ca
> http://www.credativ.ca
>
> On 10 December 2014 at 03:47, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>
>> Hi,
>>
>> I was advised to come here for this instead of the pgsql-bugs ML. As I
>> already explained it, I'll just copy paste the problem :
>>
>> We're doing an ETL Akka application using PostgreSQL 9.3.5 using
>> driver 9.3-1102-jdbc41. We're currently testing our application resilience
>> by shutting down the database in the middle of our processing. Basically we
>> have 8 threads working and when shutting down, some threads just get stuck
>> forever waiting on a lock.
>> We think it's coming from a bug inside the driver because we would expect
>> the lock acquisition to timeout at some point. Here's the stacktrace of a
>> frozen thread :
>>
>> "xxxxxx-akka.actor.pinned-dispatcher-26" prio=10 tid=0x00007f3a1c007800
>>> nid=0x3125 in Object.wait() [0x00007f3acaae8000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> - waiting on <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>> QueryExecutorImpl)
>>> at java.lang.Object.wait(Object.java:503)
>>> at org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(
>>> QueryExecutorImpl.java:91)
>>> at org.postgresql.core.v3.QueryExecutorImpl.execute(
>>> QueryExecutorImpl.java:228)
>>> - locked <0x0000000744a14aa0> (a org.postgresql.core.v3.
>>> QueryExecutorImpl)
>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(
>>> AbstractJdbc2Statement.java:561)
>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(
>>> AbstractJdbc2Statement.java:405)
>>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(
>>> AbstractJdbc2Statement.java:285)
>>> at org.postgresql.jdbc4.AbstractJdbc4Connection.isValid(
>>> AbstractJdbc4Connection.java:130)
>>> at org.postgresql.jdbc4.Jdbc4Connection.isValid(Jdbc4Connection.java:21)
>>> at com.zaxxer.hikari.proxy.ConnectionProxy.isValid(
>>> ConnectionProxy.java:357)
>>> at xx.xxxx.actors.flux.mclu.McluProcessor$$anonfun$
>>> receive$1.applyOrElse(McluProcessor.scala:65)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> at xx.xxxx.actors.flux.mclu.McluProcessor.aroundReceive(
>>> McluProcessor.scala:30)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>> ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> Locked ownable synchronizers:
>>> - <0x000000074482e6d0> (a java.util.concurrent.
>>> ThreadPoolExecutor$Worker)
>>
>>
>> We've first dropped by the Akka group, thinking it was a problem with
>> our code, but nothing came up. HikariCP's main developper showed up with an
>> explanation though :
>>
>> It appears that you are using the PostgreSQL CopyManager, correct?
>>> Looking at QueryExecutorImpl it appears that rollback() is trying to
>>> obtain a lock that was not released by the CopyManager. I recommend using
>>> the CopyManager.copyIn() method that returns a CopyIn object, rather
>>> than using the convenience method that takes a reader. Use the
>>> writeToCopy() to pump the data in, and be sure to catch SQLException.
>>> If you get an SQLException, call cancelCopy() and retry or whatever
>>> your recovery scenario is, otherwise call endCopy(). I would have
>>> expected PostgreSQL to handle the severing of a Connection in the middle of
>>> a bulk copy better, but that is probably a question for the PostgreSQL
>>> group.
>>
>>
>> Tried it, no luck. Here the link for reference :
>> https://groups.google.com/forum/#!topic/akka-user/Ehzioy3jVoU
>>
>> Here's what we can show you of our (scala) code :
>>
>>
>>> val cpManager = connection.unwrap(classOf[PGConnection]).getCopyAPI
>>> val stringBytes: Array[Byte] = batchStrings.map(_.toByte).toArray
>>> val copy = cpManager.copyIn(s"COPY temp FROM STDIN WITH CSV")
>>> try {
>>> copy.writeToCopy(stringBytes, 0, stringBytes.size)
>>> copy.endCopy()
>>> } finally {
>>> if (copy.isActive) {
>>> copy.cancelCopy()
>>> }
>>> }
>>
>>
>> Connection used is closed further down in a finally block.
>>
>> We're rather sure the lock is coming from here since we have another
>> exactly similar processing but with regular inserts instead of this, and no
>> issues so far. Thanks for any help.
>>
>
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Dave Cramer 2014-12-10 13:28:59 Re: Infinite waitOnLock
Previous Message Dave Cramer 2014-12-10 11:44:03 Re: Incorrect return code for error on call of commit during 2PC