Re: Infinite waitOnLock

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Leonard Meyer <lmeyer(at)excilys(dot)com>
Cc: List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Infinite waitOnLock
Date: 2014-12-10 14:21:26
Message-ID: CADK3HH+c2f1KLKX7ZhYnwUOS0z9dFVnT8ojezuObrGRw_+d6Bg@mail.gmail.com
Lists: pgsql-jdbc

So looking at the code, it is in an endless loop; the only way out is
to either get the lock or get interrupted. It does not look like the latter
occurs, which is a problem with Akka.
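
To illustrate the shape of the problem, here is a minimal sketch of a
monitor-based lock wait. This is not the driver's actual source: the class
LockWaitSketch, the field lockedFor and the timeout parameter are all made
up for illustration. Without the deadline check, the loop could only end by
acquiring the lock or by being interrupted; with it, the caller gets a third
way out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an executor that hands out one lock at a time.
class LockWaitSketch {
    private Object lockedFor; // current holder, or null when the lock is free

    synchronized void lock(Object holder) { lockedFor = holder; }

    synchronized void unlock() {
        lockedFor = null;
        notifyAll(); // wake every thread parked in waitOnLock
    }

    // Returns true once the lock is free, false on timeout or interrupt.
    synchronized boolean waitOnLock(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (lockedFor != null) {
                long remainingMillis =
                        TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // bounded wait: give up instead of looping forever
                }
                wait(remainingMillis); // never called with 0, which would mean "forever"
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

How long the timeout should be is still the open question, since a
legitimate copy can hold the lock for a long time.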

When it shuts down, it should interrupt all threads.
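
As a rough sketch of what interrupting on shutdown looks like, the example
below uses a plain java.util.concurrent executor, not Akka's dispatcher API,
and the class name InterruptOnShutdownSketch is invented for illustration.
A worker blocked in Object.wait() is released when shutdownNow() interrupts
it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class InterruptOnShutdownSketch {
    // Returns true if the blocked worker was woken up by shutdownNow().
    static boolean run() {
        final Object monitor = new Object();
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch interrupted = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(1);

        pool.execute(() -> {
            synchronized (monitor) {
                started.countDown();
                try {
                    while (true) {
                        monitor.wait(); // nobody ever notifies: only an interrupt ends this
                    }
                } catch (InterruptedException e) {
                    interrupted.countDown(); // the worker escaped the wait
                }
            }
        });

        try {
            started.await(5, TimeUnit.SECONDS);
            pool.shutdownNow(); // interrupts the running worker thread
            return interrupted.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Calling shutdown() instead would leave the worker parked, because it only
stops new tasks from being accepted.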

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 10 December 2014 at 09:09, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:

> Actually, I already tried forking and putting a one-minute timeout in the
> wait, but absolutely nothing changed. Maybe I missed something; I don't
> know how all this works.
>
> 2014-12-10 14:28 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:
>
>> So I'm curious: why doesn't your app send a SIGTERM, which would interrupt
>> the threads?
>> Other than that, I can think of two strategies:
>>
>> 1) make isValid smarter
>> 2) put a timeout in the wait, though of course we don't know how long to
>> wait. However, I am thinking that very large copies can't be that common.
>>
>> Dave Cramer
>>
>> dave.cramer(at)credativ(dot)ca
>> http://www.credativ.ca
>>
>> On 10 December 2014 at 08:21, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>>
>>> Same behavior with a jar built from master. All threads log
>>> "org.postgresql.util.PSQLException: Database connection failed when
>>> canceling copy operation" and go into a waiting state.
>>>
>>> 2014-12-10 12:34 GMT+01:00 Dave Cramer <pg(at)fastcrypt(dot)com>:
>>>
>>>> So sure enough, we do not time out. The question is how long to wait. I
>>>> haven't looked at this in detail, but a copy could take a very long time
>>>> depending on what people are copying in/out.
>>>>
>>>> Before we go chasing this, can you build a driver from master and try
>>>> it?
>>>>
>>>> Dave Cramer
>>>>
>>>> dave.cramer(at)credativ(dot)ca
>>>> http://www.credativ.ca
>>>>
>>>> On 10 December 2014 at 03:47, Leonard Meyer <lmeyer(at)excilys(dot)com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was advised to come here for this instead of the pgsql-bugs ML. As I
>>>>> already explained it, I'll just copy and paste the problem:
>>>>>
>>>>> We're building an Akka ETL application on PostgreSQL 9.3.5 with
>>>>> driver 9.3-1102-jdbc41. We're currently testing our application's
>>>>> resilience by shutting down the database in the middle of our processing.
>>>>> Basically we have 8 threads working, and when the database shuts down,
>>>>> some threads just get stuck forever waiting on a lock.
>>>>> We think it's coming from a bug inside the driver, because we would
>>>>> expect the lock acquisition to time out at some point. Here's the stack
>>>>> trace of a frozen thread:
>>>>>
>>>>>> "xxxxxx-akka.actor.pinned-dispatcher-26" prio=10 tid=0x00007f3a1c007800 nid=0x3125 in Object.wait() [0x00007f3acaae8000]
>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>     at java.lang.Object.wait(Native Method)
>>>>>>     - waiting on <0x0000000744a14aa0> (a org.postgresql.core.v3.QueryExecutorImpl)
>>>>>>     at java.lang.Object.wait(Object.java:503)
>>>>>>     at org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(QueryExecutorImpl.java:91)
>>>>>>     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:228)
>>>>>>     - locked <0x0000000744a14aa0> (a org.postgresql.core.v3.QueryExecutorImpl)
>>>>>>     at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
>>>>>>     at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:405)
>>>>>>     at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:285)
>>>>>>     at org.postgresql.jdbc4.AbstractJdbc4Connection.isValid(AbstractJdbc4Connection.java:130)
>>>>>>     at org.postgresql.jdbc4.Jdbc4Connection.isValid(Jdbc4Connection.java:21)
>>>>>>     at com.zaxxer.hikari.proxy.ConnectionProxy.isValid(ConnectionProxy.java:357)
>>>>>>     at xx.xxxx.actors.flux.mclu.McluProcessor$$anonfun$receive$1.applyOrElse(McluProcessor.scala:65)
>>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>>     at xx.xxxx.actors.flux.mclu.McluProcessor.aroundReceive(McluProcessor.scala:30)
>>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>>    Locked ownable synchronizers:
>>>>>>     - <0x000000074482e6d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>>>>>
>>>>>
>>>>> We first dropped by the Akka group, thinking it was a problem
>>>>> with our code, but nothing came up. HikariCP's main developer showed up
>>>>> with an explanation though:
>>>>>
>>>>>> It appears that you are using the PostgreSQL CopyManager, correct?
>>>>>> Looking at QueryExecutorImpl it appears that rollback() is trying to
>>>>>> obtain a lock that was not released by the CopyManager. I recommend using
>>>>>> the CopyManager.copyIn() method that returns a CopyIn object, rather
>>>>>> than using the convenience method that takes a reader. Use the
>>>>>> writeToCopy() to pump the data in, and be sure to catch
>>>>>> SQLException. If you get an SQLException, call cancelCopy() and
>>>>>> retry or whatever your recovery scenario is, otherwise call endCopy().
>>>>>> I would have expected PostgreSQL to handle the severing of a Connection in
>>>>>> the middle of a bulk copy better, but that is probably a question for the
>>>>>> PostgreSQL group.
>>>>>
>>>>>
>>>>> Tried it, no luck. Here is the link for reference:
>>>>> https://groups.google.com/forum/#!topic/akka-user/Ehzioy3jVoU
>>>>>
>>>>> Here's what we can show you of our (scala) code :
>>>>>
>>>>>
>>>>>> val cpManager = connection.unwrap(classOf[PGConnection]).getCopyAPI
>>>>>> val stringBytes: Array[Byte] = batchStrings.map(_.toByte).toArray
>>>>>> val copy = cpManager.copyIn(s"COPY temp FROM STDIN WITH CSV")
>>>>>> try {
>>>>>>   copy.writeToCopy(stringBytes, 0, stringBytes.size)
>>>>>>   copy.endCopy()
>>>>>> } finally {
>>>>>>   if (copy.isActive) {
>>>>>>     copy.cancelCopy()
>>>>>>   }
>>>>>> }
>>>>>
>>>>>
>>>>> The connection used is closed further down in a finally block.
>>>>>
>>>>> We're fairly sure the lock is coming from here, since we have another,
>>>>> otherwise identical process that uses regular inserts instead of COPY and
>>>>> has shown no issues so far. Thanks for any help.
>>>>>
>>>>
>>>>
>>>
>>
>
