Re: postgresql Logical Replication Stream fails with “Database connection failed when reading from copy”

From: Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: postgresql Logical Replication Stream fails with “Database connection failed when reading from copy”
Date: 2018-09-05 11:35:36
Message-ID: CAFt1pcqfA6NFjgSUamqksc=H-sg2oUDJp4ox=x=Po_NwKJFTvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-jdbc

Thanks for the invite Dave.
I have joined the room.

Regards
Shailesh

On Wed, Sep 5, 2018 at 4:20 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:

> Shailesh,
>
> Can you join https://gitter.im/debezium/dev to discuss this ?
>
> Dave Cramer
>
> davec(at)postgresintl(dot)com
> www.postgresintl.com
>
>
> On Wed, 5 Sep 2018 at 01:35, Jammie <shailesh(dot)jamloki(at)gmail(dot)com> wrote:
>
>> Dave,
>> I am using Windows 10 and I do not have tcp_keep* settings set in my
>> regedit
>> location :
>> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
>>
>> So all these settings should be windows default values.
>>
>> Regards
>> Shailesh
>> On Wed, 5 Sep 2018, 00:14 Dave Cramer, <davec(at)postgresintl(dot)com> wrote:
>>
>>> What O/S are you using ? Can you see what the kernel tcp_keep* settings
>>> are ?
>>>
>>> On Tue, 4 Sep 2018 at 13:15, Jammie <shailesh(dot)jamloki(at)gmail(dot)com> wrote:
>>>
>>>> PGProperty.TCP_KEEP_ALIVE.set(connectionProperties, "true");
>>>>
>>>> this is how I am setting.
>>>>
>>>> On Tue, Sep 4, 2018 at 10:44 PM Dave Cramer <davec(at)postgresintl(dot)com>
>>>> wrote:
>>>>
>>>>> Are you specifying *tcpKeepAlive=true *in the connection string ?
>>>>>
>>>>> On Tue, 4 Sep 2018 at 12:27, Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>> The issue is still coming for me.
>>>>>>
>>>>>> One more thing to note here is my postgres database and Java program
>>>>>> that stream data is on the same box.
>>>>>>
>>>>>> Regards
>>>>>> Shailesh
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 9:26 PM Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, I'll check it out.
>>>>>>>
>>>>>>> On Tue, 4 Sep 2018, 20:59 Dave Cramer, <davec(at)postgresintl(dot)com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Have a look at https://issues.jboss.org/browse/DBZ-218
>>>>>>>>
>>>>>>>> I think just setting "tcpKeepAlive=true" should work. Can you try
>>>>>>>> that and get back to me?
>>>>>>>>
>>>>>>>> On Tue, 4 Sep 2018 at 10:53, Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One query is ,
>>>>>>>>> In my program I do not set
>>>>>>>>>
>>>>>>>>> stream.setAppliedPosition() or stream.setFlushLSN().
>>>>>>>>>
>>>>>>>>> So does that mean we will not be able to send the updates to
>>>>>>>>> server. and It is necessary to send this message periodically ?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Shailesh
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 8:05 PM Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Ok. Thank you very much. !! Appreciate your time here .
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 8:03 PM Dave Cramer <
>>>>>>>>>> davec(at)postgresintl(dot)com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 4 Sep 2018 at 10:32, Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Most of the time I have set
>>>>>>>>>>>>
>>>>>>>>>>>> .withStatusInterval(20, TimeUnit.SECONDS)
>>>>>>>>>>>>
>>>>>>>>>>>> Just that to see If 0 can solve this issue I tried it once.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, that is worse.
>>>>>>>>>>>
>>>>>>>>>>> I'll build a jar for you to try.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 7:59 PM Jammie <
>>>>>>>>>>>> shailesh(dot)jamloki(at)gmail(dot)com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry I have not done building of driver myself. Please help
>>>>>>>>>>>>> If you can do that for me or If you can share steps I can do that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also would like to understand If we got the root cause fixed :
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is my further observation for this issue :
>>>>>>>>>>>>> 1) The issue comes at different point in time while stream is
>>>>>>>>>>>>> doing its job e.g. writing copy/reading copy/ending copy. I think
>>>>>>>>>>>>> underlying issue is Socket issue only.
>>>>>>>>>>>>> 2) Once this issue comes the stream object cant even be
>>>>>>>>>>>>> closed. even stream.close() throws the same error e.g. 'error reading
>>>>>>>>>>>>> copy`. Even SQL Connection object is unusable e.g. querying to database
>>>>>>>>>>>>> using that object hangs. However I see both stream.isClose() and
>>>>>>>>>>>>> connection.isClose() is false.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Shailesh
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 7:52 PM Dave Cramer <
>>>>>>>>>>>>> davec(at)postgresintl(dot)com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So here's what I think is happening. The keep alive timeout
>>>>>>>>>>>>>> exactly the same as the option in the slot.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> private boolean isTimeUpdate() {
>>>>>>>>>>>>>> /* a value of 0 disables automatic updates */
>>>>>>>>>>>>>> if ( updateInterval == 0 ) {
>>>>>>>>>>>>>> return false;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> long diff = System.currentTimeMillis() - lastStatusUpdate;
>>>>>>>>>>>>>> return diff >= updateInterval;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> change this to be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> return diff >= updateInterval-100;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> or something similar to get it to update before it times out
>>>>>>>>>>>>>> on the server
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you build the driver ? If not I'll send you a driver with
>>>>>>>>>>>>>> some mods to change
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 4 Sep 2018 at 08:40, Dave Cramer <
>>>>>>>>>>>>>> davec(at)postgresintl(dot)com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 3 Sep 2018 at 21:03, Jammie <
>>>>>>>>>>>>>>> shailesh(dot)jamloki(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Dave,
>>>>>>>>>>>>>>>> thanks for the response !!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) I have tried latest driver 42.2.5 as well the same
>>>>>>>>>>>>>>>> exception can be reproduced easily.
>>>>>>>>>>>>>>>> 2) I have tried postgres V10 version along with 9.6.4 same
>>>>>>>>>>>>>>>> issue could be reproduced.
>>>>>>>>>>>>>>>> The server side log says :
>>>>>>>>>>>>>>>> *2018-09-02 23:40:52 EDT LOG: could not receive data from
>>>>>>>>>>>>>>>> client: An existing connection was forcibly closed by the remote host.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In one of the instance I had enabled JDBC trace and this is
>>>>>>>>>>>>>>>> what the last few lines are :
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Sep 03, 2018 3:37:52 AM org.postgresql.core.v3.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *QueryExecutorImpl receiveRFQFINEST: <=BE
>>>>>>>>>>>>>>>> ReadyForQuery(I)Sep 03, 2018 3:37:52 AM org.postgresql.jdbc.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *PgResultSet getIntFINEST: getInt columnIndex: 1Sep 03,
>>>>>>>>>>>>>>>> 2018 3:37:52 AM org.postgresql.jdbc.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *PgConnection setAutoCommitFINE: setAutoCommit = falseSep
>>>>>>>>>>>>>>>> 03, 2018 3:37:53 AM org.postgresql.core.**QueryExecutorBase
>>>>>>>>>>>>>>>> close*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also see Socket reset error some times :
>>>>>>>>>>>>>>>> ---------------------
>>>>>>>>>>>>>>>> *Sep 03, 2018 3:29:42 AM org.postgresql.core.v3.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *QueryExecutorImpl processCopyResultsFINEST: <=BE
>>>>>>>>>>>>>>>> CopyDataSep 03, 2018 3:29:42 AM org.postgresql.core.v3.*
>>>>>>>>>>>>>>>> *replication.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *V3PGReplicationStream processKeepAliveMessageFINEST:
>>>>>>>>>>>>>>>> <=BE Keepalive(lastServerWal: 0/2DD0218, clock: 9/3/18 3:29 AM needReply:
>>>>>>>>>>>>>>>> false)Sep 03, 2018 3:30:40 AM org.postgresql.core.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *QueryExecutorBase closeFINEST: FE=> TerminateSep 03, 2018
>>>>>>>>>>>>>>>> 3:30:40 AM org.postgresql.core.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *QueryExecutorBase closeFINEST: Discarding IOException on
>>>>>>>>>>>>>>>> close:java.net.SocketException: Connection reset by peer: socket write
>>>>>>>>>>>>>>>> error at java.net.SocketOutputStream.*
>>>>>>>>>>>>>>>> *socketWrite0(Native Method) at
>>>>>>>>>>>>>>>> java.net.SocketOutputStream.**socketWrite(*
>>>>>>>>>>>>>>>> *SocketOutputStream.java:122) at
>>>>>>>>>>>>>>>> java.net.SocketOutputStream.*
>>>>>>>>>>>>>>>> *write(SocketOutputStream.java:*
>>>>>>>>>>>>>>>> *166) at java.io.BufferedOutputStream.**flushBuffer(*
>>>>>>>>>>>>>>>> *BufferedOutputStream.java:93) at
>>>>>>>>>>>>>>>> java.io.BufferedOutputStream.**flush(BufferedOutputStream.*
>>>>>>>>>>>>>>>> *java:151) at org.postgresql.core.PGStream.*
>>>>>>>>>>>>>>>> *flush(PGStream.java:514) at org.postgresql.core.*
>>>>>>>>>>>>>>>> *QueryExecutorBase.close(*
>>>>>>>>>>>>>>>> *QueryExecutorBase.java:136) at org.postgresql.jdbc.*
>>>>>>>>>>>>>>>> *PgConnection.close(*
>>>>>>>>>>>>>>>> *PgConnection.java:659) at com.datamirror.ts.scrapers.*
>>>>>>>>>>>>>>>> *postgresqlscraper.**PostgreSQLLogicalLogReaderImpl*
>>>>>>>>>>>>>>>> *.readLog(**PostgreSQLLogicalLogReaderImpl*
>>>>>>>>>>>>>>>> *.java:208) at com.datamirror.ts.scrapers.*
>>>>>>>>>>>>>>>> *postgresqlscraper.**PostgreSQLLogReader.readLog(*
>>>>>>>>>>>>>>>> *PostgreSQLLogReader.java:174) at
>>>>>>>>>>>>>>>> com.datamirror.ts.scrapers.**cdc.LogReader.execute(*
>>>>>>>>>>>>>>>> *LogReader.java:341) at com.datamirror.ts.scrapers.*
>>>>>>>>>>>>>>>> *cdc.LogReader$**LogReaderPipelineJob.execute(*
>>>>>>>>>>>>>>>> *LogReader.java:114) at com.datamirror.ts.engine.*
>>>>>>>>>>>>>>>> *component.PipelineThread.**runThread(PipelineThread.java:*
>>>>>>>>>>>>>>>> *217) at com.datamirror.ts.util.*
>>>>>>>>>>>>>>>> *TsThread.run(TsThread.java:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *130)Sep 03, 2018 3:30:40 AM org.postgresql.Driver
>>>>>>>>>>>>>>>> connectFINEST: FE=> Terminate*
>>>>>>>>>>>>>>>> -------------------------
>>>>>>>>>>>>>>>> 3) Other than this I have tried to set PGProperty
>>>>>>>>>>>>>>>> TCP_KEEP_ALIVE to true and SOCKET_TIMEOUT as 0.
>>>>>>>>>>>>>>>> Also SetStatusInterval of the stream also to 0 but these
>>>>>>>>>>>>>>>> did not help the purpose.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The issue is very consistent and can be reproduced easily.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Shailesh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Pretty sure I know how to fix this,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Give me a few hours..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 12:51 AM Dave Cramer <
>>>>>>>>>>>>>>>> pg(at)fastcrypt(dot)com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Off the top of my head I don't have any advice. Can you
>>>>>>>>>>>>>>>>> try the latest driver though and see if it still exists.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are there any messages on the server saying that it closed
>>>>>>>>>>>>>>>>> the connection ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, 3 Sep 2018 at 06:34, Jammie <
>>>>>>>>>>>>>>>>> shailesh(dot)jamloki(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi I am reading from logical replication stream
>>>>>>>>>>>>>>>>>> continuously in infinite while loop. and I have another program that is
>>>>>>>>>>>>>>>>>> continuously populating a table in the same database. I notice that after
>>>>>>>>>>>>>>>>>> some time (around 5-10 minutes). I always get the exception
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> org.postgresql.util.PSQLException: Database connection failed when reading from copy
>>>>>>>>>>>>>>>>>> at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1035)
>>>>>>>>>>>>>>>>>> at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:41)
>>>>>>>>>>>>>>>>>> at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:155)
>>>>>>>>>>>>>>>>>> at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:124)
>>>>>>>>>>>>>>>>>> at org.postgresql.core.v3.replication.V3PGReplicationStream.read(V3PGReplicationStream.java:70)
>>>>>>>>>>>>>>>>>> at com.datamirror.ts.scrapers.postgresqlscraper.PGStreamReceiver.main(PGStreamReceiver.java:60)
>>>>>>>>>>>>>>>>>> Caused by: java.net.SocketException: Connection reset
>>>>>>>>>>>>>>>>>> at java.net.SocketInputStream.read(SocketInputStream.java:220)
>>>>>>>>>>>>>>>>>> at java.net.SocketInputStream.read(SocketInputStream
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Scott Ribe 2018-09-05 13:32:04 Re: How to get alerted automatically whenever a table structure is changed between Publisher and Subscriber in Logical Replication?
Previous Message Dave Cramer 2018-09-05 10:50:42 Re: postgresql Logical Replication Stream fails with “Database connection failed when reading from copy”

Browse pgsql-jdbc by date

  From Date Subject
Next Message Gavin Flower 2018-09-05 22:14:23 Re: re-skinning the jdbc.postgresql.org
Previous Message Dave Cramer 2018-09-05 11:24:42 re-skinning the jdbc.postgresql.org