Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-04-12 18:04:02
Message-ID: CAJ3gD9fj2CZOmGzZBtF8vcz0uLwaFZ9WXEyRPkmjjd0goojMxQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, 10 Apr 2019 at 21:39, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2019-04-10 12:11:21 +0530, tushar wrote:
> >
> > On 03/13/2019 08:40 PM, tushar wrote:
> > > Hi ,
> > >
> > > I am getting a server crash on standby while executing
> > > pg_logical_slot_get_changes function , please refer this scenario
> > >
> > > Master cluster( ./initdb -D master)
> > > set wal_level='hot_standby' in master/postgresql.conf file
> > > start the server , connect to psql terminal and create a physical
> > > replication slot ( SELECT * from
> > > pg_create_physical_replication_slot('p1');)
> > >
> > > perform pg_basebackup using --slot 'p1' (./pg_basebackup -D slave/ -R
> > > --slot p1 -v)
> > > set wal_level='logical' , hot_standby_feedback=on,
> > > primary_slot_name='p1' in slave/postgresql.conf file
> > > start the server , connect to psql terminal and create a logical
> > > replication slot ( SELECT * from
> > > pg_create_logical_replication_slot('t','test_decoding');)
> > >
> > > run pgbench ( ./pgbench -i -s 10 postgres) on master and select
> > > pg_logical_slot_get_changes on Slave database
> > >
> > > postgres=# select * from pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.274 IST [26817] LOG: starting logical decoding for
> > > slot "t"
> > > 2019-03-13 20:34:50.274 IST [26817] DETAIL: Streaming transactions
> > > committing after 0/6C000060, reading WAL from 0/6C000028.
> > > 2019-03-13 20:34:50.274 IST [26817] STATEMENT: select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.275 IST [26817] LOG: logical decoding found
> > > consistent point at 0/6C000028
> > > 2019-03-13 20:34:50.275 IST [26817] DETAIL: There are no running
> > > transactions.
> > > 2019-03-13 20:34:50.275 IST [26817] STATEMENT: select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
> > > "decode.c", Line: 977)
> > > server closed the connection unexpectedly
> > > This probably means the server terminated abnormally
> > > before or while processing the request.
> > > The connection to the server was lost. Attempting reset: 2019-03-13
> > > 20:34:50.276 IST [26809] LOG: server process (PID 26817) was terminated
> > > by signal 6: Aborted
> > >
> > Andres - Do you think - this is an issue which needs to be fixed ?
>
> Yes, it definitely needs to be fixed. I just haven't had sufficient time
> to look into it. Have you reproduced this with Amit's latest version?
>
> Amit, have you spent any time looking into it? I know that you're not
> that deeply steeped into the internals of logical decoding, but perhaps
> there's something obvious going on.

I tried to see if I could quickly understand what's going on.

Here, the master's wal_level is hot_standby, not logical, although the
slave's wal_level is logical.

On the slave, when pg_logical_slot_get_changes() is run,
DecodeMultiInsert() does not get any WAL records that have
XLH_INSERT_CONTAINS_NEW_TUPLE set. So the data pointer is never
incremented; it remains at tupledata. At the end of the function,
this assertion then fails:
Assert(data == tupledata + tuplelen);
because data is still at tupledata.

I'm not sure why this is happening. On the slave, wal_level is logical,
so logical records should have tuple data. I'm not sure what that has to
do with the master's wal_level. Everything should be there on the slave
after it replays the inserts, and the slave's wal_level is logical.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
