Re: BUG #16760: Standby database missed records for at least 1 table

From: Andriy Bartash <abartash(at)xmatters(dot)com>
To: Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16760: Standby database missed records for at least 1 table
Date: 2020-12-03 19:04:17
Message-ID: CO6PR02MB7540935BC342E198E93DEC21BBF20@CO6PR02MB7540.namprd02.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you Euler
Here’s what I have:

1. Yes, index was used:
nex=# explain (analyze) select audit_ev_id, when_created from audit_evs_all where audit_ev_id
nex-# between 221535154 and 221535162;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Scan using audit_evs_all_pk on audit_evs_all (cost=0.29..3.31 rows=1 width=16) (actual time=0.070..0.071 rows=1 loops=1)
Index Cond: ((audit_ev_id >= 221535154) AND (audit_ev_id <= 221535162))
Planning Time: 0.352 ms
Execution Time: 0.113 ms
(4 rows)

1. Disable index (but it used Bitmap index Scan anyway) and returned only 1 record.

nex=# begin;

BEGIN

nex=# set local enable_indexscan to 0;

SET

nex=# set local enable_indexonlyscan to 0;

SET

nex=# explain (analyze) select audit_ev_id, when_created from audit_evs_all where audit_ev_id

nex-# between 221535154 and 221535162;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on audit_evs_all (cost=1.80..3.32 rows=1 width=16) (actual time=0.115..0.116 rows=1 loops=1)

Recheck Cond: ((audit_ev_id >= 221535154) AND (audit_ev_id <= 221535162))

Heap Blocks: exact=1

-> Bitmap Index Scan on audit_evs_all_pk (cost=0.00..1.80 rows=1 width=0) (actual time=0.015..0.016 rows=9 loops=1)

Index Cond: ((audit_ev_id >= 221535154) AND (audit_ev_id <= 221535162))

Planning Time: 0.295 ms

Execution Time: 0.146 ms

(7 rows)

nex=# select audit_ev_id, when_created from audit_evs_all where audit_ev_id

nex-# between 221535154 and 221535162;

audit_ev_id | when_created

-------------+-------------------------------

221535162 | 2020-12-01 00:00:20.955348+00

(1 row)

nex=# commit;

1. Tried to reindex table and select again (didn’t help, same 1 record was returned)

nex=# reindex table audit_evs_all;

REINDEX

nex=# select audit_ev_id, when_created from audit_evs_all where audit_ev_id

between 221535154 and 221535162;

audit_ev_id | when_created

-------------+-------------------------------

221535162 | 2020-12-01 00:00:20.955348+00

(1 row)

1. Drop primary key constraint with its index and try again (seq scan used and 1 record returned again)

nex=# ALTER TABLE public.audit_evs_all DROP CONSTRAINT audit_evs_all_pk;

ALTER TABLE

nex=# explain (analyze) select audit_ev_id, when_created from audit_evs_all where audit_ev_id

nex-# between 221535154 and 221535162;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------

Seq Scan on audit_evs_all (cost=0.00..5311.15 rows=1 width=16) (actual time=10.680..19.632 rows=1 loops=1)

Filter: ((audit_ev_id >= 221535154) AND (audit_ev_id <= 221535162))

Rows Removed by Filter: 86276

Planning Time: 0.376 ms

Execution Time: 19.667 ms

(5 rows)

nex=# select audit_ev_id, when_created from audit_evs_all where audit_ev_id

between 221535154 and 221535162;

audit_ev_id | when_created

-------------+-------------------------------

221535162 | 2020-12-01 00:00:20.955348+00

(1 row)

1. Now I am looking into pageinspect . I will update you immediately.

Much appreciate your help.

From: Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>
Date: Thursday, December 3, 2020 at 10:20 AM
To: Andriy Bartash <abartash(at)xmatters(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16760: Standby database missed records for at least 1 table
On Wed, 2 Dec 2020 at 17:44, PG Bug reporting form <noreply(at)postgresql(dot)org<mailto:noreply(at)postgresql(dot)org>> wrote:
Below output from Primary
select audit_ev_id, when_created from audit_evs_all where audit_ev_id
between 221535154 and 221535162;
audit_ev_id | when_created
-------------+-------------------------------
221535154 | 2020-12-01 00:00:20.955348+00
221535155 | 2020-12-01 00:00:20.955348+00
221535156 | 2020-12-01 00:00:20.955348+00
221535157 | 2020-12-01 00:00:20.955348+00
221535158 | 2020-12-01 00:00:20.955348+00
221535159 | 2020-12-01 00:00:20.955348+00
221535160 | 2020-12-01 00:00:20.955348+00
221535161 | 2020-12-01 00:00:20.955348+00
221535162 | 2020-12-01 00:00:20.955348+00
(9 rows)
---------------------------------------------------------------
Same query's output from 1st Secondary
select audit_ev_id, when_created from audit_evs_all where audit_ev_id
between 221535154 and 221535162;
audit_ev_id | when_created
-------------+-------------------------------
221535162 | 2020-12-01 00:00:20.955348+00
(1 row)
I would start this investigation, disabling the index usage for this query.

-- check which index has been used
explain (analyze) select audit_ev_id, when_created from audit_evs_all where audit_ev_id
between 221535154 and 221535162;
-- disable index usage
begin;
set local enable_indexscan to 0;
set local enable_indexonlyscan to 0;
explain (analyze) select audit_ev_id, when_created from audit_evs_all where audit_ev_id
between 221535154 and 221535162;
select audit_ev_id, when_created from audit_evs_all where audit_ev_id
between 221535154 and 221535162;
commit;

If the second EXPLAIN isn't using the index shown in the first EXPLAIN and the
SELECT returns all rows, reindex the index that has been used. If the index is
not the culprit, you should inspect the page(s) that contain those rows using
pageinspect.

--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2020-12-03 20:50:59 Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
Previous Message Euler Taveira 2020-12-03 18:20:26 Re: BUG #16760: Standby database missed records for at least 1 table