| From: | "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com> | 
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Add two missing tests in 035_standby_logical_decoding.pl | 
| Date: | 2023-05-05 10:32:33 | 
| Message-ID: | 2c544b9f-419d-d18e-47be-f3d6f3d24e4e@gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 5/5/23 11:29 AM, Amit Kapila wrote:
> On Fri, May 5, 2023 at 1:16 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>
>>
>> After multiple attempts, I got one failing one.
>>
>> Issue is that we expect this file to be removed:
>>
>> [07:24:27.261](0.899s) #WAL file is /Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003
>>
>> But the standby emits:
>>
>> 2023-05-05 07:24:27.216 UTC [17909][client backend] [035_standby_logical_decoding.pl][3/6:0] LOG:  statement: checkpoint;
>> 2023-05-05 07:24:27.216 UTC [17745][checkpointer] LOG:  restartpoint starting: immediate wait
>> 2023-05-05 07:24:27.259 UTC [17745][checkpointer] LOG:  attempting to remove WAL segments older than log file 000000000000000000000002
>>
>> So it seems the test is not right (missing activity??), not sure why yet.
>>
> 
> Can you try to print the value returned by
> XLogGetReplicationSlotMinimumLSN() in KeepLogSeg() on standby? Also,
> please try to print "attempting to remove WAL segments ..." on the
> primary. We can see, if by any chance some slot is holding us to
> remove the required WAL file.
> 
I turned DEBUG2 on. We can also see on the primary:
2023-05-05 08:23:30.843 UTC [16833][checkpointer] LOCATION:  CheckPointReplicationSlots, slot.c:1576
2023-05-05 08:23:30.844 UTC [16833][checkpointer] DEBUG:  00000: snapshot of 0+0 running transaction ids (lsn 0/40000D0 oldest xid 746 latest complete 745 next xid 746)
2023-05-05 08:23:30.844 UTC [16833][checkpointer] LOCATION:  LogCurrentRunningXacts, standby.c:1377
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOG:  00000: BDT1 about to call RemoveOldXlogFiles in CreateCheckPoint
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  CreateCheckPoint, xlog.c:6835
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOG:  00000: attempting to remove WAL segments older than log file 000000000000000000000002
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  RemoveOldXlogFiles, xlog.c:3560
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: recycled write-ahead log file "000000010000000000000001"
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  RemoveXlogFile, xlog.c:3708
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: recycled write-ahead log file "000000010000000000000002"
2023-05-05 08:23:30.845 UTC [16833][checkpointer] LOCATION:  RemoveXlogFile, xlog.c:3708
2023-05-05 08:23:30.845 UTC [16833][checkpointer] DEBUG:  00000: SlruScanDirectory invoking callback on pg_subtrans/0000
So, 000000010000000000000003 is not removed on the primary.
It has been recycled on:
2023-05-05 08:23:38.605 UTC [16833][checkpointer] DEBUG: 00000: recycled write-ahead log file "000000010000000000000003"
Which is later than the test:
[08:23:31.931](0.000s) not ok 19 - invalidated logical slots do not lead to retaining WAL
FWIW, the failing test with DEBUG2 can be found there: https://cirrus-ci.com/task/5615316688961536
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2023-05-05 10:34:18 | Re: Tables getting stuck at 's' state during logical replication | 
| Previous Message | Peter Smith | 2023-05-05 10:09:33 | Re: Support logical replication of DDLs |