Re: BF mamba failure

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BF mamba failure
Date: 2024-06-12 17:00:00
Message-ID: 42227456-1132-4d4e-d6ef-e096668a9a4a@gmail.com
Lists: pgsql-hackers

Hello hackers,

20.03.2023 09:10, Peter Smith wrote:
>
> Using this I was also able to reproduce the problem. But test failures
> were rare. The make check-world seemed OK, and indeed the
> test_decoding tests would also appear to PASS around 14 out of 15
> times.

I've stumbled upon this assertion failure again while testing after commit cd312adc5.

This time I've simplified the reproducer to the attached modification.
With this patch applied, `make -s check -C contrib/test_decoding` fails on master as below:
ok 1         - pgstat_rc_1                                14 ms
not ok 2     - pgstat_rc_2                              1351 ms

contrib/test_decoding/output_iso/log/postmaster.log contains:
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 1130928

With extra logging added, I see the following events happening:
1) pgstat_rc_1.setup calls pgstat_create_replslot(), gets
  ReplicationSlotIndex(slot) = 0 and calls
  pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0).

2) pgstat_rc_1.s0_get_changes executes pg_logical_slot_get_changes(...)
  and then calls pgstat_gc_entry_refs on shmem_exit() ->
  pgstat_shutdown_hook() ...;
  with the sleep added inside pgstat_release_entry_ref, this backend waits
  after decreasing entry_ref->shared_entry->refcount to 0.

3) pgstat_rc_1.stop removes the replication slot.

4) pgstat_rc_2.setup calls pgstat_create_replslot(), gets
  ReplicationSlotIndex(slot) = 0 and calls
  pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid, 0, 0),
  which leads to the call pgstat_reinit_entry(), which increases refcount
  for the same shared_entry as in (1) and (2), and then to the call
  pgstat_acquire_entry_ref(), which increases refcount once more.

5) the backend from (2) then reaches
  Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0),
  which fails due to refcount = 2.
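
For illustration only, the refcount arithmetic behind steps (2), (4) and (5) can be replayed in a standalone toy model. This is a sketch, not PostgreSQL code: ToyEntry and replay_interleaving() are hypothetical names, the steps run sequentially in one thread in the order the added sleep forces, and the starting refcount of 1 is an assumption (the exiting backend is taken to hold the last remaining reference).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the shared stats entry; only the refcount matters here. */
typedef struct ToyEntry
{
	atomic_uint refcount;
} ToyEntry;

/*
 * Replays steps (2), (4) and (5) in a single thread, in the order the
 * added sleep forces, and returns the refcount seen at the assertion point.
 */
unsigned
replay_interleaving(void)
{
	ToyEntry	entry;

	/* Assumption: the exiting backend holds the last remaining reference. */
	atomic_init(&entry.refcount, 1);

	/* (2) the exiting backend releases its reference: 1 -> 0 ... */
	unsigned	before = atomic_fetch_sub(&entry.refcount, 1);

	assert(before == 1);
	/* ... and is now paused by the added sleep. */

	/* (4) the slot index is reused: the reinit step takes one reference ... */
	atomic_fetch_add(&entry.refcount, 1);
	/* ... and the acquire step takes another. */
	atomic_fetch_add(&entry.refcount, 1);

	/* (5) the paused backend resumes and reads the value it expects to be 0. */
	return atomic_load(&entry.refcount);
}
```

Calling replay_interleaving() returns 2, which is the value the failing Assert observes in (5).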

Best regards,
Alexander

Attachment Content-Type Size
pgstat_rc-debugging.patch text/x-patch 3.7 KB
