replication slot stats memory bug

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: replication slot stats memory bug
Date: 2021-03-17 23:04:47
Message-ID: 20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

in the course of https://postgr.es/m/3471359.1615937770%40sss.pgh.pa.us
I saw a leak in pgstat_read_statsfiles(), more precisely:
/* Allocate the space for replication slot statistics */
replSlotStats = palloc0(max_replication_slots * sizeof(PgStat_ReplSlotStats));

the issue is that the current memory context is not set by
pgstat_read_statsfiles().

In some cases CurrentMemoryContext is going to be a long-lived context,
accumulating those allocations over time. In other contexts it will be a
too short lived context, e.g. an ExprContext from the pg_stat_*
invocation in the query. A reproducer for the latter:

postgres[2252294][1]=# SELECT pg_create_logical_replication_slot('test', 'test_decoding');
┌────────────────────────────────────┐
│ pg_create_logical_replication_slot │
├────────────────────────────────────┤
│ (test,0/456C1878) │
└────────────────────────────────────┘
(1 row)

postgres[2252294][1]=# BEGIN ;
BEGIN

postgres[2252294][1]*=# SELECT * FROM pg_stat_replication_slots ;
┌───────────┬────────────┬─────────────┬─────────────┬─────────────┬──────────────┬──────────────┬─────────────┐
│ slot_name │ spill_txns │ spill_count │ spill_bytes │ stream_txns │ stream_count │ stream_bytes │ stats_reset │
├───────────┼────────────┼─────────────┼─────────────┼─────────────┼──────────────┼──────────────┼─────────────┤
│ test │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ (null) │
└───────────┴────────────┴─────────────┴─────────────┴─────────────┴──────────────┴──────────────┴─────────────┘
(1 row)

postgres[2252294][1]*=# SELECT * FROM pg_stat_replication_slots ;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
│ >
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
│ \x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F>
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
(1 row)

I'll push the minimal fix of forcing the allocation to happen in
pgStatLocalContext and setting it to NULL in pgstat_clear_snapshot().

But it seems like we just shouldn't allocate it dynamically at all?
max_replication_slots doesn't change during postmaster lifetime, so it
seems like it should just be allocated once?

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-17 23:25:19 Re: replication slot stats memory bug
Previous Message Tomas Vondra 2021-03-17 23:00:04 Re: WIP: WAL prefetch (another approach)