Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date: 2024-05-02 18:01:14
Message-ID: d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd@yandex.ru
Lists: pgsql-bugs

On 02.05.2024 19:52, Peter Geoghegan wrote:
> On Sat, Apr 27, 2024 at 10:38 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>> In 17, we don't ever get a new HTSV_Result, so if the tuple is not
>> removed, it would be because HeapTupleSatisfiesVacuumHorizon()
>> returned HEAPTUPLE_RECENTLY_DEAD and, if GlobalVisTestIsRemovableXid()
>> was called, dead_after did not precede GlobalVisState->maybe_needed.
>> This tuple, during this vacuum of the relation, would never be
>> determined to be HEAPTUPLE_DEAD or it would have been removed.
> That makes sense.
>
>>>> It will always be HEAPTUPLE_RECENTLY_DEAD in 17 and in <= 16, if
>>>> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, we wouldn't call
>>>> heap_prepare_freeze_tuple() because of the retry loop.
>>> The retry loop exists precisely because heap_prepare_freeze_tuple()
>>> isn't prepared to deal with HEAPTUPLE_DEAD tuples. So I agree that
>>> that won't be allowed to happen on versions that have the retry loop
>>> (14 - 16).
>> So, it can't happen in back branches. Let's just address 17. Help me
>> understand how this can happen in 17.
> Just to be clear, I never said that it was possible in 17. If I
> somehow implied it, then I didn't mean to.
>
Hi! I also investigated this issue and reproduced it with a test added
to the isolation tests. The test inserts 2 tuples, deletes them, runs
vacuum, and prints the tuple_deleted and dead_tuples statistics (I
attached the test to this email as a patch). Within 400 iterations or
more, I got the results:

n_dead_tup|n_live_tup|n_tup_del
----------+----------+---------
         0|         0|        0
(1 row)

After 400 or more iterations, I saw the difference shown earlier:

 n_dead_tup|n_live_tup|n_tup_del
 ----------+----------+---------
-         0|         0|        0
+         2|         0|        0
 (1 row)

I debugged and found that the test produces results with 0 dead tuples
if the xmax of the tuple is less than GlobalVisTempRels.maybe_needed.
In the code, this condition is checked in heap_prune_satisfies_vacuum():

else if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
{
     res = HEAPTUPLE_DEAD;
}

But when GlobalVisTempRels.maybe_needed is equal to the xmax of the
tuple, vacuum does not touch this tuple, because
heap_prune_satisfies_vacuum() returns HEAPTUPLE_RECENTLY_DEAD for it.
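
The boundary case can be illustrated with a minimal standalone sketch. It
is not the PostgreSQL code: the Sketch* names are hypothetical, XIDs are
modeled as plain 64-bit integers, and the only rule modeled is the
assumption that an XID is removable when it strictly precedes
maybe_needed. The real GlobalVisTestIsRemovableXid() also consults
definitely_needed and may recompute the horizons, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the PostgreSQL definitions. */
typedef uint64_t SketchXid;

typedef enum
{
    SKETCH_RECENTLY_DEAD,   /* tuple must be kept by this vacuum */
    SKETCH_DEAD             /* tuple may be removed */
} SketchStatus;

typedef struct
{
    /* Assumption: XIDs strictly older than this are safe to remove. */
    SketchXid maybe_needed;
} SketchVisState;

/* Simplified removability test: strict "precedes", so equality is not enough. */
static bool
sketch_is_removable(const SketchVisState *state, SketchXid xid)
{
    return xid < state->maybe_needed;
}

/* Mirrors the shape of the branch quoted above, using the simplified test. */
static SketchStatus
sketch_prune_status(const SketchVisState *state, SketchXid dead_after)
{
    if (sketch_is_removable(state, dead_after))
        return SKETCH_DEAD;

    return SKETCH_RECENTLY_DEAD;
}
```

Under this assumption, with maybe_needed = 1000 a tuple whose xmax is 999
is reported as dead, while a tuple whose xmax is exactly 1000 stays
recently dead and is left in place.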

Unfortunately, I have not yet found an explanation of why
GlobalVisTempRels.maybe_needed does not change after 400 iterations or
more. I'm still studying it. Perhaps this information will help you.

I reproduced the problem on REL_16_STABLE.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-vacuum_test.patch text/x-patch 5.6 KB
