Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date: 2024-05-02 18:36:31
Message-ID: 0a994343-c552-4535-a9cf-b4caa4edc1e8@yandex.ru
Lists: pgsql-bugs

On 02.05.2024 21:01, Alena Rybakina wrote:
> On 02.05.2024 19:52, Peter Geoghegan wrote:
>> On Sat, Apr 27, 2024 at 10:38 AM Melanie Plageman
>> <melanieplageman(at)gmail(dot)com> wrote:
>>> In 17, we don't ever get a new HTSV_Result, so if the tuple is not
>>> removed, it would be because HeapTupleSatisfiesVacuumHorizon()
>>> returned HEAPTUPLE_RECENTLY_DEAD and, if GlobalVisTestIsRemovableXid()
>>> was called, dead_after did not precede GlobalVisState->maybe_needed.
>>> This tuple, during this vacuum of the relation, would never be
>>> determined to be HEAPTUPLE_DEAD or it would have been removed.
>> That makes sense.
>>
>>>>> It will always be HEAPTUPLE_RECENTLY_DEAD in 17 and in <= 16, if
>>>>> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, we wouldn't call
>>>>> heap_prepare_freeze_tuple() because of the retry loop.
>>>> The retry loop exists precisely because heap_prepare_freeze_tuple()
>>>> isn't prepared to deal with HEAPTUPLE_DEAD tuples. So I agree that
>>>> that won't be allowed to happen on versions that have the retry loop
>>>> (14 - 16).
>>> So, it can't happen in back branches. Let's just address 17. Help me
>>> understand how this can happen in 17.
>> Just to be clear, I never said that it was possible in 17. If I
>> somehow implied it, then I didn't mean to.
>>
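To make the retry-loop discussion above concrete, here is a toy model in C. It is not PostgreSQL code. It assumes two hypothetical horizons: `prune_horizon`, standing in for GlobalVisState->maybe_needed as used by pruning, and `oldest_xmin`, standing in for the cutoff that HeapTupleSatisfiesVacuum() is called with on 14 - 16. Pruning removes a tuple whose xmax precedes `prune_horizon`. If a surviving tuple is still judged DEAD against `oldest_xmin`, the page is retried. If the two horizons disagree in that direction and neither moves, every retry sees the same DEAD survivor, so the toy caps retries to guarantee it terminates. All names here (`ToyTuple`, `toy_prune`, `toy_scan_page`, ...) are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for TransactionId; wraparound is ignored in this sketch. */
typedef uint32_t toy_xid;

typedef struct ToyTuple
{
    toy_xid xmax;    /* deleting transaction */
    bool    removed; /* set once the toy "pruning" removes the tuple */
} ToyTuple;

/* Hypothetical pruning step: remove tuples whose xmax precedes the
 * pruning horizon (a stand-in for GlobalVisState->maybe_needed). */
static void toy_prune(ToyTuple *tuples, int ntuples, toy_xid prune_horizon)
{
    for (int i = 0; i < ntuples; i++)
        if (!tuples[i].removed && tuples[i].xmax < prune_horizon)
            tuples[i].removed = true;
}

/* Hypothetical HTSV-style check against oldest_xmin: a surviving tuple
 * whose xmax precedes oldest_xmin would be reported as DEAD. */
static bool toy_any_dead_survivor(const ToyTuple *tuples, int ntuples,
                                  toy_xid oldest_xmin)
{
    for (int i = 0; i < ntuples; i++)
        if (!tuples[i].removed && tuples[i].xmax < oldest_xmin)
            return true;
    return false;
}

/* Models the retry loop: prune, then retry while a DEAD tuple survived.
 * Returns the number of retries performed, capped at max_retries so that
 * this toy always terminates. */
static int toy_scan_page(ToyTuple *tuples, int ntuples,
                         toy_xid prune_horizon, toy_xid oldest_xmin,
                         int max_retries)
{
    int retries = 0;

    for (;;)
    {
        toy_prune(tuples, ntuples, prune_horizon);
        if (!toy_any_dead_survivor(tuples, ntuples, oldest_xmin))
            return retries;     /* nothing DEAD left behind: done */
        if (retries == max_retries)
            return retries;     /* no progress possible: give up */
        retries++;
    }
}
```

With `prune_horizon = 101` and `oldest_xmin = 102`, a tuple with xmax 101 is never pruned yet is always DEAD by the second check, so the toy spins until the cap. When the horizons agree, it finishes on the first pass.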
> Hi! I also investigated this issue and reproduced it using this test
> added to the isolation tests: I inserted 2 tuples, deleted them, ran
> vacuum, and printed the n_tup_del and n_dead_tup statistics (I
> attached the test to this email as a patch). For the first 400 or so
> iterations, I got the expected result:
>
>  n_dead_tup | n_live_tup | n_tup_del
> ------------+------------+-----------
>           0 |          0 |         0
> (1 row)
>
> After 400 or more runs, I saw the following difference from the
> expected output:
>
>  n_dead_tup|n_live_tup|n_tup_del
>  ----------+----------+---------
> -         0|         0|        0
> +         2|         0|        0
>  (1 row)
>
>
> I debugged this and found that the test produces the expected 0 dead
> tuples when the tuple's xmax is less than GlobalVisTempRels.maybe_needed.
> In the code, this is the condition checked in heap_prune_satisfies_vacuum():
>
> else if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
> {
>      res = HEAPTUPLE_DEAD;
> }
>
> But when GlobalVisTempRels.maybe_needed is equal to the tuple's xmax,
> vacuum does not remove the tuple, because heap_prune_satisfies_vacuum()
> returns HEAPTUPLE_RECENTLY_DEAD for it.
>
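The boundary case described above comes down to a strict comparison: as noted earlier in the thread, a tuple becomes removable only when dead_after precedes GlobalVisState->maybe_needed, so an xmax equal to maybe_needed is not removable. The C sketch below models only that comparison. `toy_xid_precedes()` follows the modulo-2^32 style that TransactionIdPrecedes() uses for normal xids, and `toy_is_removable()` is a hypothetical simplification; the real GlobalVisTestIsRemovableXid() can also refresh its horizons before deciding, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t toy_xid;

/* Circular comparison for normal (non-special) xids: a precedes b when
 * (a - b), taken modulo 2^32 and read as signed, is negative. */
static bool toy_xid_precedes(toy_xid a, toy_xid b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/* Hypothetical simplification of the removability test: xmax is
 * removable only if it strictly precedes maybe_needed. */
static bool toy_is_removable(toy_xid xmax, toy_xid maybe_needed)
{
    return toy_xid_precedes(xmax, maybe_needed);
}
```

If maybe_needed stays exactly at the tuple's xmax, as in the failing runs, `toy_is_removable()` keeps returning false and the tuple is left in place as RECENTLY_DEAD.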
> Unfortunately, I have not yet found an explanation for why
> GlobalVisTempRels.maybe_needed does not change after 400 or more
> iterations. I'm still studying it. Perhaps this information will help you.
>
> I reproduced the problem on REL_16_STABLE.
>
I reproduced this on the master branch as well, using a more complex
test: I inserted 700 tuples into the table, deleted half of them, and
then ran vacuum. I expected 350 live tuples and 0 dead and deleted
tuples, but after 800 iterations I got 350 dead tuples and 350 live
tuples:

  n_dead_tup | n_live_tup | n_tup_del
 ------------+------------+-----------
-          0 |        350 |         0
+        350 |        350 |         0
 (1 row)

I have added other steps to the test, but so far I have either not seen
any failures in them or not reached them.

Just in case, I ran the test with this bash command:

for i in $(seq 2000); do
    echo "ITER $i"
    make -s installcheck -C src/test/isolation/ || break
done

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
vacuum_test.spec text/x-rpm-spec 1.2 KB
