Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date: 2024-05-04 11:53:10
Message-ID: fee6fd69-5162-4f0b-82a0-a56864e49a96@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi!
On 03.05.2024 12:35, Matthias van de Meent wrote:
> On Thu, 2 May 2024 at 20:36, Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>> Hi! I also investigated this issue and reproduced it using this test added to the isolated tests, where I added 2 tuples, deleted them and ran vacuum and printed the tuple_deleted and dead_tuples statistics (I attached test c to this email as a patch). Within 400 iterations or more, I got the results:
>>
>> n_dead_tup|n_live_tup|n_tup_del ----------------+------------+------------- 0| 0| 0 (1 row)
>>
>> After 400 or more running cycles, I felt the differences, as shown earlier:
>>
>> n_dead_tup|n_live_tup|n_tup_del
>> ----------+----------+---------
>> - 0| 0| 0
>> + 2| 0| 0
>> (1 row)
>>
>>
>> I debugged and found that the test produces results with 0 dead tuples if GlobalVisTempRels.maybe_needed is less than the x_max of the tuple. In the code, this condition works in heap_prune_satisfies_vacuum:
>>
>> else if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
>> {
>> res = HEAPTUPLE_DEAD;
>> }
>>
>> But when GlobalVisTempRels.maybe_needed is equal to the x_max xid of the tuple, vacuum does not touch this tuple, because the heap_prune_satisfies_vacuum function returns the status of the RECENTLY_DEAD tuple.
>>
>> Unfortunately, I have not found any explanation why GlobalVisTempRels.maybe_needed does not change after 400 iterations or more. I'm still studying it. Perhaps this information will help you.
> You should probably check GlobalVisTestShouldUpdate to see why it
> wouldn't get updated. IIRC, GobalVis* are only updated every so often
> based on a local copy of a shared variable, to reduce contention on
> the shared state in the proc arrays. It is quite likely that your
> backend was waiting for new other changes, when the current
> GlobalVisTempRels state was already "good enough" for the current
> visibility snapshot.

Thank you for your advice, I am checking this now. It is a new area for
me and it needs from me some more investigation.

>> I reproduced the problem on REL_16_STABLE.
>>
>> I reproduced this test in the master branch as well, but used a more complex test for it: I added 700 tuples to the table, deleted half of the table, and then started vacuum. I expected to get only 350 live tuples and 0 dead and deleted tuples, but after 800 iterations I got 350 dead tuples and 350 live tuples: n_dead_tup|n_live_tup|n_tup_del
>> ---------------+-------------
>>
>> +-------------
>> - 0| 350| 0
>> + 350| 350| 0
>> (1 row)
> I think this output is expected. Concurrent tasks (such as autovacuum
> and autoanalyze) can have their own visibility snapshots and
> transactions, which will block VACUUM from removing rows that may be
> visible to those snapshots. Did you verify there was no other
> (background) process active in the PostgreSQL test node during your
> tests?
>
I see backgrounds processes like autovacuum launcher, client backend,
logical replication launcher, checkpointer, walwriter, andbackground
writer. I checked background backends throw pg_stat_activity view and I
didn't find a special backend, which impacted on difference in the test.
Maybe I missed something?

Besides, I had a hypothesis that some backends might be so many or we
have some special backend, which interferes with doing of updating the
snapshot, so because of this, some transaction couldn't finish on time
and we got finally RECENTLY_DEAD tuple, but I didn't find any evidence.
To prove it I added the query with output all backends before vacuum and
collected its results, when the test was passed and only when the test
failed I differed last diff with others to find something new, but it
was usual. Even when the test shows the expected results (0 dead_tuples
and 0 deleted tuples) I saw the same backends, maybe some of them were
more, but I didn't find any unusual.

Maybe I should pay attention to something special?

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-05-04 18:34:40 Re: BUG #18456: Trigger data in plpython3u trigger-function changes in AFTER UPDATE OR INSERT trigger
Previous Message Alexander Lakhin 2024-05-03 20:00:00 Re: error "can only drop stats once" brings down database