Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-07-26 20:27:17
Message-ID: CAAKRu_bWmMjmqL+OZ2duEQ80u7cRvpsExLNZNjzk-pXX5skwMQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 22, 2024 at 9:26 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> + CREATE TABLE ${table1}(col1 int)
> + WITH (autovacuum_enabled=false, fillfactor=10);
> + INSERT INTO $table1 VALUES(7);
> + INSERT INTO $table1 SELECT generate_series(1, $nrows) % 3;
> + CREATE INDEX on ${table1}(col1);
> + UPDATE $table1 SET col1 = 3 WHERE col1 = 0;
> + INSERT INTO $table1 VALUES(7);
>
> These queries make sense to me; they make the radix tree wide and use
> more nodes, instead of fattening the leaf nodes (i.e. the offset
> bitmaps). $table1 has 18182 blocks, and the radix tree statistics show:
>
> max_val = 65535
> num_keys = 18182
> height = 1, n4 = 0, n16 = 1, n32 = 0, n64 = 0, n256 = 72, leaves = 18182
>
> This means that the tree has two levels, and we use the maximum-size
> node everywhere except for one node.

Do you have some kind of tool that prints this out for you? That would
be really handy.

> I don't have any great idea for substantially reducing the total
> number of tuples in $table1. We could probably use DELETE instead of
> UPDATE to create the garbage tuples (although I'm not sure that's
> okay for this test), which reduces the WAL volume from 11MB to 4MB
> and would shorten the catch-up time. But I'm not sure how much it
> would help. There might be ways to trigger a two-round index vacuum
> with fewer tuples, but if the tests are too finely tuned to the
> current TidStore, we will have to re-adjust them whenever the
> TidStore changes in the future. So I think it's better and more
> reliable to allow maintenance_work_mem to be set to a lower value or
> to use injection points somehow.
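
For reference, a minimal sketch of what the injection-point route could
look like, using the injection_points test module from
src/test/modules/injection_points (new in 17). The point name below is
invented for illustration; vacuum would need a point defined at the
right spot:

    -- hypothetical: pause vacuum at a chosen point instead of forcing
    -- the dead-item store to overflow with a huge table
    CREATE EXTENSION injection_points;
    SELECT injection_points_attach('vacuum-before-index-cleanup', 'wait');
    -- ... kick off VACUUM in another session, then release it:
    SELECT injection_points_wakeup('vacuum-before-index-cleanup');
    SELECT injection_points_detach('vacuum-before-index-cleanup');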

I think we can improve the overall time on master and 17 with the
examples John provided later in the thread. However, I realized you are
right about using a DELETE instead of an UPDATE. At some point in my
development, I needed the UPDATE to satisfy some other aspect of the
test, but that is no longer true. A DELETE works just as well as an
UPDATE with respect to the dead items and, as you point out, creates
much less WAL and makes replay much faster.
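
Concretely, the change is roughly the following (a sketch against the
setup quoted above; in the actual TAP script $table1 is a Perl
variable):

    -- before: rewrites every matching row, logging full new versions
    UPDATE $table1 SET col1 = 3 WHERE col1 = 0;
    -- after: leaves the same dead items for vacuum with far less WAL
    DELETE FROM $table1 WHERE col1 = 0;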

I also realized I forgot to add 043_vacuum_horizon_floor.pl to
src/test/recovery/meson.build in 16. I will post a patch here this
weekend which changes the UPDATE to a DELETE in 14-16 (this sped up the
test by about 20% for me locally) and adds 043_vacuum_horizon_floor.pl
to src/test/recovery/meson.build in 16. I plan to push it on Monday to
save myself any weekend buildfarm embarrassment.

As for 17 and master, I'm going to try out John's examples and see
whether they are fast enough to commit to 17/master without lowering
the maintenance_work_mem lower bound.

If we want to lower it, I wonder if we should just halve it, since the
tests with half the number of tuples seemed fast enough to avoid timing
out on slow buildfarm animals. Or do we need a more principled value to
decrease it to?
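
For concreteness, the current floor is 1MB (1024 kB), so halving it
would mean accepting a value like this:

    SET maintenance_work_mem = '1MB';   -- accepted today (the minimum)
    SET maintenance_work_mem = '512kB'; -- rejected today as below the
                                        -- minimum; halving would allow it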

- Melanie
