Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: maintenance_work_mem = 64kB doesn't work for vacuum
Date: 2025-03-10 09:53:23
Message-ID: CAApHDvps_sLPtBVZLyi--bmcjDNwqfg2eApQk9muYG-UrEi_nA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Regarding that patch, we need to note that the lpdead_items is a
> counter that is not reset in the entire vacuum. Therefore, with
> maintenance_work_mem = 64kB, once we collect at least one lpdead item,
> we perform a cycle of index vacuuming and heap vacuuming for every
> subsequent block even if they don't have a lpdead item. I think we
> should use vacrel->dead_items_info->num_items instead.

OK, I didn't study the code enough to realise that. My patch was only
intended as an indication of what I thought. Please feel free to
proceed with your own patch using the correct field.
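
The difference between the two fields can be shown with a small standalone toy model. This is not PostgreSQL code: the function name, the `capacity` parameter and the exact trigger condition are assumptions made for illustration. The only thing it models is that a counter which is never reset (like lpdead_items) keeps satisfying the trigger on every later block, while a count of currently stored items (like dead_items_info->num_items) drops back to zero after each cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not PostgreSQL code: count how many index/heap vacuum cycles
 * are triggered while scanning nblocks heap blocks.
 *
 * dead_per_block[i] is the number of LP_DEAD items found in block i.
 * capacity is a hypothetical number of dead items that fit in memory.
 *
 * If use_cumulative is true, the trigger tests a counter that is never
 * reset (modelled on lpdead_items). Otherwise it tests the number of
 * items currently stored (modelled on dead_items_info->num_items),
 * which each cycle resets to zero.
 */
static int
count_vacuum_cycles(const int *dead_per_block, size_t nblocks,
                    int capacity, bool use_cumulative)
{
    long total_seen = 0;    /* never reset */
    long stored = 0;        /* reset after each cycle */
    int  cycles = 0;

    assert(capacity > 0);

    for (size_t i = 0; i < nblocks; i++)
    {
        total_seen += dead_per_block[i];
        stored += dead_per_block[i];

        long trigger = use_cumulative ? total_seen : stored;

        if (trigger >= capacity)
        {
            cycles++;       /* one round of index and heap vacuuming */
            stored = 0;     /* the dead-item store is emptied */
        }
    }

    return cycles;
}
```

With one dead item in the first of five blocks and a capacity of 1, the cumulative variant reports a cycle for every block, while the resettable variant reports a single cycle.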

When playing with parallel vacuum, I also wondered if there should be
some heuristic that, when maintenance_work_mem is set to something far
too low, avoids parallel vacuum unless the user specifically asked for
it in the command.

Take the following case as an example:
set maintenance_work_mem=64;
create table aa(a int primary key, b int unique);
insert into aa select a,a from generate_series(1,1000000) a;
delete from aa;

-- try a vacuum with no parallelism
vacuum (verbose, parallel 0) aa;

system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s

If I did the following instead:

vacuum (verbose) aa;

The vacuum goes parallel and takes a very long time because it
launches a parallel worker to do 1 page worth of tuples at a time. I
see the following message 4425 times:

INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)

and the vacuum takes about 30 seconds to complete:

system usage: CPU: user: 14.00 s, system: 0.81 s, elapsed: 30.86 s

Shouldn't the code in parallel_vacuum_compute_workers() try to pick a
good value for the number of workers based on the available memory and
table size when the user does not explicitly specify how many workers
they want?
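
A rough standalone sketch of such a heuristic follows. It is not a patch and does not use the real parallel_vacuum_compute_workers() interface. The function name, its parameters, the convention that a negative `requested` means "no PARALLEL option given", and the cutoff of 8 passes are all hypothetical. The idea is to estimate how many index vacuum passes the dead-item memory would force and to skip parallelism when that number is so large that worker launch overhead is likely to dominate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical cutoff: with more index vacuum passes than this, worker
 * launch overhead is assumed to outweigh the benefit of parallelism.
 * The value is illustrative only and has not been benchmarked.
 */
#define MAX_PASSES_FOR_PARALLEL 8

/*
 * Sketch of a worker-count heuristic (not PostgreSQL code).
 *
 * requested          worker count from the command; negative means the
 *                    user did not specify one (this sketch's convention)
 * candidate_workers  workers that would otherwise be planned
 * est_dead_items     estimated number of dead items in the table
 * items_per_pass     dead items that fit in memory for one pass
 */
static int
choose_vacuum_workers(int requested, int candidate_workers,
                      uint64_t est_dead_items, uint64_t items_per_pass)
{
    uint64_t passes;

    assert(candidate_workers >= 0);
    assert(items_per_pass > 0);

    /* An explicit request is honored, capped at what is available. */
    if (requested >= 0)
        return requested < candidate_workers ? requested : candidate_workers;

    /* Number of index vacuum passes, rounded up. */
    passes = (est_dead_items + items_per_pass - 1) / items_per_pass;

    /* Too many tiny passes: fall back to a non-parallel vacuum. */
    if (passes > MAX_PASSES_FOR_PARALLEL)
        return 0;

    return candidate_workers;
}
```

For the example above, 1000000 dead items over 4425 passes is roughly 226 items per pass, so choose_vacuum_workers(-1, 1, 1000000, 226) returns 0 under these assumptions, while an explicit request such as choose_vacuum_workers(2, 4, 1000000, 226) still returns 2.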

David
