Re: cost based vacuum (parallel)

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-04 07:33:18
Message-ID: CAC8Q8tJXWS1BaZWtwkG8XFjab79oyOxXxNarZrAWSmmobKeE9w@mail.gmail.com
Lists: pgsql-hackers

>
>
> This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory. We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this. However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.
>

In cloud environments (Amazon + gp2) there's a budget on input/output
operations. If you exceed it for a long time, everything starts to feel like
you're working with a floppy disk.

For ease of configuration, I would like a "max_vacuum_disk_iops" that
limits the number of input/output operations by all of the vacuums in the
system combined. If I set it to less than the budget refill rate, I can be
sure that no vacuum runs fast enough to impact any sibling query.
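
A rough sketch of the kind of accounting I have in mind (a plain C toy
model, all names made up, not actual PostgreSQL code):

    #include <stdatomic.h>
    #include <stdio.h>

    /* Hypothetical setting: total IOPS allowed per second across all vacuums. */
    static int max_vacuum_disk_iops = 1000;

    /* Shared by all vacuum workers; reset to zero at every one-second tick. */
    static atomic_int iops_used_this_second;

    /* A worker calls this before issuing 'niops' operations: nonzero means
     * go ahead, zero means sleep until the budget refills at the next tick. */
    static int vacuum_io_allowed(int niops)
    {
        return atomic_fetch_add(&iops_used_this_second, niops) + niops
               <= max_vacuum_disk_iops;
    }

    int main(void)
    {
        /* Three workers each asking for 400 IOPS within the same second:
         * the third one crosses the 1000 IOPS line and has to wait. */
        for (int w = 1; w <= 3; w++)
            printf("worker %d allowed: %d\n", w, vacuum_io_allowed(400));
        return 0;
    }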

There's also value in a non-throttled VACUUM for smaller tables. On gp2 such
runs will be paid for out of the surge budget, whose size is known to the
sysadmin. Let's call the setting "max_vacuum_disk_surge_iops": if a relation
has fewer blocks than this value and the situation is blocking in any way
(anti-wraparound, interactive console, ...), go ahead and run without
throttling. The decision could look like the sketch below.
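
Again a toy model with invented names:

    /* Hypothetical setting: size of the surge budget, in IOPS. */
    static int max_vacuum_disk_surge_iops = 3000;

    typedef enum { VACUUM_BACKGROUND, VACUUM_ANTIWRAPAROUND, VACUUM_MANUAL } VacuumReason;

    /* Small relation + blocking situation => run unthrottled out of the
     * surge budget; everything else obeys max_vacuum_disk_iops. */
    static int should_throttle(long relation_blocks, VacuumReason reason)
    {
        if (relation_blocks < max_vacuum_disk_surge_iops &&
            reason != VACUUM_BACKGROUND)
            return 0;
        return 1;
    }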

As for how to balance the cost: if we know how many vacuum processes were
running in the previous second, we can simply divide this iteration's slot
by that number.
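
In code, per worker and per second (same toy model):

    /* Each worker's slot for this second is the global budget divided by
     * the number of vacuum workers that were active in the previous second. */
    static int per_worker_slot(int global_iops_budget, int workers_last_second)
    {
        if (workers_last_second < 1)
            workers_last_second = 1;      /* avoid division by zero */
        return global_iops_budget / workers_last_second;
    }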

To correct for overshoot, we can subtract the previous second's overshoot
from the next one's budget. That would also account for surge budget usage
and let it refill, pausing all autovacuums for some time after a manual one.
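
Continuing the toy model:

    /* Whatever the vacuums overshot last second is deducted from this
     * second's budget, so a burst (say, an unthrottled manual VACUUM) is
     * paid back and the surge budget gets time to refill; the result can
     * be zero, which pauses all throttled vacuums for a while. */
    static int next_second_budget(int configured_budget, int used_last_second)
    {
        int overshoot = used_last_second - configured_budget;
        if (overshoot < 0)
            overshoot = 0;                /* no debt if we stayed under */
        int budget = configured_budget - overshoot;
        return budget > 0 ? budget : 0;
    }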

Accounting for operations more often than once a second isn't beneficial
for this use case.

Please don't forget that processing one page can turn into several IOPS
(read, write, WAL).
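
Roughly:

    #include <stdbool.h>

    /* One vacuumed page can cost up to three I/O operations: reading it
     * in, writing it back out dirty, and the WAL it generates. */
    static int page_iops(bool needed_read, bool dirtied, bool wrote_wal)
    {
        return (needed_read ? 1 : 0) + (dirtied ? 1 : 0) + (wrote_wal ? 1 : 0);
    }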

Does this make sense? :)
