
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Date:	2018-02-05 17:55:46
Message-ID:	CAGTBQpZ6AMniw3W_UVt2nz+S9GPFM0vdnq-s55NPfv9BEpO1RA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> After autovacuum gets cancelled, the next time it wakes up it will
>> retry vacuuming the cancelled relation. That's because a cancelled
>> autovacuum doesn't update the last-vacuumed stats.
>>
>> So the timing between an autovacuum work item and the next retry for
>> that relation is more or less an autovacuum nap time, except perhaps
>> in the case where many vacuums get cancelled, and they have to be
>> queued.
>
> I think that's not true if there are multiple databases.

I'd have to re-check.

The loop is basically, IIRC:

while(1) { vacuum db ; work items ; nap }

Now, if that vacuums one db, not all of them, and if the decision on
the second run doesn't pick the same db on account of that big table
having failed to be vacuumed, then you're right.
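
To make the concern concrete, here is a toy model of that loop. It is
not PostgreSQL code: the function name and the round-robin choice of
database are assumptions made purely for illustration. It counts how
many naps pass before the database holding the cancelled table gets
picked again.

```c
/*
 * Toy model, NOT PostgreSQL code.
 * Assumption: each loop iteration vacuums exactly one database, chosen
 * round-robin, and then naps once.
 * Returns the number of naps before the database whose vacuum was
 * cancelled is visited again. ndatabases must be >= 1.
 */
int
naps_until_retry(int ndatabases)
{
    int target = 0;         /* database with the cancelled vacuum */
    int current = target;   /* database picked in the latest iteration */
    int naps = 0;

    do
    {
        naps++;                                 /* nap */
        current = (current + 1) % ndatabases;   /* pick the next database */
    } while (current != target);

    return naps;
}
```

Under these assumptions, one database means a retry after a single
nap, while N databases stretch that to N naps, which is the case
where the "more or less one nap time" estimate stops holding.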

In that case we could add the FSM vacuum as a work item *in addition*
to what this patch does. If the work item queue is full and the FSM
vacuum doesn't happen, it'd be no worse than with the patch as-is.

Is that what you suggest?

With that in mind, I'm noticing WorkItems have an avw_database field
that isn't checked by do_autovacuum. Is that right? Shouldn't only work
items that belong to the database being autovacuumed be processed?
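
The kind of check I have in mind would look roughly like the sketch
below. The struct is a simplified stand-in that only borrows the
avw_database name; the other fields, the Oid typedef and the function
name are hypothetical, and this is not a claim about what
do_autovacuum currently does.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/* Simplified, hypothetical work item; only avw_database is taken from
 * the discussion above. */
typedef struct WorkItem
{
    bool avw_used;      /* slot holds a pending item */
    Oid  avw_database;  /* database the item belongs to */
    Oid  avw_relation;  /* relation the item refers to */
} WorkItem;

/*
 * Process only the pending items that belong to my_database and leave
 * items for other databases queued. Returns how many were processed.
 */
size_t
process_work_items(WorkItem *items, size_t nitems, Oid my_database)
{
    size_t processed = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        if (!items[i].avw_used)
            continue;                   /* empty slot */
        if (items[i].avw_database != my_database)
            continue;                   /* belongs to another database */

        /* the actual work (e.g. an FSM vacuum) would happen here */
        items[i].avw_used = false;      /* release the slot */
        processed++;
    }

    return processed;
}
```

Items for other databases stay in the queue, so a worker attached to
the right database can pick them up later.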

>> That's why an initial FSM vacuum makes sense. It has a similar timing
>> to the autovacuum work item, it has the benefit that it can be
>> triggered manually (manual vacuum), and it's cheap and efficient.
>>
>>> Also the patch always vacuums fsm at beginning of the vacuum with a
>>> threshold but it is useless if the table has been properly vacuumed. I
>>> don't think it's good idea to add an additional step that "might be"
>>> efficient, because current vacuum is already heavy.
>>
>> FSMs are several orders of magnitude smaller than the tables
>> themselves. A TB-sized table I have here has a 95M FSM. If you add
>> threshold skipping, that initial FSM vacuum *will* be efficient.
>>
>> By definition, the FSM will be less than 1/4000th of the table, so
>> that initial FSM pass takes less than 1/4000th of the whole vacuum.
>> Considerably less considering the simplicity of the task.
>
> I agree the fsm is very smaller than heap and vacuum on fsm will not
> be comparatively heavy but I'm afraid that the vacuum will get more
> heavy in the future if we pile up such improvement that are small but
> might not be efficient. For example, a feature for reporting the last
> vacuum status has been proposed[1]. I wonder if we can use it to
> determine whether we do the fsm vacuum at beginning of vacuum.

Yes, such a feature would allow skipping that initial FSM vacuum. That
can be improved in a future patch if the proposed feature gets
merged. In the meantime, this patch can be treated independently of
that, don't you think?

In response to

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently at 2018-02-05 04:53:10 from Masahiko Sawada

Responses

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently at 2018-02-05 17:58:00 from Claudio Freire
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently at 2018-02-06 07:56:53 from Masahiko Sawada
