Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-12-02 19:26:13
Message-ID: CA+fd4k48uhavyuYmLj7FMz8X+i8BXAVKWmetekObvssLOvB9QQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexes that support the parallel index vacuum, along with parallel workers.
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_worker or something with similar effect.
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

I think your idea might not work well in some cases. That is, I think
there are cases where it's better for the leader to participate in
parallel vacuum as a worker as soon as possible, especially if the
table has many indexes that by design don't support parallel vacuum
(e.g. bulkdelete of brin and indexes using
VACUUM_OPTION_PARALLEL_COND_CLEANUP).
Suppose the table has 3 indexes that support parallel vacuum and take
5 sec, 10 sec and 10 sec to vacuum respectively, and 3 indexes that
don't support it and take 2 sec each. In the current patch we launch
2 workers. They take two of the indexes, which take 5 sec and 10 sec.
At the same time the leader processes the 3 indexes that don't support
parallel index vacuum, which takes 6 sec. After the worker that took
the 5 sec index finishes, it takes the remaining index and needs 10 sec
more, so the total execution time will be 15 sec. On the other hand, if
the leader participated in parallel vacuum first, the total execution
time can be 11 sec (the leader takes 5 sec for one index and then
2 sec * 3).
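
To make the arithmetic above easy to check, here is a rough sketch of
the two orderings as a small simulation. It is not code from the patch:
the function name total_time, the greedy "whoever is free first takes
the next index" rule, and the rule that the leader wins ties are all
assumptions made only for this illustration.

```python
import heapq

def total_time(parallel_costs, leader_only_costs, n_workers, leader_only_first):
    """Elapsed seconds for one index-vacuum pass (toy model, not the patch).

    parallel_costs:    cost of each index any process may vacuum
    leader_only_costs: cost of each index only the leader may vacuum
    leader_only_first: True  -> leader does its leader-only indexes first
                       False -> leader joins the parallel work first
    """
    # Heap of (time the process becomes free, tie-break rank, name).
    # Rank 0 is the leader, so it wins ties (an assumption of this sketch).
    leader_start = sum(leader_only_costs) if leader_only_first else 0
    procs = [(leader_start, 0, "leader")]
    procs += [(0, i + 1, f"worker{i + 1}") for i in range(n_workers)]
    heapq.heapify(procs)

    # Whoever is free earliest takes the next parallel-capable index.
    for cost in parallel_costs:
        free_at, rank, name = heapq.heappop(procs)
        heapq.heappush(procs, (free_at + cost, rank, name))

    finish = {name: free_at for free_at, _, name in procs}
    if not leader_only_first:
        # The leader handles its leader-only indexes after its parallel share.
        finish["leader"] += sum(leader_only_costs)
    return max(finish.values())

parallel = [5, 10, 10]     # indexes supporting parallel vacuum
leader_only = [2, 2, 2]    # indexes that only the leader can process
print(total_time(parallel, leader_only, 2, leader_only_first=True))   # 15
print(total_time(parallel, leader_only, 2, leader_only_first=False))  # 11
```

The result depends heavily on the cost distribution and on who picks up
which index. For instance, with four parallel-capable indexes of 10 sec
each, the same toy model gives 20 sec when the leader does its
leader-only indexes first and 26 sec when it joins the parallel work
first.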

It's just an example; I'm not saying your idea is bad. ISTM the idea
is good on the assumption that all indexes take the same time or take
a long time, so I'd also like to consider whether that holds in
production and which approach is better if we can't make such an
assumption.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
