From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Block level parallel vacuum |
Date: | 2018-11-24 12:17:59 |
Message-ID: | CAA4eK1+imQR9PhPjG63TdANZVrLfSo5jjFdvYLsHvsO5sHGVGw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>
I could see that you have put a lot of effort on this patch and still
we are not able to make much progress mainly I guess because of
relation extension lock problem. I think we can park that problem for
some time (as already we have invested quite some time on it), discuss
a bit about actual parallel vacuum patch and then come back to it. I
don't know if that is right or not. I am not sure we can make this
ready for PG12 timeframe, but I feel this patch deserves some
attention. I have started reading the main parallel vacuum patch and
below are some assorted comments.
+ <para>
+ Execute <command>VACUUM</command> in parallel by <replaceable
class="parameter">N
+ </replaceable>a background workers. Collecting garbage on table
is processed
+ in block-level parallel. For tables with indexes, parallel
vacuum assigns each
+ index to each parallel vacuum worker and all garbages on a
index are processed
+ by particular parallel vacuum worker. The maximum nunber of
parallel workers
+ is <xref linkend="guc-max-parallel-workers-maintenance"/>. This
option can not
+ use with <literal>FULL</literal> option.
+ </para>
There are a couple of mistakes in above para:
(a) "..a background workers." a seems redundant.
(b) "Collecting garbage on table is processed in block-level
parallel."/"Collecting garbage on table is processed at block-level in
parallel."
(c) "For tables with indexes, parallel vacuum assigns each index to
each parallel vacuum worker and all garbages on a index are processed
by particular parallel vacuum worker."
We can rephrase it as:
"For tables with indexes, parallel vacuum assigns a worker to each
index and all garbages on a index are processed by particular that
parallel vacuum worker."
(d) Typo: nunber/number
(e) Typo: can not/cannot
I have glanced part of the patch, but didn't find any README or doc
containing the design of this patch. I think without having design in
place, it is difficult to review a patch of this size and complexity.
To start with at least explain how the work is distributed among
workers, say there are two workers which needs to vacuum a table with
four indexes, how it works? How does the leader participate and
coordinate the work. The other parts that you can explain how the
state is maintained during parallel vacuum, something like you are
trying to do in below function:
+ * lazy_prepare_next_state
+ *
+ * Before enter the next state prepare the next state. In parallel lazy vacuum,
+ * we must wait for the all vacuum workers to finish the previous state before
+ * preparation. Also, after prepared we change the state ot all vacuum workers
+ * and wake up them.
+ */
+static void
+lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state)
Still other things are how the stats are shared among leader and
worker. I can understand few things in bits and pieces while glancing
through the patch, but it would be easier to understand if you
document it at one place. It can help reviewers to understand it.
Can you consider to split the patch so that the refactoring you have
done in current code to make it usable by parallel vacuum is a
separate patch?
+/*
+ * Vacuum all indexes. In parallel vacuum, each workers take indexes
+ * one by one. Also after vacuumed index they mark it as done. This marking
+ * is necessary to guarantee that all indexes are vacuumed based on
+ * the current collected dead tuples. The leader process continues to
+ * vacuum even if any indexes is not vacuumed completely due to failure of
+ * parallel worker for whatever reason. The mark will be checked
before entering
+ * the next state.
+ */
+void
+lazy_vacuum_all_indexes(LVState *lvstate)
I didn't understand what you want to say here. Do you mean that
leader can continue collecting more dead tuple TIDs when workers are
vacuuming the index? How does it deal with the errors if any during
index vacuum?
+ * plan_lazy_vacuum_workers_index_workers
+ * Use the planner to decide how many parallel worker processes
+ * VACUUM and autovacuum should request for use
+ *
+ * tableOid is the table begin vacuumed which must not be non-tables or
+ * special system tables.
..
+ plan_lazy_vacuum_workers(Oid tableOid, int nworkers_requested)
The comment starting from tableOid is not clear. The actual function
name(plan_lazy_vacuum_workers) and name in comments
(plan_lazy_vacuum_workers_index_workers) doesn't match. Can you take
relation as input parameter instead of taking tableOid as that can
save a lot of code in this function.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Malik Rumi | 2018-11-24 13:26:58 | Re: could not connect to server, in order to operate pgAdmin/PostgreSQL |
Previous Message | Ricardo Martin Gomez | 2018-11-24 12:16:10 | Re: could not connect to server, in order to operate pgAdmin/PostgreSQL |