Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-28 06:50:34
Message-ID: CAA4eK1Jnu3Y04_T1BegyK=xhSFpp3E6mxPNAz1TnU5pL0Rg73Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> >
> I haven't yet read the new set of the patch. But, I have noticed one
> thing. That we are getting the size of the statistics using the AM
> routine. But, we are copying those statistics from local memory to
> the shared memory directly using the memcpy. Wouldn't it be a good
> idea to have an AM specific routine to get it copied from the local
> memory to the shared memory? I am not sure it is worth it or not but
> my thought behind this point is that it will give AM to have local
> stats in any form ( like they can store a pointer in that ) but they
> can serialize that while copying to shared stats. And, later when
> shared stats are passed back to the Am then it can deserialize in its
> local form and use it.
>

You have a point, but after changing the gist index, we don't have any
current usage for indexes that need something like that. So, on one
side there is some value in having an API to copy the stats, but on
the other side without having clear usage of an API, it might not be
good to expose a new API for the same. I think we can expose such an
API in the future if there is a need for the same. Do you or anyone
know of any external IndexAM that has such a need?

Few minor comments while glancing through the latest patchset.

1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
all three expose new variable/function from IndexAmRoutine.

2.
+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ char *p = (char *) GetSharedIndStats(lvshared);
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;

I think this function won't be called from AutoVacuumWorkerProcess at
least not as of now, so isn't it a better idea to have an Assert for
it?

3.
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)

This function is for performing a parallel operation on the index, so
why to start with heap? It is better to name it as
index_parallel_vacuum_main or simply parallel_vacuum_main.

4.
/* useindex = true means two-pass strategy; false means one-pass */
@@ -128,17 +280,12 @@ typedef struct LVRelStats
BlockNumber pages_removed;
double tuples_deleted;
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- int num_dead_tuples; /* current # of entries */
- int max_dead_tuples; /* # slots allocated in array */
- ItemPointer dead_tuples; /* array of ItemPointerData */
+ LVDeadTuples *dead_tuples;
int num_index_scans;
TransactionId latestRemovedXid;
bool lock_waiter_detected;
} LVRelStats;

-
/* A few variables that don't seem worth passing around as parameters */
static int elevel = -1;

It seems like a spurious line removal.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-10-28 07:14:41 Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
Previous Message Soumyadeep Chakraborty 2019-10-28 06:46:22 Re: WIP: expression evaluation improvements