Re: Add parallelism and glibc dependent only options to reindexdb

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject:	Re: Add parallelism and glibc dependent only options to reindexdb
Date:	2019-07-02 02:55:07
Message-ID:	20190702025507.GD1388@paquier.xyz
Lists:	pgsql-hackers

On Mon, Jul 01, 2019 at 06:14:20PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> >
> > Please don't reuse a file name as generic as "parallel.c" -- it's
> > annoying when navigating source. Maybe conn_parallel.c multiconn.c
> > connscripts.c admconnection.c ...?
>
> I could use scripts_parallel.[ch] as I've already used it in the
> #define part?

multiconn.c sounds rather good, but I have a poor ear for any kind of
naming.

>> If your server crashes or is stopped midway during the reindex, you
>> would have to start again from scratch, and it's tedious (if it's
>> possible at all) to determine which indexes were missed. I think it
>> would be useful to have a two-phase mode: in the initial phase reindexdb
>> computes the list of indexes to be reindexed and saves them into a work
>> table somewhere. In the second phase, it reads indexes from that table
>> and processes them, marking them as done in the work table. If the
>> second phase crashes or is stopped, it can be restarted and consults the
>> work table. I would keep the work table, as it provides a bit of an
>> audit trail. It may be important to be able to run even if unable to
>> create such a work table (because of the <ironic>numerous</> users that
>> DROP DATABASE postgres).
>
> Or we could create a table locally in each database, that would fix
> this problem and probably make the code simpler?
>
> It also raises some additional concerns about data expiration. I
> guess that someone could launch the tool by mistake, kill reindexdb,
> and run it again 2 months later while a lot of new objects have been
> added for instance.

These look like fancy additions, but that's not the core of the
problem, no? If you begin to play in this area you would need more
control options: basically a "continue" mode to be able to restart a
previously failed attempt, a "reinit" mode able to restart the
operation completely from scratch, and perhaps even a "reset" mode
which cleans up any data already present. That's not really complex,
but this has to be maintained at a database level.
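
To make the three modes concrete, here is a rough sketch, not anything
from the patch. It uses Python with an in-memory SQLite table standing
in for a per-database work table. The table name reindex_work, the
mode strings, and the plan()/run() helpers are all made up for
illustration, and the reindex callback is a placeholder for whatever
would actually issue REINDEX.

```python
import sqlite3


def table_exists(conn):
    """Return True if the (hypothetical) work table is present."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = 'reindex_work'"
    ).fetchone()
    return row is not None


def plan(conn, indexes):
    """Phase 1: save the list of indexes to process in the work table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reindex_work ("
        "index_name TEXT PRIMARY KEY, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO reindex_work (index_name) VALUES (?)",
        [(name,) for name in indexes],
    )
    conn.commit()


def run(conn, indexes, mode, reindex):
    """Phase 2: process pending indexes according to the chosen mode.

    mode is one of "continue", "reinit" or "reset" (assumed names).
    Returns the list of index names processed by this call.
    """
    if mode == "reset":
        # Clean up any data already present and do nothing else.
        conn.execute("DROP TABLE IF EXISTS reindex_work")
        conn.commit()
        return []
    if mode == "reinit":
        # Forget previous progress and start again from scratch.
        conn.execute("DROP TABLE IF EXISTS reindex_work")
        plan(conn, indexes)
    elif mode == "continue":
        # Reuse the saved list; only plan if no previous attempt exists.
        if not table_exists(conn):
            plan(conn, indexes)
    else:
        raise ValueError("unknown mode: %s" % mode)

    pending = [
        row[0]
        for row in conn.execute(
            "SELECT index_name FROM reindex_work "
            "WHERE done = 0 ORDER BY index_name"
        ).fetchall()
    ]
    processed = []
    for name in pending:
        reindex(name)  # may raise; earlier rows stay marked as done
        conn.execute(
            "UPDATE reindex_work SET done = 1 WHERE index_name = ?",
            (name,),
        )
        conn.commit()
        processed.append(name)
    return processed
```

Each index is marked done in its own commit, so an interrupted run
followed by run(conn, indexes, "continue", reindex) only picks up the
rows still pending. Note that "continue" deliberately does not add
objects created since the first phase, which is the data expiration
concern raised above.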

>> The --glibc-dependent
>> switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can
>> add other rules later. (Not "--exclude=foo" because we'll want to add
>> the possibility to ignore specific indexes by name.)
>
> That's a good point, I like the --exclude-rule switch.

Sounds kind of nice.
--
Michael

In response to

Re: Add parallelism and glibc dependent only options to reindexdb at 2019-07-01 16:14:20 from Julien Rouhaud
