Re: REINDEX filtering in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX filtering in the backend
Date: 2019-08-28 08:22:07
Message-ID: CAOBaU_acbmzc+=vMvGy0ufNsz+Py60-htyR7s5iw6DO2w-G+5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 28, 2019 at 7:03 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Jul 11, 2019 at 11:14:20PM +0200, Julien Rouhaud wrote:
> > I didn't want to spend too much time enjoying bison and adding new
> > unreserved keywords, so for now I just implemented this syntax to
> > start a discussion for this feature in the next commitfest:
> >
> > REINDEX ( FILTER = COLLATION ) [...]
> >
> > The FILTER clause can be used multiple times, each one is OR-ed with
> > the ReindexStmt's option, so we could easily add a LIBC, ICU and other
> > filters, also making COLLATION (or more realistically a better new
> > keyword) an alias for (LIBC | ICU) for instance.
>
> I would prefer keeping the interface simple with only COLLATION, so as
> only collation sensitive indexes should be updated, including icu and
> libc ones. Or would it be useful to have the filtering for both as
> libicu can break things similarly to glibc in an update still a
> breakage on one or the other would not happen at the same time? I
> don't know enough of libicu regarding that, eager to learn. In which
> case, wouldn't it be better to support that from the start?

I'm not sure either. Another thing would be to add extra syntax to be
able to discard even more indexes. For instance we could store the
version of the underlying lib used when the index is (re)created, and
do something like
REINDEX (FILTER = LIBC!=2.28) or REINDEX (FILTER = LIBC==2.27) or
something similar.

> > The filtering is done at table level (with and without the
> > concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
> > benefit from it. If this clause is used with a REINDEX INDEX, the
> > statement errors out, as I don't see a valid use case for providing a
> > single index name and asking to possibly filter it at the same time.
>
> Supporting that case would not be that much amount of work, no?

Probably not, but I'm dubious about the use case. Adding the lib
version in the catalog would be more useful for people who want to
write their own rules to reindex specific set of indexes.

> > I also added some minimal documentation and regression tests. I'll
> > add this patch to the next commitfest.
> >
> > [1] https://www.postgresql.org/message-id/7140716c-679e-a0b9-a273-b201329d8891%402ndquadrant.com
>
> + if ((stmt->options & REINDEXOPT_ALL_FILTERS) != 0)
> + elog(ERROR, "FILTER clause is not compatible with REINDEX INDEX");
> [...]
> + discard indexes whose ordering does not depend on a collation. Note that
> + the FILTER option is not compatible with <command>REINDEX
> + SCHEMA</command>.
>
> Why do you have both limitations?

That's actually a typo, the documentation should have specified that
it's not compatible with REINDEX INDEX, not REINDEX SCHEMA, I'll fix.

> I think that it would be nice to be
> able to do both, generating an error for REINDEX INDEX if the index
> specified is not compatible, and a LOG if the index is not filtered
> out when a list is processed.

Do you mean having an error if the index does not contain any
collation based type? Also, REINDEX only accept a single name, so
there shouldn't be any list processing for REINDEX INDEX? I'm not
really in favour of adding extra code the filtering when user asks for
a specific index name to be reindexed.

> Please note that elog() cannot be used
> for user-facing failures, only for internal ones.

Indeed, I'll change with an ereport and ERRCODE_FEATURE_NOT_SUPPORTED.

>
> +REINDEX (VERBOSE, FILTER = COLLATION) TABLE reindex_verbose;
> +-- One column, not depending on a collation
> In order to make sure that a reindex has been done for a given entry
> with the filtering, an idea is to save the relfilenode before the
> REINDEX and check it afterwards. That would be nice to make sure that
> only the wanted indexes are processed, but it is not possible to be
> sure of that based on your tests, and some tests should be done on
> relations which have collation-sensitive as well as
> collation-insensitive indexes.

That's what I did when I first submitted the feature in reindexdb. I
didn't use them because it means switching to TAP tests. I can drop
the simple regression test (especially since I now realize than one is
quite broken) and use the TAP one if, or should I keep both?

>
> + index = index_open(indexOid, AccessShareLock);
> + numAtts = index->rd_index->indnatts;
> + index_close(index, AccessShareLock);
> Wouldn't it be better to close that after doing the scan?

Yes indeed.

> Could you add tests with REINDEX CONCURRENTLY?

Sure!

> Bonus: support for reindexdb should be added. Let's not forget about
> it.

Yep. That was a first prototype to see if this approach is ok. I'll
add more tests, run pgindent and reindexdb support if this approach is
sensible.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2019-08-28 08:23:47 Re: pgbench - implement strict TPC-B benchmark
Previous Message Kyotaro Horiguchi 2019-08-28 08:17:40 Re: Performance improvement of WAL writing?