Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2021-01-26 17:54:26
Message-ID: CA+TgmoYS0SGveHpGoSpo=ZmHOokxVs1ipapaCH46+zJ46AGp=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> Re-thinking this, and after some research:
>
> Is the behaviour of "any process that invalidates TIDs in this table
> (that could be in an index on this table) always holds a lock that
> conflicts with CIC/RiC on that table" a requirement of tableams, or is
> it an implementation-detail?
>
> If it is a requirement, then this patch is a +1 for me (and that
> requirement should be documented in such case), otherwise a -1 while
> there is no mechanism in place to remove concurrently-invalidated TIDs
> from CIC-ed/RiC-ed indexes.

I don't believe that the table AM machinery presumes that every kind
of table AM necessarily has to support VACUUM; or at least, I don't
think the original version presumed that. However, if you want it to
work some other way, there's not really any infrastructure for that,
either. Like, if you want to run your own background workers to clean
up your table, you could, but you'd have to arrange that yourself. If
you want a SWEEP command or an OPTIMIZE TABLE command instead of
VACUUM, you'd have to hack the grammar. Likewise, I don't know that
there's any guarantee that CLUSTER would work on a table of any old AM
either, or that it would do anything useful.

But having said that, if you have a table AM that *does* support
VACUUM and CLUSTER, I think it would have to treat TIDs pretty much
just as we do today. Almost by definition, they have to live with the
way the rest of the system works. TIDs have to be 6 bytes, and lock
levels can't be changed, because there's nothing in table AM that
would let you tinker with that stuff. The table AM can opt out of that
machinery altogether in favor of just throwing an error, but it can't
make it work differently unless somebody extends the API. It's
possible that this might suck a lot for some AMs. For instance, if
your AM is an in-memory btree, you might want CLUSTER to work in
place, rather than by copying the whole relation file, but I doubt
it's possible to make that work using the APIs as they exist. Maybe
someday we'll have better ones, but right now we don't.

As far as the specific question asked here, I don't really understand
how it could ever work any differently. If you want to invalidate some
TIDs in such a way that they can be reused by unrelated tuples - in
the case of the heap this would be by making the line pointers
LP_UNUSED - that has to be coordinated with the indexing machinery. I
can imagine a table AM that never reuses TIDs - especially if we
eventually have TIDs > 6 bytes - but I can't imagine a table AM that
reuses TIDs that might still be present in the index without telling
the index. And that seems to be what is being proposed here: if CIC or
RiC could be putting TIDs into indexes while at the same time some of
those very same TIDs were being made available for reuse, that would
be chaos, and right now we have no mechanism other than the lock level
to prevent that. We could invent one, maybe, but it doesn't exist
today, and zheap never tried to invent anything like that.

I agree that, if we do this, the comment should make clear that the
CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that
VACUUM is busy marking LP_UNUSED, and that this is precisely because
other backends will ignore the CIC/RiC snapshot while determining
whether a tuple is live. I don't think this is the case.
Hypothetically speaking, suppose we rejected this patch. Then, suppose
someone proposed to replace ShareUpdateExclusiveLock with two new lock
levels VacuumTuplesLock and IndexTheRelationLock each of which
conflicts with itself and all other lock levels with which
ShareUpdateExclusiveLock conflicts, but the two don't conflict with
each other. AFAIK, that patch would be acceptable today and
unacceptable after this change. While I'm not arguing that as a reason
to reject this patch, it's not impossible that somebody could: they'd
just need to argue that the separated lock levels would have greater
value than this patch. I don't think I believe that, but it's not a
totally crazy suggestion. My main point though is that this
meaningfully changing some assumptions about how the system works, and
we should be sure to be clear about that.

I looked at the patch but don't really understand it; the code in this
area seems to have changed a lot since I last looked at it. So, I'm
just commenting mostly on the specific question that was asked, and a
little bit on the theory of the patch, but without expressing an
opinion on the implementation.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-01-26 17:58:27 Re: mkid reference
Previous Message Laurenz Albe 2021-01-26 17:46:10 Re: Error on failed COMMIT