From: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Invalid indexes should not consume update overhead
Date: 2016-07-13 11:10:45
Message-ID: 8b57a306-7951-8442-df16-8c581d61b4e6@ato.waw.pl
Lists: pgsql-bugs
When an index is built concurrently and the build fails, the index is
left in an invalid state. It is basically unusable for anything, but,
according to the documentation:
> If a problem arises while scanning the table, such as a deadlock or
> a uniqueness violation in a unique index, the CREATE INDEX command
> will fail but leave behind an "invalid" index. This index will be
> ignored for querying purposes because it might be incomplete; however
> it *will* *still* *consume* *update* *overhead*. The psql \d command
> will report such an index as INVALID
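For illustration, here is a minimal way to end up in that state (table
and index names are made up):

    CREATE TABLE t (id int);
    INSERT INTO t VALUES (1), (1);
    -- fails with a uniqueness violation, leaving the index behind as INVALID:
    CREATE UNIQUE INDEX CONCURRENTLY t_id_idx ON t (id);
    -- \d t now reports: "t_id_idx" UNIQUE, btree (id) INVALID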
I think this update overhead is simply wasted - there's no way to make
use of it, as the only way to make the index usable again is to reindex
it or to drop and recreate it.
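That is, assuming the underlying duplicate has been fixed, recovery from
the example above is one of the following - and both rebuild the index
from scratch, discarding whatever the failed build and the subsequent
update overhead produced:

    REINDEX INDEX t_id_idx;
    -- or:
    DROP INDEX t_id_idx;
    CREATE UNIQUE INDEX CONCURRENTLY t_id_idx ON t (id);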
On the other hand, if invalid indexes had no update overhead, they could
actually be useful. Please consider the following situation, taken from
the real world (a SQL sketch follows the list):
- I have a very large table with a large number of indexes, on a server
with a large number of CPUs.
- I need to add a new NOT NULL column, and I need to do it within a
fairly small maintenance window.
- I tried a simple "alter table tablename add column columnname int not
null default 0", but it did not finish within 24 hours - it updates
those tens of indexes in a single process, using 100% of a single CPU -
unacceptable.
- I mark all the indexes as invalid (currently I have to save their
definitions and drop them instead).
- This time adding the column takes an hour.
- I VACUUM FULL or CLUSTER the table, as it has by now bloated to at
least 200% - this is also fast, as the indexes are not updated.
- I reindex (currently: recreate) all the indexes, but instead of
computing them on a single CPU I use all available CPUs (I can get, say,
40 of them cheaply on Amazon AWS RDS if I need them only for a few
hours). This also finishes within an hour.
- The world is saved and everybody celebrates.
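A rough sketch of the current drop-and-recreate version of this
workflow; table, column, and index names are hypothetical, and the
parallelism in the last step comes from running each CREATE INDEX in its
own session:

    -- 1. Save the index definitions somewhere outside the database:
    SELECT indexdef FROM pg_indexes
     WHERE schemaname = 'public' AND tablename = 'tablename';

    -- 2. Drop the indexes:
    DROP INDEX idx_a, idx_b;

    -- 3. Add the column; fast now, as no indexes are maintained:
    ALTER TABLE tablename
      ADD COLUMN columnname int NOT NULL DEFAULT 0;

    -- 4. Rewrite the table to get rid of the bloat:
    VACUUM FULL tablename;

    -- 5. Recreate the indexes from the saved definitions, one
    --    statement per session, keeping all CPUs busy:
    CREATE INDEX idx_a ON tablename (col_a);
    CREATE INDEX idx_b ON tablename (col_b);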
Dropping and recreating indexes is not a very safe operation - their
definitions need to be saved somewhere outside the database, and if
anything goes wrong it would not be easy to restore them. If it were
possible to just set them invalid (a feature request in passing), and if
invalid indexes added no cost to updates, this would be much easier and
safer.
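There is no supported way to do that today; the closest thing is
unsupported, superuser-only catalog surgery, shown here purely to
illustrate the mechanics (note that it is indisready, not indisvalid,
that controls whether an index receives updates):

    -- Unsupported and dangerous - illustration only.
    UPDATE pg_index
       SET indisvalid = false,  -- planner ignores the index
           indisready = false   -- inserts/updates skip it too
     WHERE indexrelid = 'idx_a'::regclass;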
--
Tomasz "Tometzky" Ostrowski