From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Invalid indexes should not consume update overhead
Date: 2016-07-16 12:55:08
Message-ID: CAA4eK1JNmxL=kKB210GfA14q4T4yAC_U6XPo6yhi9VuBhxEPog@mail.gmail.com
Lists: pgsql-bugs
On Wed, Jul 13, 2016 at 4:40 PM, Tomasz Ostrowski
<tometzky+pg(at)ato(dot)waw(dot)pl> wrote:
> When an index is built concurrently and the build fails, the index is left
> in an invalid state. It's basically unusable for anything, but, according
> to the documentation:
>
>> If a problem arises while scanning the table, such as a deadlock or
>> a uniqueness violation in a unique index, the CREATE INDEX command
>> will fail but leave behind an "invalid" index. This index will be
>> ignored for querying purposes because it might be incomplete; however
>> it *will* *still* *consume* *update* *overhead*. The psql \d command
>> will report such an index as INVALID
>
>
> I think this update overhead is actually wasted - there's no way to make
> use of it, as the only way to make the index usable again is to reindex it
> or drop and recreate it.
>
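
(For illustration, a minimal sequence that reproduces the state described
above; the table and index names here are made up:)

    CREATE TABLE t (id int);
    INSERT INTO t VALUES (1), (1);

    -- The duplicate value makes the concurrent build fail,
    -- leaving t_id_uq behind marked as invalid:
    CREATE UNIQUE INDEX CONCURRENTLY t_id_uq ON t (id);

    SELECT indisvalid
      FROM pg_index
     WHERE indexrelid = 't_id_uq'::regclass;   -- false

    -- Once the underlying problem (here, the duplicate rows) is fixed,
    -- the only ways to make the index usable are REINDEX or DROP + re-create:
    DELETE FROM t WHERE id = 1;
    REINDEX INDEX t_id_uq;
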
> On the other hand, if invalid indexes had no update overhead, they might
> actually be useful. Please consider the following situation (taken from
> the real world):
>
> - I have a very large table with a large number of indexes on a server
> with a large number of CPUs.
> - I need to add a new not-null column, and I need to do it in a fairly
> small maintenance window.
> - I tried to simply "alter table tablename add column columnname int not
> null default 0", but it did not finish in 24 hours - it updates these tens
> of indexes in a single process, using 100% of a single CPU - unacceptable.
> - I mark all the indexes as invalid (currently I'm just saving and dropping
> them).
> - This time adding the column takes an hour.
> - I vacuum full or cluster the table, as it has by now bloated to at least
> 200% - this is also fast, as the indexes are not updated.
> - I reindex (currently recreate) all the indexes, but instead of building
> them on a single CPU I use all available CPUs (I can get like 40 of them
> cheaply on Amazon AWS RDS if I need them only for a few hours). This also
> finishes in an hour.
> - The world is saved and everybody celebrates.
>
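
(For reference, a rough sketch of the drop-and-recreate variant of the
procedure described in the list above; all object and column names are
hypothetical, and the parallel rebuild is assumed to be driven by running
the CREATE INDEX statements from several sessions at once:)

    -- 1. Save the index definitions before dropping anything.
    SELECT indexdef
      FROM pg_indexes
     WHERE schemaname = 'public' AND tablename = 'big_table';

    -- 2. Drop the indexes, then add the column; the table rewrite no
    --    longer has to rebuild tens of indexes.
    DROP INDEX big_table_a_idx, big_table_b_idx;   -- ... and so on
    ALTER TABLE big_table ADD COLUMN new_col int NOT NULL DEFAULT 0;

    -- 3. Reclaim the bloat.
    VACUUM FULL big_table;

    -- 4. Re-create the saved indexes, one CREATE INDEX per session,
    --    so that many CPUs are used in parallel.
    CREATE INDEX big_table_a_idx ON big_table (a);
    CREATE INDEX big_table_b_idx ON big_table (b);
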
> Dropping and recreating indexes is not a very safe operation - their
> definitions need to be saved somewhere outside the database, and if
> anything goes wrong it would not be easy to restore them. If it were
> possible to just set them invalid (a feature request in passing), and if
> invalid indexes did not add cost on updates, this would be much easier
> and safer.
>
How can you consider marking an invalid index as valid without
reindexing it? It is quite possible that the table has been updated in
the meantime.
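
(To spell that out with the same hypothetical names as above: if updates
skipped invalid indexes, as proposed, any rows changed while the index was
invalid would be missing from it, so the only safe way back to a valid
index would be a full rebuild from the table's current contents:)

    -- Rows changed while the index was invalid would not be reflected in it.
    UPDATE big_table SET a = 42 WHERE id = 1;

    -- A rebuild scans the whole table, so it picks those rows up;
    -- merely flipping the index back to "valid" would not.
    REINDEX INDEX big_table_a_idx;
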
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com