Quick Links

Re: Can the query planner create indexes?

From:	Tomas Vondra <tv(at)fuzzy(dot)cz>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Can the query planner create indexes?
Date:	2010-12-21 18:50:57
Message-ID:	4D10F711.1090606@fuzzy.cz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Dne 21.12.2010 16:34, Jeremy Harris napsal(a):
>> There really is no automatic way to solve this puzzle using a single
>> query. Indexing strategy is a very tough design discipline, and it
>> requires a complex knowledge of the workload. One slow query does not
>> mean
>> the index should be created - what if that was just an ad-hoc query and
>> will not be executed ever again?
>
> Then the index you just built gets automatically dropped, as I said above.

I'm a bit confused. Should the indexes be dropped automatically (as you
state here) or kept for the future. Because if they should be dropped,
then it does not make sense to do this magic just for a limited time
after the DB goes live.

>> I really don't want to work with products that try to be smarter than me
>> (and I admit I'm pretty dumb from time to time) and act rather randomly
>> due to this 'artificial intelligence'. I've already been there and I
>> don't
>> want to repeat this experience.
>
> Then, since you're not a neophyte, leave the feature turned off. But
> don't deny the possibility of using it to actual neophytes.

This is not a question of whether I am a neophyte or not, this is a
question of good practices. My experience is that building indexes
should be done when designing the application, and tested at in a test
environment. At production, log slow queries, analyze the log and add
indexes when needed.

This 'neophyte approach' is a really bad idea from my point of view. It
inevitably leads to a database with a zillion of unnecessary indexes,
missing the right ones. And any limits don't prevent this. In the end
you'll get a slow database, and the neophytes would blame the database
although that state is inevitable.

I've been working with several database products over the years, and
AFAIK none of them does this. In most cases "the whole industry is
wrong" is usually a really bad sign (does not hold if you're Steve Jobs).

Most of them do have 'advisors' though - that's a good idea, and I think
sooner or later that will be in PostgreSQL too.

>>>> as in b): Creating an index is quite expensiv
>>>
>>> How much more so than doing that full-table-scan plus sort, which your
>>> query is doing anyway?
>>
>> A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
>> maintenance_work_mem) etc. So imagine a few users, building indices on a
>> big table simultaneously
>
> Pffthht. One simple trylock, used only by the auto-indexer.

Pffthht? Anyway trylock is not an answer. Think about a different table
for each user.

>> Building an index is just one side of the problem - maintenance of the
>> indexes is another thing. Each index has an impact on write operations
>> (INSERT/UPDATE) and may cause that HOT actually does not work (thus
>> causing unnecessary bloat of the index).
>
> This is worth discussing. How much help does the DBMS currently give
> the DBA in evaluating these tradeoffs? Could we do better, given an
> actual installation and workload?

The database gives you explain plans, plus pg_stat_* and pg_statio_*
views. That's a lot of data, although does not provide a complete view
in many cases. And then there's a log_min_duration, which is the primary
weapon in fighting slow queries after going live.

>> And the indexes may need lot of
>> space on a disk.
>
> By all means require limits as well as a "don't do that" switch.

As I said, this is not a good approach from my POV. And it's really
really hard to implement this in a way that those who don't want to use
it don't have to pay the price. Because what portion of users would
really use this feature? 0.0001%?

>> But the real show stopper is probably locking. Building an index takes a
>> write lock on a table, effectively blocking writes. Sure, you can use a
>> 'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
>> disadvantages of that (see
>> http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY)
>>
>
> The only disadvantages I see there are a) the leftover "invalid" index -
> which feels like a
> bug; why is it not auto-dropped? and b) the second scan, which implies
> more total work
> and a wish to background that portion after completing the query triggering
> the auto-index.

Dropping an index automatically is a really bad idea. Actually dropping
indexes in general (no matter if the index was created automatically or
manually) is a very tricky thing.

How do you decide which indexes are really "leftover" from those that
are used only rarely? I've seen application failing spectacularily
because a DBA dropped an index that was not used ... except for a batch
process that runs once a year, to close a fiscal year.

> Don't forget I suggested doing this only for a limited time after DB
> creation. Possibly another
> reason to turn it off should be any manual index creation, as that
> implies that the DBA
> knows about indexing.
>
> I see in another thread you suggest merely placing hints in the log.
> That's a fine
> first step - but I'll then be wanting to auto-parse that log to
> auto-create....

Well, I don't think that is going to happen. I'm not going to implement
this (I've explained my opinions on this above), and I don't think this
would get into core anyway.

Logging? Probably. Advisors? Maybe, although as a contrib module.
Creating indexes automatically? I doubt that.

regards
Tomas

In response to

Re: Can the query planner create indexes? at 2010-12-21 15:34:38 from Jeremy Harris

Responses

Re: Can the query planner create indexes? at 2010-12-21 19:03:17 from Jeremy Harris

Browse pgsql-general by date

	From	Date	Subject
Next Message	Jeremy Harris	2010-12-21 19:03:17	Re: Can the query planner create indexes?
Previous Message	Kelly Burkhart	2010-12-21 18:42:40	libpq sendQuery -- getResult not returning until all queries complete