Re: Reducing planning time on tables with many indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing planning time on tables with many indexes
Date: 2022-07-27 16:39:02
Message-ID: 3002655.1658939942@sss.pgh.pa.us
Lists: pgsql-hackers

David Geier <geidav(dot)pg(at)gmail(dot)com> writes:
> We tracked down the root cause of this slowdown to lock contention in
> 'get_relation_info()'. The index lock of every single index of every single
> table used in that query is acquired. We attempted a fix by pre-filtering
> out all indexes that anyways cannot be used with a certain query, without
> taking the index locks (credits to Luc Vlaming for idea and
> implementation). The patch does so by caching the columns present in every
> index, inside 'struct Relation', similarly to 'rd_indexlist'.
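
[For readers without the patch in front of them, the general shape of the idea might look roughly like the sketch below. This is only my illustration, not the patch itself: the field name rd_indexcolumns, the Bitmapset-per-index representation, and the filter function are assumptions.]

/*
 * Illustrative sketch only, not the actual patch.  Suppose RelationData
 * grew a hypothetical field caching, for each index OID in rd_indexlist,
 * the Bitmapset of columns that index covers, e.g.
 *
 *     List   *rd_indexcolumns;      (parallel to rd_indexlist)
 *
 * The planner could then discard indexes whose columns don't intersect
 * the attributes the query references, before taking any index locks.
 * bms_overlap() is the existing helper from nodes/bitmapset.h.
 */
static bool
index_might_be_useful(Bitmapset *index_attrs, Bitmapset *query_attrs)
{
	/* keep the index only if it covers at least one column the query uses */
	return bms_overlap(index_attrs, query_attrs);
}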

I wonder how much thought you gave to the costs imposed by that extra
cache space. We have a lot of users who moan about relcache bloat
already. But more to the point, I do not buy the assumption that
an index's set of columns is a good filter for which indexes are of
interest. A trivial counterexample from the regression database is

regression=# explain select count(*) from tenk1;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Aggregate  (cost=219.28..219.29 rows=1 width=8)
   ->  Index Only Scan using tenk1_hundred on tenk1  (cost=0.29..194.28 rows=10000 width=0)
(2 rows)

It looks to me like the patch also makes unwarranted assumptions about
being able to discard all but the smallest index having a given set
of columns. This would, for example, possibly lead to dropping the
index that has the most useful sort order, or that has the operator
class needed to support a specific WHERE clause.

In short, I'm not sure I buy this concept at all. I think it might
be more useful to attack the locking overhead more directly. I kind
of wonder why we need per-index locks at all during planning ---
I think that we already interpret AccessShareLock on the parent table
as being sufficient to block schema changes on existing indexes.
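
To be concrete about where those locks come from, the relevant loop in
get_relation_info() (optimizer/util/plancat.c) does, in essence, the
following. This is a pared-down sketch; error handling and the per-index
IndexOptInfo construction are elided.

	/* Pared-down sketch of the per-index locking in get_relation_info() */
	List       *indexoidlist = RelationGetIndexList(relation);
	ListCell   *l;

	foreach(l, indexoidlist)
	{
		Oid         indexoid = lfirst_oid(l);
		Relation    indexRelation;

		/* one lock acquisition per index: the contention point at issue */
		indexRelation = index_open(indexoid, lmode);

		/* ... build the IndexOptInfo from the index's relcache entry ... */

		/* close the rel, but keep the lock until end of transaction */
		index_close(indexRelation, NoLock);
	}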

Unfortunately, as things stand today, the planner needs more than the
right to look at the indexes' schemas, because it makes physical accesses
to btree indexes to find out their tree height (and I think there are some
comparable behaviors in other AMs). I've never particularly cared for
that implementation, and would be glad to rip out that behavior if we can
find another way. Maybe we could persuade VACUUM or ANALYZE to store that
info in the index's pg_index row, or some such, and then the planner
could use it with no lock?
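
For reference, the btree case I have in mind looks roughly like this in
get_relation_info(); _bt_getrootheight() has to read the btree metapage
(unless the relcache has it cached already), which is why a lock-free
catalog peek isn't enough today:

	if (info->relam == BTREE_AM_OID)
	{
		/* reads the btree metapage (if not cached) to get the tree height */
		info->tree_height = _bt_getrootheight(indexRelation);
	}
	else
	{
		/* other AMs: height unknown to the planner */
		info->tree_height = -1;
	}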

regards, tom lane
