Re: Index overhead cost reporting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index overhead cost reporting
Date: 2013-12-07 19:41:54
Message-ID: 12675.1386445314@sss.pgh.pa.us
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> I was wondering whether anyone has any insight with regards to
> measuring and reporting the overhead of maintaining indexes on
> relations. If an UPDATE is issued to a table with, say, 6 indexes, it
> would be useful to determine how much time is spent updating each of
> those indexes. And perhaps such timings would not be confined to
> indexes, but also other dependants that add overhead, such as
> triggers, rules, and in future, eager incremental materialised view
> updates.

We already do track the time spent in triggers. Although Kevin might
have a different idea, I'd think that matview updates should also be
driven by triggers, so that the last item there would be covered.
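For context, the existing trigger timing is captured by bracketing each trigger
function call with the executor's instrumentation hooks. Below is a simplified
sketch of that pattern as it appears (to the best of my recollection) in
src/backend/commands/trigger.c around ExecCallTriggerFunc(); the wrapper name
here is invented for illustration and error handling is omitted:

    #include "postgres.h"
    #include "commands/trigger.h"
    #include "executor/instrument.h"
    #include "fmgr.h"

    /*
     * Simplified sketch of how EXPLAIN ANALYZE charges time to triggers.
     * "instr" is the per-trigger Instrumentation array hung off
     * ResultRelInfo (ri_TrigInstrument), indexed by trigger number;
     * it is NULL when not running under EXPLAIN ANALYZE.
     */
    static HeapTuple
    CallTriggerWithTiming(TriggerData *trigdata, int tgindx,
                          FmgrInfo *finfo, Instrumentation *instr)
    {
        FunctionCallInfoData fcinfo;
        Datum       result;

        /* If doing EXPLAIN ANALYZE, start charging time to this trigger */
        if (instr)
            InstrStartNode(instr + tgindx);

        InitFunctionCallInfoData(fcinfo, finfo, 0,
                                 InvalidOid, (Node *) trigdata, NULL);
        result = FunctionCallInvoke(&fcinfo);

        /* Stop charging time, and count one firing */
        if (instr)
            InstrStopNode(instr + tgindx, 1);

        return (HeapTuple) DatumGetPointer(result);
    }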

> Is that something that could be provided in an EXPLAIN ANALYSE node?

Well, it'd not be particularly difficult to add measurements of the time
spent in ExecInsertIndexTuples (a rough sketch of what that could look
like follows the list below), but I'd have some concerns about that:

* Instrumentation overhead. That's already painful on machines with
slow gettimeofday, and this has the potential to add a lot more,
especially with the more expansive readings of your proposal.

* Is it really measuring the right thing? To a much greater degree
than for some other things you might try to measure, just counting
time spent in ExecInsertIndexTuples is going to understate the true
cost of updating an index, because so much of the true cost is paid
asynchronously; viz, writing WAL as well as the actual index pages.
We already have that issue with measuring the runtime of a
ModifyTable node as a whole, but slicing and dicing that time
ten ways would make the results even more untrustworthy, IMO.

* There are also other costs to think of, such as the time spent by
VACUUM on maintaining the index, and the time spent by the planner
considering (perhaps vainly) whether it can use the index for each
query that reads the table. In principle you could instrument
VACUUM to track the time it spends updating each index, and log
that in the pgstats infrastructure. (I'd even think that might be
a good idea, except for the bloat effect on the pgstats files.)
I'm not at all sure there's any practical way to measure the distributed
planner overhead; it's not paid in discrete chunks large enough to be
timed easily. Perhaps it's small enough to ignore, but I'm not sure.
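To make the first concern concrete, here is a rough, hypothetical sketch (not a
patch) of what per-index timing around the index_insert() calls could look
like. The indexInstr array and the wrapper function are inventions of this
sketch; only InstrStartNode/InstrStopNode and index_insert correspond to real
backend interfaces, and the real ExecInsertIndexTuples also has to form the
index datums per index and deal with unique/exclusion checks, which is elided
here. Note the two clock readings per index per row, which is exactly the
instrumentation-overhead worry on machines with slow gettimeofday:

    #include "postgres.h"
    #include "access/genam.h"
    #include "executor/instrument.h"
    #include "nodes/execnodes.h"

    /*
     * Hypothetical sketch: time each index_insert() call made while
     * inserting index entries for one heap tuple.  Assumes a per-index
     * Instrumentation array ("indexInstr") were added to ResultRelInfo
     * alongside ri_TrigInstrument -- no such field exists today.
     * The real code first builds values[]/isnull[] per index via
     * FormIndexDatum; that step is elided here.
     */
    static void
    InsertIndexTuplesWithTiming(ResultRelInfo *resultRelInfo,
                                Datum *values, bool *isnull,
                                ItemPointer tupleid,
                                Instrumentation *indexInstr)
    {
        int         i;

        for (i = 0; i < resultRelInfo->ri_NumIndices; i++)
        {
            Relation    indexRel = resultRelInfo->ri_IndexRelationDescs[i];

            if (indexInstr)
                InstrStartNode(&indexInstr[i]);     /* clock reading #1 */

            index_insert(indexRel, values, isnull, tupleid,
                         resultRelInfo->ri_RelationDesc,
                         UNIQUE_CHECK_NO);

            if (indexInstr)
                InstrStopNode(&indexInstr[i], 1);   /* clock reading #2 */
        }
    }

And, per the second concern, whatever such timings showed would still exclude
the WAL and page-write work done asynchronously on the index's behalf.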

Bottom line, I think it's a harder problem than it might seem at
first glance.

regards, tom lane
