"Measuring timing overhead" in docs seems misleading

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: "Measuring timing overhead" in docs seems misleading
Date: 2023-01-16 21:39:13
Message-ID: 20230116213913.4oseovlzvc2674z7@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

I was trying to extract a committable piece out of [1]. To be able to judge
changes in timing overhead more accurately, I thought it'd be sensible to
update pg_test_timing to report nanoseconds instead of microseconds, which
led to trying to update pg_test_timing's docs [2].

The "Measuring Executor Timing Overhead" section seems misleading:
<para>
The i7-860 system measured runs the count query in 9.8 ms while
the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
processing just over 100,000 rows. That 6.8 ms difference means the timing
overhead per row is 68 ns, about twice what pg_test_timing estimated it
would be. Even that relatively small amount of overhead is making the fully
timed count statement take almost 70% longer. On more substantial queries,
the timing overhead would be less problematic.
</para>

The main reason for the ~2x discrepancy is that we make two timestamp calls
for each invocation of an executor node: one in InstrStartNode() and one in
InstrStopNode(). I think this should be clarified in the section?

I also think we should update the section to compare
EXPLAIN (ANALYZE, TIMING OFF) with
EXPLAIN (ANALYZE, TIMING ON)
rather than comparing the "bare" statement with EXPLAIN ANALYZE. There's
plenty of other overhead in EXPLAIN ANALYZE, even without TIMING ON.

With the instr_time-as-nanosec patches applied (I'll post a new version in a
few minutes), I get the following:

pg_test_timing:
Per loop time including overhead: 13.97 ns
Histogram of timing durations:
    < ns   % of total       count
      16     97.48221   209400569
      32      2.51201     5396022
      64      0.00477       10246
     128      0.00030         640
     256      0.00005         117
     512      0.00000           0
    1024      0.00000           3
    2048      0.00034         729
    4096      0.00001          14
    8192      0.00000           8
   16384      0.00015         320
   32768      0.00014         303
   65536      0.00001          26
  131072      0.00000           0
  262144      0.00000           1

psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM generate_series(1, 100000) g(i);' && \
  pgbench -n -r -t 100 \
    -f <(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") \
  | grep '^ '
DROP TABLE
SELECT 100000
3.431 0 SELECT COUNT(*) FROM t;
3.888 0 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;
6.671 0 EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;

Pgbench reports about 11% lost just from EXPLAIN (ANALYZE, TIMING OFF), and a
further 45% from TIMING ON. The per-row overhead, computed from the
TIMING ON/OFF difference:

((6.187ms - 3.423 ms) * 1000000)/(100000 * 2) = 13.82ns

which is within the run-to-run variance of the pg_test_timing result.

Greetings,

Andres Freund

[1] https://postgr.es/m/20230116023639.rn36vf6ajqmfciua%40awork3.anarazel.de
[2] https://www.postgresql.org/docs/current/pgtesttiming.html#id-1.9.5.11.7.3
