Re: Sampling-based timing for EXPLAIN ANALYZE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Jelte Fennema <me(at)jeltef(dot)nl>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-17 14:52:07
Message-ID: 673e9b4b-3bf7-ac85-0bd9-c99952800975@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/15/23 21:22, Andres Freund wrote:
> Hi,
>
> On 2023-01-13 09:11:06 +0100, David Geier wrote:
>> Mostly I'm wondering if the sampling based approach gains us enough to be
>> worth it, once the patch to use RDTSC hopefully landed (see [1]).
>
> Well, I'm not sure we have a path forward on it. There's portability and
> accuracy concerns. But more importantly:
>
>> I believe that with the RDTSC patch the overhead of TIMING ON is lower than
>> the overhead of using ANALYZE with TIMING OFF in the first place. Hence, to
>> be really useful, it would be great if we could on top of TIMING SAMPLING
>> also lower the overhead of ANALYZE itself further (e.g. by using a fast path
>> for the default EXPLAIN (ANALYZE, TIMING ON / SAMPLING)). Currently,
>> InstrStartNode() and InstrStopNode() have a ton of branches and without all
>> the typically deactivated code the implementation would be very small and
>> could be placed in an inlinable function.
>
> Yes, I think SAMPLING could get rid of most of the instrumentation overhead -
> at the cost of a bit less detail in the explain, of course. Which could make
> it a lot more feasible to enable something like auto_explain.log_timing in
> busy workloads.
>
> For the sampling mode we don't really need something like
> InstrStart/StopNode. We just need a pointer to node currently executing - not
> free to set, but still a heck of a lot cheaper than InstrStopNode(), even
> without ->need_timer etc. Then the timer just needs to do
> instr->sampled_total += (now - last_sample)
> last_sample = now
>

I don't understand why we would even use timestamps, in this case? AFAIK
"sampling profilers" simply increment a counter for the executing node,
and then approximate the time as proportional to the count.

That also does not have issues with timestamp "rounding" - considering
e.g. sample rate 1000Hz, that's 1ms between samples. And it's quite
possible the node completes within 1ms, in which case

(now - last_sample)

ends up being 0 (assuming I correctly understand the code).

And I don't think there's any particularly good way to correct this.

It seems ExplainNode() attempts to do some correction, but I doubt
that's very reliable, as these fast nodes will have sampled_total=0, so
no matter what you multiply this with, it'll still be 0.

>
> I've been thinking that we should consider making more of the instrumentation
> code work like that. The amount of work we're doing in InstrStart/StopNode()
> has steadily risen. When buffer usage and WAL usage are enabled, we're
> executing over 1000 instructions! And each single Instrumentation node is ~450
> bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
> embedded.
>
> If we instead have InstrStartNode() set up a global pointer to the
> Instrumentation node, we can make the instrumentation code modify both the
> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
> changes - right now nodes "automatically" include the IO/WAL incurred in child
> nodes, but that's just a small bit of additional summin-up to be done during
> EXPLAIN.
>

That's certainly one way to implement that. I wonder if we could make
that work without the global pointer, but I can't think of any.

>
> Separately, I think we should consider re-ordering Instrumentation so that
> bufusage_start, walusage_start are after the much more commonly used
> elements. We're forcing ntuples, nloops, .. onto separate cachelines, even
> though they're accounted for unconditionally.
>

+1 to that

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ted X Toth 2023-01-17 14:59:02 Re: [PATCH] Add <<none>> support to sepgsql_restorecon
Previous Message Masahiko Sawada 2023-01-17 14:37:02 Re: Perform streaming logical transactions by background workers and parallel apply