Quick Links

Re: [PATCH] Add extra statistics to explain for Nested Loop

From:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
To:	Ekaterina Sokolova <e(dot)sokolova(at)postgrespro(dot)ru>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: [PATCH] Add extra statistics to explain for Nested Loop
Date:	2022-04-02 14:43:46
Message-ID:	20220402144346.5eb36risy4iu7tsi@jrouhaud
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On Fri, Apr 01, 2022 at 11:46:47PM +0300, Ekaterina Sokolova wrote:
>
> > Most of the comments I have are easy to fix. But I think that the real
> > problem
> > is the significant overhead shown by Ekaterina that for now would apply
> > even if
> > you don't consume the new stats, for instance if you have
> > pg_stat_statements.
> > And I'm still not sure of what is the best way to avoid that.
> I took your advice about InstrumentOption. Now INSTRUMENT_EXTRA exists.
> So currently it's no overheads during basic load. Operations using
> INSTRUMENT_ALL contain overheads (because of INSTRUMENT_EXTRA is a part of
> INSTRUMENT_ALL), but they are much less significant than before. I apply new
> overhead statistics collected by pgbench with auto _explain enabled.

Can you give a bit more details on your bench scenario? I see contradictory
results, where the patched version with more code is sometimes way faster,
sometimes way slower. If you're using pgbench
default queries (including write queries) I don't think that any of them will
hit the loop code, so it's really a best case scenario. Also write queries
will make tests less stable for no added value wrt. this code.

Ideally you would need a custom scenario with a single read-only query
involving a nested loop or something like that to check how much overhead you
really get when you cumulate those values. I will try to
>
> > Why do you need to initialize min_t and min_tuples but not max_t and
> > max_tuples while both will initially be 0 and possibly updated
> > afterwards?
> We need this initialization for min values so comment about it located above
> the block of code with initialization.

Sure, but if we're going to have a branch for nloops == 0, I think it would be
better to avoid redundant / useless instructions, something like:

if (nloops == 0)
{
min_t = totaltime;
min_tuple = tuplecount;
}
else
{
if (min_t...)
...
}

While on that part of the patch, there's an extra new line between max_t and
min_tuple processing.

In response to

Re: [PATCH] Add extra statistics to explain for Nested Loop at 2022-04-01 20:46:47 from Ekaterina Sokolova

Responses

Re: [PATCH] Add extra statistics to explain for Nested Loop at 2022-04-05 21:14:09 from Greg Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2022-04-02 15:02:01	Re: support for MERGE
Previous Message	chap	2022-04-02 13:39:52	Re: PostgreSQL shutdown modes