Re: pgbench client-side performance issue on large scripts

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pgbench client-side performance issue on large scripts
Date:	2025-02-25 21:52:08
Message-ID:	1654326.1740520328@sss.pgh.pa.us
Lists:	pgsql-hackers

"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> For the moment I'll stay with my quick fix, then I'll try
> to come up with something to replace expr_scanner_get_lineno().

I got nerd-sniped by this question and spent some time looking into
it. ParseScript has got worse problems than just being slow: it's
actively buggy. Notice that start_offset is set only once before
entering the loop, and doesn't change thereafter. How is it that
we're getting sane line numbers at all? The reason is that (1) if
we've not called yylex() at all yet, expr_scanner_offset() gives the
distance to the end of the string, since the yytext-ending NUL it's
looking for isn't there yet; and (2) expr_scanner_get_lineno() treats
the given start_offset as an upper bound, and won't complain if it
finds the NUL earlier than that. So it gave the desired
line-number-of-the-current-token on all iterations after the first,
but on the first time through we get the line number of the script
end. You can only see that in the case of \gset as the first command,
and I guess nobody noticed it yet.
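
To make points (1) and (2) concrete, here is a minimal sketch of a line counter with the behavior described above. It is an illustration and not the pgbench source: lineno_upto() is a hypothetical stand-in that counts newlines from the start of the buffer and stops at the given upper bound or at the first NUL, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the old line counter: count newlines from the
 * start of buf, stopping at upper_bound or at the first NUL, whichever
 * comes first.  Returns a 1-based line number.
 */
int
lineno_upto(const char *buf, size_t upper_bound)
{
	int			lineno = 1;

	for (size_t i = 0; i < upper_bound && buf[i] != '\0'; i++)
	{
		if (buf[i] == '\n')
			lineno++;
	}
	return lineno;
}
```

With a stale offset equal to the whole script length, this returns the last line of the script as long as no token-ending NUL has been written, and the line of the current token once the lexer has placed a NUL after it. That would match the symptom of a wrong line number on the first iteration only.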

Furthermore, it's not only ParseScript that's got O(N^2) problems;
so does process_backslash_command. Your test case didn't show that
up, but a test with 50K backslash commands would. We were actually
doing a strlen() of the whole string for each word of a backslash
command. strlen() is likely faster than expr_scanner_get_lineno(),
but it's not so fast that O(N^2) effects don't matter.
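
A rough sketch of why a per-word strlen() adds up, assuming the offset is recovered by measuring the buffer up to the NUL that ends the current token. Both function names are hypothetical and only contrast the two costs; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Costly variant (assumed shape of the old behavior): rediscover the offset
 * by scanning from the buffer start to the NUL that ends the current token.
 * Each call costs time proportional to the offset, so calling it once per
 * word over an N-byte script is O(N^2) overall.
 */
size_t
offset_by_strlen(const char *buf)
{
	return strlen(buf);
}

/*
 * Cheap variant: if the caller already holds a pointer to the token end,
 * the offset is a constant-time pointer subtraction.
 */
size_t
offset_by_pointer(const char *buf, const char *token_end)
{
	assert(token_end >= buf);
	return (size_t) (token_end - buf);
}
```

Both return the same value for a given token; only the cost per call differs.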

The attached patch gets rid of both expr_scanner_offset() and
expr_scanner_get_lineno() altogether, in favor of using a new
function I added to psqlscan.l. That uses the idea from plpgsql
of tracking the last-detected line end so that we don't have to
rescan prior lines over and over. On my machine, parsing 50K-line
scripts goes from more than 10 seconds to perhaps 50 ms.
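
The idea of tracking the last-detected line end can be sketched as below. This is a minimal illustration under stated assumptions, not the function added to psqlscan.l: the LineTracker struct and both function names are hypothetical, and the sketch assumes offsets are queried in non-decreasing order, as they would be while lexing forward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: remembers how far newlines have been counted. */
typedef struct LineTracker
{
	const char *buf;			/* NUL-terminated script text */
	size_t		last_offset;	/* newlines before this offset are counted */
	int			lineno;			/* 1-based line number at last_offset */
} LineTracker;

void
line_tracker_init(LineTracker *t, const char *buf)
{
	t->buf = buf;
	t->last_offset = 0;
	t->lineno = 1;
}

/*
 * Return the 1-based line number of buf[offset].  Only the bytes between
 * the previous query and this one are scanned, so the total work over a
 * whole script is O(N) rather than O(N^2).
 */
int
line_tracker_lineno(LineTracker *t, size_t offset)
{
	assert(offset >= t->last_offset);	/* forward-only queries */

	while (t->last_offset < offset && t->buf[t->last_offset] != '\0')
	{
		if (t->buf[t->last_offset] == '\n')
			t->lineno++;
		t->last_offset++;
	}
	return t->lineno;
}
```

Repeating a query at the same offset does no scanning at all, which is the property that removes the rescan of prior lines.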

regards, tom lane

Attachment	Content-Type	Size
v1-0001-Get-rid-of-O-N-2-script-parsing-overhead-in-pgben.patch	text/x-diff	12.3 KB

In response to

Re: pgbench client-side performance issue on large scripts at 2025-02-25 13:10:31 from Daniel Verite

Responses

Re: pgbench client-side performance issue on large scripts at 2025-02-26 00:17:53 from Tom Lane