From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pgbench client-side performance issue on large scripts |
Date: | 2025-02-25 21:52:08 |
Message-ID: | 1654326.1740520328@sss.pgh.pa.us |
Lists: | pgsql-hackers |
"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> For the moment I'll stay with my quick fix, then I'll try
> to come up with something to replace expr_scanner_get_lineno().
I got nerd-sniped by this question and spent some time looking into
it. ParseScript has got worse problems than just being slow: it's
actively buggy. Notice that start_offset is set only once before
entering the loop, and doesn't change thereafter. How is it that
we're getting sane line numbers at all? The reason is that (1) if
we've not called yylex() at all yet, expr_scanner_offset() gives the
distance to the end of the string, since the yytext-ending NUL it's
looking for isn't there yet; and (2) expr_scanner_get_lineno() treats
the given start_offset as an upper bound, and won't complain if it
finds the NUL earlier than that. So it gives the desired
line-number-of-the-current-token on all iterations after the first,
but the first time through we get the line number of the script
end. You can only see that when \gset is the first command,
and I guess nobody has noticed it yet.
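
To make the two effects above concrete, here is a toy model, not the pgbench source. The function name toy_lineno and the way the NUL is planted are assumptions for illustration only; the sketch just counts lines up to the first NUL or up to the given offset, whichever comes first, which is the behavior described for expr_scanner_get_lineno().

```c
#include <stddef.h>

/*
 * Toy stand-in (hypothetical name): return the 1-based line number reached
 * by scanning buf from its start, stopping at the first NUL or at
 * max_offset, whichever comes first.  max_offset is only an upper bound.
 */
static int
toy_lineno(const char *buf, size_t max_offset)
{
    int         lineno = 1;

    for (size_t i = 0; i < max_offset && buf[i] != '\0'; i++)
    {
        if (buf[i] == '\n')
            lineno++;
    }
    return lineno;
}
```

With a stale offset equal to the full script length, the result depends entirely on whether a token-ending NUL has been planted yet: before any token is scanned the count runs to the end of the script, and afterwards it stops at the current token.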
Furthermore, it's not only ParseScript that's got O(N^2) problems;
so does process_backslash_command. Your test case didn't show that
up, but a test with 50K backslash commands would. We were actually
doing a strlen() of the whole string for each word of a backslash
command. strlen() is likely faster than expr_scanner_get_lineno(),
but it's not so fast that O(N^2) effects don't matter.
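
The quadratic cost of a per-word strlen() can be sketched as follows. This is an illustration under assumptions, not the code in process_backslash_command: both function names are hypothetical, and words are taken to be separated by single spaces. The first variant measures each word's end offset by NUL-terminating the buffer there and calling strlen() from the start; the second gets the same offset from pointer arithmetic.

```c
#include <stddef.h>
#include <string.h>

/*
 * Quadratic variant (hypothetical): after each word, temporarily plant a NUL
 * at the word's end and measure its offset with strlen() from the buffer
 * start.  Returns the sum of all offsets so both variants can be compared.
 */
static size_t
sum_offsets_strlen(char *buf)
{
    size_t      sum = 0;
    char       *p = buf;

    while (*p != '\0')
    {
        char        hold;

        while (*p == ' ')       /* skip separators */
            p++;
        if (*p == '\0')
            break;
        while (*p != '\0' && *p != ' ') /* advance over one word */
            p++;
        hold = *p;
        *p = '\0';
        sum += strlen(buf);     /* rescans everything before this word */
        *p = hold;              /* restore the overwritten character */
    }
    return sum;
}

/*
 * Linear variant (hypothetical): the same offsets, computed in constant time
 * per word from the current scan position.
 */
static size_t
sum_offsets_pointer(const char *buf)
{
    size_t      sum = 0;
    const char *p = buf;

    while (*p != '\0')
    {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        while (*p != '\0' && *p != ' ')
            p++;
        sum += (size_t) (p - buf);
    }
    return sum;
}
```

Both functions return identical sums; the difference is that the first touches every earlier byte again for each word, so total work grows with the square of the script length.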
The attached patch gets rid of both expr_scanner_offset() and
expr_scanner_get_lineno() altogether, in favor of using a new
function I added to psqlscan.l. That uses the idea from plpgsql
of tracking the last-detected line end so that we don't have to
rescan prior lines over and over. On my machine, parsing 50K-line
scripts goes from more than 10 seconds to perhaps 50 ms.
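
The general idea of remembering how far lines have already been counted can be sketched like this. It is not the function added to psqlscan.l in the attached patch, and it is a simplified variation on the plpgsql approach: the LineTracker type and both function names are hypothetical, and the sketch assumes offsets are queried in non-decreasing order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical incremental line counter over a NUL-terminated script. */
typedef struct LineTracker
{
    const char *buf;            /* start of the script text */
    size_t      last_off;       /* offset up to which newlines were counted */
    int         lineno;         /* 1-based line number at last_off */
} LineTracker;

static void
tracker_init(LineTracker *t, const char *buf)
{
    t->buf = buf;
    t->last_off = 0;
    t->lineno = 1;
}

/*
 * Return the line number containing the given offset.  Only the bytes
 * between the previous query and this one are examined, so a full pass
 * over the script costs O(N) in total rather than O(N^2).
 */
static int
tracker_lineno(LineTracker *t, size_t offset)
{
    size_t      i;

    /* This sketch only supports queries that move forward. */
    assert(offset >= t->last_off);

    for (i = t->last_off; i < offset && t->buf[i] != '\0'; i++)
    {
        if (t->buf[i] == '\n')
            t->lineno++;
    }
    t->last_off = i;
    return t->lineno;
}
```

A caller would initialize one tracker per script and ask for the line number of each token's start offset as parsing proceeds.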
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Get-rid-of-O-N-2-script-parsing-overhead-in-pgben.patch | text/x-diff | 12.3 KB |