debugging what might be a perf regression in 17beta2

From: MARK CALLAGHAN <mdcallag(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: debugging what might be a perf regression in 17beta2
Date: 2024-07-06 03:11:08
Message-ID: CAFbpF8NGCpv67n9kVaH-Ho8+qHQAZZO0K-gH-WBcHD12AsyoSw@mail.gmail.com
Lists: pgsql-hackers

I am seeking advice. For now I am hoping for suggestions about changes from
17beta1 to 17beta2 that might cause the problem -- assuming there is a
problem and not a mistake in my testing.

One of the sysbench microbenchmarks that I run does a table scan with a
WHERE clause that filters out all rows. That WHERE clause is there to
reduce network IO.

While running it on a server with 16 real cores, using 12 concurrent queries
and a cached database, the query takes ~5% more time on 17beta2 than on
17beta1 or 16.3. Alas, this is a Google Cloud server, and perf doesn't work
there.

On small servers I have at home I can reproduce the problem without
concurrent queries, and 17beta2 is 5% to 10% slower there.

The SQL statement for the scan microbenchmark is:
SELECT * from %s WHERE LENGTH(c) < 0
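
For anyone who wants a quick repro of just the scan, something like the
following should show whether the plan is a plain Seq Scan and how long it
takes. The database name (sbtest) and table name (sbtest1) are just the
sysbench defaults, so adjust as needed:

psql sbtest -c "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM sbtest1 WHERE LENGTH(c) < 0"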

I will call my small home servers SER4 and PN53. They are described here:
https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html

The SER4 is a SER 4700u from Beelink and the PN53 is an ASUS ExpertCenter
PN53. Both use an AMD CPU with 8 cores, AMD SMT disabled and Ubuntu 22.04.
The SER4 has an older, slower CPU than the PN53. In all cases I compile
from source using a configure command line like:

./configure --prefix=$pfx --enable-debug CFLAGS="-O2 -fno-omit-frame-pointer"

I used perf to get flamegraphs during the scan microbenchmark and they are
archived here:
https://github.com/mdcallag/mytools/tree/master/bench/bugs/pg17beta2/24Jul5.sysbench.scan
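
For anyone who wants to generate similar flamegraphs, the standard perf +
FlameGraph recipe is roughly the following. The sampling rate and duration
here are just examples, and stackcollapse-perf.pl / flamegraph.pl come from
Brendan Gregg's FlameGraph repo:

perf record -F 99 -a -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > scan.svg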

For both SER4 and PN53 the time to finish the scan microbenchmark is ~10%
longer in 17beta2 than it was in 17beta1 and 16.3. On the PN53 the query
takes ~20 seconds with 16.3 and 17beta1 vs ~22.5 seconds for 17beta2 when
the table has 60M rows.

From the SVG files for SER4 and 17beta2 I see ~2X more time in
slot_getsomeattrs_int vs 17beta1 or 16.3, with all of that time spent in its
child, tts_buffer_heap_getsomeattrs. That function is defined in
src/backend/executor/execTuples.c, and that file has not changed from
17beta1 to 17beta2. But I don't keep up with individual commits to Postgres,
so I won't guess at the root cause.

But the SVG files for PN53 don't show the same problem:

- for 16.3 I see 85.24% in ExecInterpExpr vs 11.64% in SeqNext
- for 17beta1 I see 82.82% in ExecInterpExpr vs 14.51% in SeqNext
- for 17beta2 I see 85.03% in ExecInterpExpr vs 12.31% in SeqNext
- for 17beta1 and 17beta2 the flamegraphs show time spent handling page
faults during SeqNext, and that isn't visible on the 16.3 flamegraph

And then for PN53, looking at slot_getsomeattrs_int, a child of
ExecInterpExpr:

- for 16.3 I see 6.99% in slot_getsomeattrs_int
- for 17beta1 I see 4.29% in slot_getsomeattrs_int
- for 17beta2 I see 3.99% in slot_getsomeattrs_int

So at this point I am confused. I am repeating the test with a slightly
larger table, while keeping it small enough to fit in the Postgres buffer
pool. I also have results from tables that are much larger than memory, and
even in that case the problem can be reproduced.
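
For reference, a quick way to confirm the table still fits in the buffer
pool is to compare the relation size to shared_buffers, for example
(sysbench default names again):

psql sbtest -c "SHOW shared_buffers" -c "SELECT pg_size_pretty(pg_total_relation_size('sbtest1'))"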

--
Mark Callaghan
mdcallag(at)gmail(dot)com
