From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path |
Date: | 2020-06-29 09:36:56 |
Message-ID: | CAApHDvqJ_+eKpgtQwV2GmJ7pKe1=JbhRgPzFnQK6Z5=ap3+bJw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 26 Jun 2020 at 13:21, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 26 Jun 2020 at 04:35, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2020-06-25 13:50:37 +1200, David Rowley wrote:
> > > In the attached, I've come up with a way that works. Basically I've
> > > just added a function named errstart_cold() that is attributed with
> > > __attribute__((cold)), which will hint to the compiler to keep
> > > branches which call this function in a cold path.
> >
> > I recall you trying this before? Has that gotten easier because we
> > evolved ereport()/elog(), or has gcc become smarter, or ...?
>
> Yeah, I appear to have tried it and found it not to work in [1]. I can
> only assume GCC got smarter in regards to how it marks a path as cold.
> Previously it seemed not do due to the do/while(0). I'm pretty sure
> back when I tested last that ditching the do while made it work, just
> we can't really get rid of it.
>
> > > To make the compiler properly mark just >= ERROR calls as cold, and
> > > only when the elevel is a constant, I modified the ereport_domain
> > > macro to do:
> > >
> > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> > > errstart_cold(elevel, domain) : \
> > > errstart(elevel, domain)) \
> >
> > I think it'd be good to not just do this for ERROR, but also for <=
> > DEBUG1. I recall seing quite a few debug elogs that made the code worse
> > just by "being there".
>
> I think that case is different. We don't want to move the entire elog
> path into the cold path for that. We'd only want to hint that errstart
> is unlikely to return true if elevel <= DEBUG1
I played around with this trying to find if there was a way to make this work.
v2 patch includes the change you mentioned about using __has_attribute
(cold) and removes the additional ereport_domain macro
v3 is v2 plus an additional change to mark the branch within
ereport_domain as unlikely when elevel <= DEBUG1
v4 is v2 plus it marks the errstart call as unlikely regardless of elevel.
I tried v4 as I was having trouble as v3 was showing worse performance
than v2. v4 appears better on the AMD system, but that system is
producing noisy results (very obvious if looking at attached
amd3990x_elog_cold.png)
I ran pgbench -S T 600 -P 10 with each patch and for the AMD machine I got:
master = 27817.32167 tps
v2 = 28991.65667 tps (104.22% of master)
v3 = 28622.775 tps (102.90% of master)
v4 = 29648.91 tps (106.58% of master)
(I attribute the speedup here not being the same as my last report due
to noise. A recent bios update partially fixed the problem, but not
completely)
For the intel laptop I got:
master = 25452.38167 tps
v2 = 25473.695 tps (100.08% of master)
v3 = 25434.89333 tps (99.93% of master)
v4 = 25389.02833 tps (99.75% of master)
Looking at the assembly for the v3 patch, it does appear that the
elevel <= DEBUG1 calls were correctly moved to the cold area and that
the ones > DEBUG1 and < ERROR were left alone. However, I did only
look at xlog.s. The intel results don't look very promising, but
perhaps this is not the ideal test to show improvements with
instruction cache efficiency.
> > > +bool
> > > +pg_attribute_cold errstart_cold(int elevel, const char *domain)
> > > +{
> > > + return errstart(elevel, domain);
> > > +}
> > > +#endif
> >
> > Hm. Would it make sense to have this be a static inline?
I didn't find a way to make this work (using gcc-10). Inlining, of
course, makes the assembly just call errstart(). errstart_cold() is
nowhere to be seen. The __attribute(cold) does not seem to be applied
to the errstart() call where the errstart_cold() call was inlined.
I've attached a graph showing the results for the AMD and Intel runs
and also csv files of the pgbench tps output. I've also attached each
version of the patch I tested.
It would be good to see some testing done on other hardware.
David
Attachment | Content-Type | Size |
---|---|---|
amd3990x_elog_cold.png | image/png | 203.5 KB |
intel-i78565u_elog_cold.png | image/png | 78.7 KB |
amd3990x_elog_cold.csv | application/vnd.ms-excel | 2.1 KB |
intel_i7-8565U_elog_cold.csv | application/vnd.ms-excel | 2.1 KB |
elog_ereport_attribute_cold_v2.patch | application/octet-stream | 2.3 KB |
elog_ereport_attribute_cold_v3.patch | application/octet-stream | 2.4 KB |
elog_ereport_attribute_cold_v4.patch | application/octet-stream | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-06-29 09:38:17 | Re: track_planning causing performance regression |
Previous Message | Julien Rouhaud | 2020-06-29 09:17:14 | Re: track_planning causing performance regression |