From: | Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Inlining comparators as a performance optimisation |
Date: | 2011-12-06 10:40:23 |
Message-ID: | CAP-rdTb85gGutES_GYUy36XrCTxmBhjwPSh+K+pvkEbBPU_nbg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2011/12/5 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> What is bothering me is that this approach is going to cause substantial
> bloat of the executable code, and such bloat has got distributed costs,
> which we don't have any really good way to measure but for sure
> micro-benchmarks addressing only sort speed aren't going to reveal them.
> Cache lines filled with sort code take cycles to flush and replace with
> something else.
>
> I think it's possibly reasonable to have inlined copies of qsort for a
> small number of datatypes, but it seems much less reasonable to have
> multiple copies per datatype in order to obtain progressively tinier
> micro-benchmark speedups. We need to figure out what the tradeoff
> against executable size really is, but so far it seems like you've been
> ignoring the fact that there is any such tradeoff at all.
[ Randomly spouting ideas here: ]
Might it not be a good idea to decide whether to use the inlined
copies vs. the normal version, based on how much data to sort? Surely
for a very large sort, the cache blow-out doesn't matter that much
relative to the amount of time that can be saved sorting?
Assuming that all types would have an inlined sort function, although
that would indeed result in a larger binary, most of that binary would
never touch the cache, because corresponding large sorts are never
performed. If they would sporadically occur (and assuming the points
at which inline sorting starts to get used are chosen wisely), the
possibly resulting cache blow-out would be a net win.
I am also assuming here that instruction cache lines are small enough
for case line aliasing not to become a problem; putting all sort
functions next to each other in the binary (so that they don't alias
with the rest of the backend code that might be used more often) might
alleviate that.
Nicolas
--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2011-12-06 11:38:30 | Re: IDLE in transaction introspection |
Previous Message | Magnus Hagander | 2011-12-06 10:14:02 | Re: xlog location arithmetic |