From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: asynchronous and vectorized execution |
Date: | 2016-05-10 06:08:44 |
Message-ID: | CAFj8pRCMLh1rpZ78wn2ovAR3nBkBK3zjG7tA4DNzVzF+W9H90w@mail.gmail.com |
Lists: | pgsql-hackers |
2016-05-10 8:05 GMT+02:00 David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:
> On 10 May 2016 at 16:34, Greg Stark <stark(at)mit(dot)edu> wrote:
> >
> > On 9 May 2016 8:34 pm, "David Rowley" <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> >>
> >> This project does appear to require that we bloat the code with a
> >> vector version of each of hundreds of functions. I'm not quite sure if
> >> there's a better way to handle this. The problem is that the fmgr is
> >> pretty much a barrier to SIMD operations, and this was the only idea
> >> that I've had so far about breaking through that barrier. So further
> >> ideas here are very welcome.
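To make the fmgr barrier concrete, here is a minimal sketch in C that contrasts a row-at-a-time operator, invoked through a function pointer with its arguments boxed as Datums, with a hypothetical batch "vector version" of the same operator. `Datum` is a simplified stand-in, and `scalar_int4eq`, `count_eq_rowwise` and `vector_int4eq_count` are hypothetical names. None of this is PostgreSQL's actual fmgr calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's Datum: a pointer-sized word.
 * This only illustrates the shape of fmgr calls, not the real interface. */
typedef uintptr_t Datum;

/* Scalar operator: one indirect call per row, arguments boxed as Datums. */
typedef Datum (*ScalarOp)(Datum a, Datum b);

static Datum scalar_int4eq(Datum a, Datum b)
{
    return (Datum) ((int32_t) a == (int32_t) b);
}

/* Row-at-a-time evaluation. When the operator is only known at run time
 * (as with an fmgr lookup), the indirect call hides the comparison from
 * the compiler, so this loop generally cannot be auto-vectorised. */
static size_t count_eq_rowwise(ScalarOp op, const int32_t *col, size_t n, int32_t key)
{
    size_t matches = 0;

    assert(op != NULL);
    for (size_t i = 0; i < n; i++)
        matches += (size_t) op((Datum) col[i], (Datum) key);
    return matches;
}

/* Hypothetical "vector version": one call per batch, unboxed native
 * values, and a branch-free loop body that compilers can typically
 * turn into SIMD compares at high optimisation levels. */
static size_t vector_int4eq_count(const int32_t *col, size_t n, int32_t key)
{
    size_t matches = 0;

    for (size_t i = 0; i < n; i++)
        matches += (col[i] == key);
    return matches;
}
```

Both functions return the same count. The difference is that the second one exists once per type and operator, which is exactly the code bloat described above.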
> >
> > Well, yes and no. In practice I think you only need to worry about
> > vectorised versions of integer and possibly float operators. For other
> > data types there either aren't vectorised operators or there's little
> > that uses them.
> >
> > And I'll make a bold claim here that the only operators I think really
> > matter are =
> >
> > The reason is that using SIMD instructions is a minor win if you have any
> > further work to do per tuple. The only time it's a big win is if you're
> > eliminating entire tuples from consideration efficiently. = is going to do
> > that often; other btree operator classes might be somewhat useful, but
> > things like + would really only come up in odd examples.
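The point about eliminating entire tuples can be illustrated with a selection vector. A filter pass over one column batch records the indices of the rows that survive, so any later per-tuple work only touches those rows. This is a minimal sketch that assumes int32 column batches. `filter_eq_int32` is a hypothetical name, and the loop is branch-free rather than explicitly SIMD; a hand-written SIMD version would first compute a comparison mask for several values at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free equality filter over one column batch. Writes the indices
 * of matching rows into sel[] (which must have room for n entries) and
 * returns how many matched. Non-matching tuples are dropped here, before
 * any further per-tuple work is done on them. */
static size_t filter_eq_int32(const int32_t *col, size_t n, int32_t key, uint32_t *sel)
{
    size_t nsel = 0;

    assert(n <= UINT32_MAX);               /* indices are stored as uint32_t */
    for (size_t i = 0; i < n; i++)
    {
        sel[nsel] = (uint32_t) i;          /* always write the candidate... */
        nsel += (col[i] == key);           /* ...but only keep it on a match */
    }
    return nsel;
}
```

Avoiding a branch per row keeps the loop free of mispredictions when matches are scattered. That is where a cheap `=` pays off most, as argued above.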
> >
> > But even that understates things. If you have column-oriented storage then
> > = becomes even more important, since every scan has a series of implied
> > equijoins to reconstruct the tuple. And the coup de grâce is that in
> > column-oriented storage you try to store variable-length data as integer
> > indexes into a dictionary of common values, so *everything* is an integer
> > = operation.
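Dictionary encoding is what turns a string predicate into an integer `=`. The following is a minimal sketch that assumes a toy fixed-capacity dictionary with linear search; `Dict`, `dict_encode`, `dict_lookup` and `count_eq_encoded` are hypothetical names. A real column store would hash, size the dictionary dynamically, and own copies of the strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DICT_MAX 16                        /* toy capacity for the sketch */

typedef struct
{
    const char *values[DICT_MAX];          /* code i decodes to values[i] */
    uint32_t    nvalues;
} Dict;

/* Return the code for s, or UINT32_MAX if s was never stored. */
static uint32_t dict_lookup(const Dict *d, const char *s)
{
    for (uint32_t i = 0; i < d->nvalues; i++)
        if (strcmp(d->values[i], s) == 0)
            return i;
    return UINT32_MAX;
}

/* Return the code for s, adding it to the dictionary if it is new. */
static uint32_t dict_encode(Dict *d, const char *s)
{
    uint32_t code = dict_lookup(d, s);

    if (code != UINT32_MAX)
        return code;
    assert(d->nvalues < DICT_MAX);         /* toy fixed capacity */
    d->values[d->nvalues] = s;             /* stores the pointer, not a copy */
    return d->nvalues++;
}

/* WHERE col = key on the encoded column: the string comparison happens
 * only while looking up the key, and each row costs one integer =. */
static size_t count_eq_encoded(const Dict *d, const uint32_t *codes, size_t n, const char *key)
{
    uint32_t code = dict_lookup(d, key);
    size_t   matches = 0;

    if (code == UINT32_MAX)                /* value never stored: no row matches */
        return 0;
    for (size_t i = 0; i < n; i++)
        matches += (codes[i] == code);
    return matches;
}
```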
> >
> > How to do this without punching right through the executor as an
> > abstraction, while still supporting extensible data types and operators,
> > was already puzzling me. I do think it involves having these vector
> > operators in the catalogue and also some kind of compression mapping to
> > integer indexes. But I'm not sure that's all that would be needed.
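One way to read "vector operators in the catalogue" without breaking extensibility is to give every operator a mandatory scalar function and an optional vector function. The executor then falls back to looping over the scalar form when no vector form is registered. This is a speculative sketch under that assumption. `OperatorEntry`, `eval_batch` and the `int4eq_*` functions are hypothetical and do not correspond to PostgreSQL's real `pg_operator` catalog or fmgr API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;   /* hypothetical stand-in, not PostgreSQL's header */

/* Scalar form: one pair of Datums in, boolean Datum out. */
typedef Datum (*ScalarFn)(Datum a, Datum b);
/* Optional vector form: whole batches in, one result per row out. */
typedef void (*VectorFn)(const Datum *a, const Datum *b, Datum *out, size_t n);

/* Hypothetical catalogue entry: every operator has a scalar function;
 * only some (e.g. int4 =) also register a vector function. */
typedef struct
{
    const char *name;
    ScalarFn    scalar;
    VectorFn    vector;                    /* NULL when no vector version exists */
} OperatorEntry;

static void eval_batch(const OperatorEntry *op, const Datum *a, const Datum *b,
                       Datum *out, size_t n)
{
    assert(op->scalar != NULL);            /* every operator needs a scalar form */
    if (op->vector != NULL)
    {
        op->vector(a, b, out, n);          /* one call for the whole batch */
        return;
    }
    for (size_t i = 0; i < n; i++)         /* fallback keeps extension types working */
        out[i] = op->scalar(a[i], b[i]);
}

static Datum int4eq_scalar(Datum a, Datum b)
{
    return (Datum) ((int32_t) a == (int32_t) b);
}

static void int4eq_vector(const Datum *a, const Datum *b, Datum *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (Datum) ((int32_t) a[i] == (int32_t) b[i]);
}
```

With this arrangement, only the handful of operators that matter (integer `=` first) would need hand-written vector versions, and nothing else in the catalogue has to change.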
>
> Perhaps the first move to make on this front will be for aggregate
> functions. It might be quite simple to determine by experiment which
> functions will bring enough benefit. I imagined that even Datums where
> the type is not processor-native might yield a small speedup, not from
> SIMD, but just from fewer calls through fmgr. Perhaps we'll realise
> that those are not worth the trouble; I've no idea at this stage.
>
It could be reduced to sum and count in the first iteration. On the other
hand, a lot of OLAP reports are based on pretty complex expressions, and
there compilation is probably the better way.
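A batched transition function for sum and count shows where the saving in calls comes from. The row-at-a-time version is invoked once per input row, while the batched version is invoked once per batch and runs a tight loop inside. This is a minimal sketch. `SumCountState`, `sumcount_trans`, `sumcount_trans_batch` and the `transition_calls` counter are hypothetical and are not PostgreSQL's aggregate API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transition state for sum(int4) and count(*). */
typedef struct
{
    int64_t sum;
    int64_t count;
} SumCountState;

static size_t transition_calls;            /* counts calls to show amortisation */

/* Row-at-a-time transition: invoked once per input row. */
static void sumcount_trans(SumCountState *st, int32_t v)
{
    transition_calls++;
    st->sum += v;
    st->count++;
}

/* Batched transition: invoked once per batch of n rows. */
static void sumcount_trans_batch(SumCountState *st, const int32_t *v, size_t n)
{
    int64_t s = 0;

    assert(n == 0 || v != NULL);
    transition_calls++;
    for (size_t i = 0; i < n; i++)
        s += v[i];                         /* tight loop the compiler can vectorise */
    st->sum += s;
    st->count += (int64_t) n;
}
```

For 1000 rows in batches of 250, the batched path makes 4 calls instead of 1000 and produces the same sum and count.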
Regards
Pavel
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>