Quick Links

Re: asynchronous and vectorized execution

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: asynchronous and vectorized execution
Date:	2016-05-10 06:08:44
Message-ID:	CAFj8pRCMLh1rpZ78wn2ovAR3nBkBK3zjG7tA4DNzVzF+W9H90w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2016-05-10 8:05 GMT+02:00 David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:

> On 10 May 2016 at 16:34, Greg Stark <stark(at)mit(dot)edu> wrote:
> >
> > On 9 May 2016 8:34 pm, "David Rowley" <david(dot)rowley(at)2ndquadrant(dot)com>
> wrote:
> >>
> >> This project does appear to require that we bloat the code with 100's
> >> of vector versions of each function. I'm not quite sure if there's a
> >> better way to handle this. The problem is that the fmgr is pretty much
> >> a barrier to SIMD operations, and this was the only idea that I've had
> >> so far about breaking through that barrier. So further ideas here are
> >> very welcome.
> >
> > Well yes and no. In practice I think you only need to worry about
> vectorised
> > versions of integer and possibly float. For other data types there either
> > aren't vectorised operators or there's little using them.
> >
> > And I'll make a bold claim here that the only operators I think really
> > matter are =
> >
> > The rain is because using SIMD instructions is a minor win if you have
> any
> > further work to do per tuple. The only time it's a big win is if you're
> > eliminating entire tuples from consideration efficiently. = is going to
> do
> > that often, other btree operator classes might be somewhat useful, but
> > things like + really only would come up in odd examples.
> >
> > But even that understates things. If you have column oriented storage
> then =
> > becomes even more important since every scan has a series of implied
> > equijoins to reconstruct the tuple. And the coup de grace is that in a
> > column oriented storage you try to store variable length data as integer
> > indexes into a dictionary of common values so *everything* is an integer
> =
> > operation.
> >
> > How to do this without punching right through the executor as an
> abstraction
> > and still supporting extensible data types and operators was puzzling me
> > already. I do think it involves having these vector operators in the
> > catalogue and also some kind of compression mapping to integer indexes.
> But
> > I'm not sure that's all that would be needed.
>
> Perhaps the first move to make on this front will be for aggregate
> functions. Experimentation might be quite simple to realise which
> functions will bring enough benefit. I imagined that even Datums where
> the type is not processor native might yield a small speedup, not from
> SIMD, but just from less calls through fmgr. Perhaps we'll realise
> that those are not worth the trouble, I've no idea at this stage.
>

It can be reduced to sum and count in first iteration. On other hand lot of
OLAP reports is based on pretty complex expressions - and there probably
the compilation is better way.

Regards

Pavel

>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Re: asynchronous and vectorized execution at 2016-05-10 06:05:00 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2016-05-10 06:41:07	Re: between not propated into a simple equality join
Previous Message	David Rowley	2016-05-10 06:05:00	Re: asynchronous and vectorized execution