From: | Thom Brown <thom(at)linux(dot)com> |
---|---|
To: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: VOPS: vectorized executor for Postgres: how to speedup OLAP queries more than 10 times without changing anything in Postgres executor |
Date: | 2017-02-16 17:42:40 |
Message-ID: | CAA-aLv5k0y6NzzD6j+M-jW=WbW7xbFCfXGB1DK1zdZ2-E_+iAQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 16 February 2017 at 17:00, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> More progress in vectorized Postgres extension (VOPS). It is not required
> any more to use some special functions in queries.
> You can use vector operators in query with standard SQL and still get ten
> times improvement on some queries.
> VOPS extension now uses post parse analyze hook to transform query.
> I really impressed by flexibility and extensibility of Postgres type system.
> User defined types&operatpors&casts do most of the work.
>
> It is still responsibility of programmer or database administrator to create
> proper projections
> of original table. This projections need to use tiles types for some
> attributes (vops_float4,...).
> Then you can query this table using standard SQL. And this query will be
> executed using vector operations!
>
> Example of such TPC-H queries:
>
> Q1:
> select
> l_returnflag,
> l_linestatus,
> sum(l_quantity) as sum_qty,
> sum(l_extendedprice) as sum_base_price,
> sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
> sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
> avg(l_quantity) as avg_qty,
> avg(l_extendedprice) as avg_price,
> avg(l_discount) as avg_disc,
> count(*) as count_order
> from
> vops_lineitem_projection
> where
> l_shipdate <= '1998-12-01'::date
> group by
> l_returnflag,
> l_linestatus
> order by
> l_returnflag,
> l_linestatus;
>
>
>
> Q6:
> select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01'::date and '1997-01-01'::date
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
> On 13.02.2017 17:12, Konstantin Knizhnik wrote:
>>
>> Hello hackers,
>>
>> There were many discussions concerning possible ways of speeding-up
>> Postgres. Different approaches were suggested:
>>
>> - JIT (now we have three different prototype implementations based on
>> LLVM)
>> - Chunked (vectorized) executor
>> - Replacing pull with push
>> - Columnar store (cstore_fdw, IMCS)
>> - Optimizing and improving current executor (reducing tuple deform
>> overhead, function call overhead,...)
>>
>> Obviously the best result can be achieved in case of combining all this
>> approaches. But actually them are more or less interchangeable: vectorized
>> execution is not eliminating interpretation overhead, but it is divided by
>> vector size and becomes less critical.
>>
>> I decided to write small prototype to estimate possible speed improvement
>> of vectorized executor. I created special types representing "tile" and
>> implement standard SQL operators for them. So neither Postgres planer,
>> nether Postgres executor, nether Postgres heap manager are changed. But I
>> was able to reach more than 10 times speed improvement on TPC-H Q1/Q6
>> queries!
Impressive work!
Thom
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2017-02-16 17:45:57 | Re: duplicate "median" entry in doc |
Previous Message | Robert Haas | 2017-02-16 17:30:42 | Re: Parallel tuplesort (for parallel B-Tree index creation) |