Re: Windowing Function Patch Review -> Performance Comparison.

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, sitnikov(dot)vladimir(at)gmail(dot)com
Subject: Re: Windowing Function Patch Review -> Performance Comparison.
Date: 2008-11-17 05:29:09
Message-ID: e08cc0400811162129h424d8dbcp451382270ce11650@mail.gmail.com
Lists: pgsql-hackers

2008/11/15 David Rowley <dgrowley(at)gmail(dot)com>:
> Hitoshi Harada wrote:
>> > david=# explain select date,lag(date,1) over (order by date) from
>> > meter_Readings order by date;
>> >                              QUERY PLAN
>> > --------------------------------------------------------------------------------------------
>> >  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>> >    Sort Key: date
>> >    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>> >          ->  Index Scan using meter_readings_pkey on meter_readings  (cost=0.00..299.27 rows=10001 width=4)
>> > (4 rows)
>> >
>> > Is the final sort still required? Is it not already sorted in the window?
>> >
>>
>> Oh, I forgot to mention that. This behavior is also fixed, and it now
>> works without the sort on top of the window. I don't remember why I
>> did it that way, and there's no comment explaining it, but the
>> regression tests show there is no problem without it.
>>
>> Regards,
>>
>> --
>> Hitoshi Harada
>
> Fantastic news! That will speed up the few test queries I have quite a bit;
> the sort was spilling to disk, so performance was suffering. This might be a
> good time to mention that the hardware I'm testing on is ultra slow: a 600
> MHz Pentium III with only 28MB shared buffers.
>
> I've done some more performance tests with the patch. Realising that I
> really need to compare the performance against something else, I decided to
> have a query process a large table with and then without a windowing
> function, just to see how much the query slows down. Of course there is
> going to be some overhead from using a tuple store; I just wanted to see
> how much. My results are not really very interesting, so I'll just include
> them at the bottom of the email for those who want to see them.

Thanks for your test code. I attach the results of your test, with one
more query added. The added row_number() query uses the row buffer
strategy, which doesn't keep a tuplestore but only scans and returns
rows. So the difference between the row_number() case (44 sec) and the
ntile() case (61 sec) roughly shows how much overhead the tuplestore
adds. I had supposed the row_number() case would be even more
efficient, but we can still see it works as an optimization.
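
To isolate that overhead, the comparison amounts to timing two queries
that differ only in the window function used (the queries are the same
as in the results appended below):

```sql
-- row_number() streams rows through the window node; no tuplestore
-- is needed, since each row's number is known as soon as it arrives.
EXPLAIN ANALYZE
SELECT id, row_number() OVER (ORDER BY id) FROM bigtable ORDER BY id;

-- ntile() must buffer the whole partition in a tuplestore: it cannot
-- assign buckets until the total row count is known.
EXPLAIN ANALYZE
SELECT id, ntile(10) OVER (ORDER BY id) FROM bigtable ORDER BY id;
```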

> Casting my mind back to when the patch always did a sort before a window
> with an ORDER BY, even when a btree index was there, it's really come a
> long way. I've little idea how difficult it would be to implement better
> planning for the following. I suppose if it's difficult then it's probably
> better to wait for the patch to be committed first, or maybe it's something
> for 8.5.

Yeah, the planner's handling of GROUP BY, ORDER BY, DISTINCT, index
scans, and windows is quite complicated. I think I can manage to do it
if there is enough time left, but first the basic design should be
signed off by the core hackers. I am waiting for further review and
responses from Heikki and others.

> SELECT department,
> SUM(Salary),
> ROW_NUMBER() OVER (ORDER BY department),
> SUM(SUM(salary)) OVER (ORDER BY department DESC)
> FROM employees
> GROUP BY department ORDER BY department;
>
> This query performs more sorts than are really needed. I suppose the most
> efficient way to process it would be to process the 2nd window first, then
> the 1st, before returning the results in the same order as the 1st window.
>
> Currently the plan looks like:
>
>                                    QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Sort  (cost=1.33..1.34 rows=3 width=9)
>    Sort Key: department
>    ->  Window  (cost=1.25..1.31 rows=3 width=9)
>          ->  Sort  (cost=1.25..1.26 rows=3 width=9)
>                Sort Key: department
>                ->  Window  (cost=1.17..1.23 rows=3 width=9)
>                      ->  Sort  (cost=1.17..1.18 rows=3 width=9)
>                            Sort Key: department
>                            ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
>                                  ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)
>
>
> Maybe it's possible to sort the processing order of the windows based on
> their ORDER BY clauses, putting any that match the ORDER BY of the final
> results last. I've not looked into this in much detail, but currently I
> cannot see any scenario where it would be bad.
>
> What do you think?
>

The patch doesn't have the infrastructure to relocate window nodes
yet. When relocating them we must renumber wfunc->winref, so it's a
bit of work, though possible. Also, I could imagine fixing only the
case above, but I'm afraid that without a good overall sketch of the
planner, side effects would come up... For instance, what if the
window query is a subquery of a GROUP BY with a sort strategy that
assumes its subplan doesn't sort? In that case the upper GROUP BY
might not notice the sort key change caused by relocating the window
node in the subplan.
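
Just to illustrate the reordering you describe (a hand-rewritten
sketch of the same query, not anything the planner produces today):
evaluating the DESC window in a subquery first lets the outer
row_number() window share its sort with the final ORDER BY, so only
two sorts are needed instead of three:

```sql
SELECT department,
       sum_salary,
       ROW_NUMBER() OVER (ORDER BY department) AS row_no,
       cume_salary
FROM (SELECT department,
             SUM(salary) AS sum_salary,
             -- the DESC window is computed first, in the subquery
             SUM(SUM(salary)) OVER (ORDER BY department DESC) AS cume_salary
      FROM employees
      GROUP BY department) AS sub
-- this ORDER BY matches the outer window's ordering, so no extra sort
ORDER BY department;
```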

Regards,

--
Hitoshi Harada

$ uname -srpio
Linux 2.6.9-55.0.9.ELsmp i686 i386 GNU/Linux

# cpuinfo
Intel(R) Xeon(R) CPU X3210 @ 2.13GHz

sample=# explain analyze select id, timestamp from bigtable order by id;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using bigtable_pkey on bigtable  (cost=0.00..286808.13 rows=10000000 width=12) (actual time=0.021..11996.336 rows=10000000 loops=1)
 Total runtime: 20924.200 ms
(2 rows)

sample=# explain analyze select id, row_number() OVER (order by id)
from bigtable order by id;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Window  (cost=0.00..361808.13 rows=10000000 width=4) (actual time=0.080..35265.786 rows=10000000 loops=1)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286808.13 rows=10000000 width=4) (actual time=0.075..13279.216 rows=10000000 loops=1)
 Total runtime: 44067.815 ms
(3 rows)

sample=# explain analyze select id,LAG(timestamp,1) over (order by id)
from bigtable order by id;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Window  (cost=0.00..411808.13 rows=10000000 width=12) (actual time=30240.715..70439.057 rows=10000000 loops=1)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286808.13 rows=10000000 width=12) (actual time=0.077..15302.919 rows=10000000 loops=1)
 Total runtime: 79658.314 ms
(3 rows)

sample=# explain analyze select id, ntile(10) OVER (order by id) from
bigtable order by id;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Window  (cost=0.00..386808.13 rows=10000000 width=4) (actual time=25158.467..52250.418 rows=10000000 loops=1)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286808.13 rows=10000000 width=4) (actual time=0.075..13088.061 rows=10000000 loops=1)
 Total runtime: 61275.279 ms
(3 rows)
