Re: TPC-R benchmarks

From: Oleg Lebedev <oleg(dot)lebedev(at)waterford(dot)org>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: TPC-R benchmarks
Date: 2003-10-02 19:39:55
Message-ID: 993DBE5B4D02194382EC8DF8554A52731D7627@postoffice.waterford.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I was trying to get the pg_stats information to Josh and decided to
recreate the indexes on all my tables. After that I ran vacuum full
analyze, re-enabled nestloop and ran explain analyze on the query. It
ran in about 2 minutes.
I attached the new query plan. I am not sure what did the trick, but 2
minutes is much better than 2 hours. But then again, I can't take long
lunches anymore :)
Is there any way to make this query run even faster without increasing
the memory dedicated to postgres?
Thanks.

Oleg

-----Original Message-----
From: scott.marlowe [mailto:scott(dot)marlowe(at)ihs(dot)com]
Sent: Thursday, October 02, 2003 10:29 AM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance(at)postgresql(dot)org
Subject: RE: [PERFORM] TPC-R benchmarks

Have you tried increasing the statistics target for those columns that
are
getting bad estimates yet and then turning back on enable_nestloop and
rerunning analyze and seeing how the query does?

The idea being to try and get a good enough estimate of your statistics
so
the planner stops using nestloops on its own rather than forcing it to
with enable_nestloop = false.

On Thu, 2 Oct 2003, Oleg Lebedev wrote:

> As Scott recommended, I did the following:
> # set enable_nestloop = false;
> # vacuum full analyze;
>
> After this I re-ran the query and its execution time went down from 2
> hours to 2 minutes. I attached the new query plan to this posting. Is
> there any way to optimize it even further? What should I do to make
> this query run fast without hurting the performance of the other
> queries? Thanks.
>
> Oleg
>
> -----Original Message-----
> From: scott.marlowe [mailto:scott(dot)marlowe(at)ihs(dot)com]
> Sent: Wednesday, October 01, 2003 4:00 PM
> To: Oleg Lebedev
> Cc: Josh Berkus; pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] TPC-R benchmarks
>
>
> For troubleshooting, can you try it with "set enable_nestloop = false"

> and rerun the query and see how long it takes?
>
> It looks like the estimates of rows returned is WAY off (estimate is
> too
>
> low compared to what really comes back.)
>
> Also, you might try to alter the table.column to have a higher target
> on
>
> the rows p_partkey and ps_partkey and any others where the estimate is

> so far off of the reality.
>
> On Wed, 1 Oct 2003, Oleg Lebedev wrote:
>
> > All right, my query just finished running with EXPLAIN ANALYZE. I
> > show
>
> > the plan below and also attached it as a file. Any ideas?
> >
> > -> Sort (cost=54597.49..54597.50 rows=1 width=121) (actual
> > time=6674562.03..6674562.15 rows=175 loops=1)
> > Sort Key: nation.n_name, date_part('year'::text,
> > orders.o_orderdate)
> > -> Aggregate (cost=54597.45..54597.48 rows=1 width=121)
> > (actual time=6668919.41..6674522.48 rows=175 loops=1)
> > -> Group (cost=54597.45..54597.47 rows=3 width=121)

> > (actual time=6668872.68..6672136.96 rows=348760 loops=1)
> > -> Sort (cost=54597.45..54597.46 rows=3
> > width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1)
> > Sort Key: nation.n_name,
> > date_part('year'::text, orders.o_orderdate)
> > -> Hash Join (cost=54596.00..54597.42
> > rows=3
> > width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1)
> > Hash Cond: ("outer".n_nationkey =
> > "inner".s_nationkey)
> > -> Seq Scan on nation
> > (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25
> > loops=1)
> > -> Hash (cost=54596.00..54596.00
> > rows=3
> > width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1)
> > -> Nested Loop
> > (cost=0.00..54596.00 rows=3 width=88) (actual
time=482.41..6630601.46
> > rows=348760 loops=1)
> > Join Filter:
> > ("inner".s_suppkey = "outer".l_suppkey)
> > -> Nested Loop
> > (cost=0.00..54586.18 rows=3 width=80) (actual
time=383.87..6594984.40
> > rows=348760 loops=1)
> > -> Nested Loop
> > (cost=0.00..54575.47 rows=4 width=68) (actual
time=199.95..3580882.07
> > rows=348760 loops=1)
> > Join
Filter:
> > ("outer".p_partkey = "inner".ps_partkey)
> > -> Nested
> > Loop (cost=0.00..22753.33 rows=9343 width=49) (actual
> > time=146.85..3541433.10 rows=348760 loops=1)
> > ->
Seq
>
> > Scan on part (cost=0.00..7868.00 rows=320 width=4) (actual
> > time=33.64..15651.90 rows=11637 loops=1)
> >
> > Filter: (p_name ~~ '%green%'::text)
> > ->
> > Index Scan using i_l_partkey on lineitem (cost=0.00..46.15 rows=29
> > width=45) (actual time=10.71..302.67 rows=30 loops=11637)
> >
> > Index
> > Cond: ("outer".p_partkey = lineitem.l_partkey)
> > -> Index
> > Scan using pk_partsupp on partsupp (cost=0.00..3.39 rows=1
width=19)
> > (actual time=0.09..0.09 rows=1 loops=348760)
> > Index
> > Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND
> > (partsupp.ps_suppkey =
> > "outer".l_suppkey))
> > -> Index Scan
> > using pk_orders on orders (cost=0.00..3.01 rows=1 width=12) (actual

> > time=8.62..8.62 rows=1 loops=348760)
> > Index Cond:

> > (orders.o_orderkey = "outer".l_orderkey)
> > -> Index Scan using
> > pk_supplier on supplier (cost=0.00..3.01 rows=1 width=8) (actual
> > time=0.08..0.08 rows=1 loops=348760)
> > Index Cond:
> > ("outer".ps_suppkey = supplier.s_suppkey) Total runtime: 6674724.23

> > msec (28 rows)
> >
> >
> > -----Original Message-----
> > From: Oleg Lebedev
> > Sent: Wednesday, October 01, 2003 12:00 PM
> > To: Josh Berkus; scott.marlowe
> > Cc: pgsql-performance(at)postgresql(dot)org
> > Subject: Re: [PERFORM] TPC-R benchmarks
> > Importance: Low
> >
> >
> > Sure, below is the query. I attached the plan to this posting.
> >
> > select
> > nation,
> > o_year,
> > sum(amount) as sum_profit
> > from
> > (
> > select
> > n_name as nation,
> > extract(year from o_orderdate) as o_year,
> > l_extendedprice * (1 - l_discount) -
> > ps_supplycost * l_quantity as amount
> > from
> > part,
> > supplier,
> > lineitem,
> > partsupp,
> > orders,
> > nation
> > where
> > s_suppkey = l_suppkey
> > and ps_suppkey = l_suppkey
> > and ps_partkey = l_partkey
> > and p_partkey = l_partkey
> > and o_orderkey = l_orderkey
> > and s_nationkey = n_nationkey
> > and p_name like '%green%'
> > ) as profit
> > group by
> > nation,
> > o_year
> > order by
> > nation,
> > o_year desc;
> >
> >
> > -----Original Message-----
> > From: Josh Berkus [mailto:josh(at)agliodbs(dot)com]
> > Sent: Wednesday, October 01, 2003 11:42 AM
> > To: Oleg Lebedev; scott.marlowe
> > Cc: pgsql-performance(at)postgresql(dot)org
> > Subject: Re: [PERFORM] TPC-R benchmarks
> >
> >
> > Oleg,
> >
> > > The output of the query should contain about 200 rows. So, I guess
> > > the
> >
> > > planer is off assuming that the query should return 1 row.
> >
> > Oh, also did you post the query before? Can you re-post it with
the
> > planner
> > results?
> >
> >
>
> *************************************
>
> This e-mail may contain privileged or confidential material intended
> for the named recipient only. If you are not the named recipient,
> delete this message and all attachments. Unauthorized reviewing,
> copying, printing, disclosing, or otherwise using information in this
> e-mail is prohibited. We reserve the right to monitor e-mail sent
> through our network.
>
> *************************************
>

*************************************

This e-mail may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Unauthorized reviewing, copying, printing, disclosing, or otherwise using information in this e-mail is prohibited.
We reserve the right to monitor e-mail sent through our network.

*************************************

Attachment Content-Type Size
new_plan_explained.txt text/text 3.4 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2003-10-02 19:44:12 Re: TPC-R benchmarks
Previous Message Bruno Wolff III 2003-10-02 19:39:05 Re: count(*) slow on large tables