Re: ATTN: Clodaldo was Performance problem. Could it be related to 8.3-beta4?

From: Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>, "PostgreSQL - General ML" <pgsql-general(at)postgresql(dot)org>
Subject: Re: ATTN: Clodaldo was Performance problem. Could it be related to 8.3-beta4?
Date: 2008-01-10 20:18:05
Message-ID: a595de7a0801101218q10b22c15p40ae5ef1b492aeb2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2008/1/10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:
> > OK, I was looking at the previous thread that you thought had
> > disappeared, and with the explain analyze output from 8.3 I noticed
> > something odd.
>
> > For 8.2 you had something like this:
>
> > QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Subquery Scan "*SELECT*" (cost=326089.49..350310.28 rows=880756
> > width=20) (actual time=11444.566..13114.365 rows=880691 loops=1)
> > -> HashAggregate (cost=326089.49..339300.83 rows=880756 width=12)
> > (actual time=11444.554..12438.188 rows=880691 loops=1)
> > ... SNIP ...
> > Time: 357750.531 ms
>
> > And for 8.3 you had something like this:
> > QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Subquery Scan "*SELECT*" (cost=316145.48..340289.33 rows=877958
> > width=20) (actual time=10650.036..12997.377 rows=877895 loops=1)
> > -> HashAggregate (cost=316145.48..329314.85 rows=877958 width=12)
> > (actual time=10650.023..12193.890 rows=877895 loops=1)
> > ... SNIP ...
> > Time: 9547801.116 ms
>
> > In both of those instances, the actual time used is WAY larger than
> > the time listed in the explain analyze, which has usually pointed to a
> > very expensive OS level get time of day call.
>
> No, that's not it --- these last two runs are on the same OS.
>
> Remember this command is an INSERT/SELECT. The total actual time for
> the SELECT part of it (ie, forming the rows to insert) is what's
> reported as the top-level plan node's runtime --- so 8.3 is actually a
> shade faster than 8.2 here. And separately from that we can see the
> time spent in the one foreign-key trigger, which seems to be slower on
> 8.3, but neither of these are where the majority of the runtime is
> going. The rest of the runtime must be spent in the physical data
> insertion --- storing the rows, updating the indexes, and/or writing
> WAL. I don't see any particular reason for that to be so much worse in
> 8.3; especially not for such a simple table definition:
> http://archives.postgresql.org/pgsql-general/2008-01/msg00266.php
>
> There's something awfully weird going on there. I can replicate some
> slowdown from 8.2 to 8.3, which should be looked into, but it's not the
> order-of-magnitude-worse that Clodaldo is getting. I tried this simple
> experiment, which should be at least comparable to his table structure:
>
> create table foo (f1 int, f2 int, f3 int, f4 real);
> create index fooi on foo(f1);
> create index fooi2 on foo(f2);
> explain analyze insert into foo select i,i,0,1.0 from generate_series(1,1000000) i;
>
> and repeated the last a few times. On 8.2.6 I get numbers like this:
>
> regression=# explain analyze insert into foo select i,i,0,1.0 from generate_series(1,1000000) i;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=4) (actual time=419.101..1408.868 rows=1000000 loops=1)
> Total runtime: 16783.096 ms
> (2 rows)
>
> regression=# explain analyze insert into foo select i,i,0,1.0 from generate_series(1,1000000) i;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=4) (actual time=421.440..1512.787 rows=1000000 loops=1)
> Total runtime: 18785.652 ms
> (2 rows)
>
> 8.3RC1 gives numbers more like this:
>
> regression=# explain analyze insert into foo select i,i,0,1.0 from generate_series(1,1000000) i;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=4) (actual time=420.742..1872.290 rows=1000000 loops=1)
> Total runtime: 17278.170 ms
> (2 rows)
>
> regression=# explain analyze insert into foo select i,i,0,1.0 from generate_series(1,1000000) i;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=4) (actual time=435.392..1909.631 rows=1000000 loops=1)
> Total runtime: 20673.453 ms
> (2 rows)
>
> This is my usual debugging setup, built --enable-debug --enable-cassert
> but with "SET debug_assertions = 0" so that shouldn't affect things too much,
> no configuration changes from what initdb gave me except fsync = off
> (so it's not too representative of real-world performance, maybe).
>
> It would be interesting to see the identical test on Clodaldo's
> installations.

This is 8.2.6 in the new server:

cpn=> create table foo (f1 int, f2 int, f3 int, f4 real);
CREATE TABLE
cpn=> create index fooi on foo(f1);
CREATE INDEX
cpn=> create index fooi2 on foo(f2);
CREATE INDEX
cpn=> explain analyze insert into foo select i,i,0,1.0 from
generate_series(1,1000000) i;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series i (cost=0.00..12.50 rows=1000
width=4) (actual time=270.425..699.067 rows=1000000 loops=1)
Total runtime: 12888.913 ms
(2 rows)

The table into which I'm inserting 800 thousand rows, usuarios, has
135 million rows so I did:

cpn=> explain analyze insert into foo select i,i,0,1.0 from
generate_series(1,135500000) i;

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series i (cost=0.00..12.50 rows=1000
width=4) (actual time=49852.161..403976.519 rows=135500000 loops=1)
Total runtime: 2044745.294 ms
(2 rows)
cpn=> analyze;
...warnings...
ANALYZE
cpn=> explain analyze insert into foo select i,i,0,1.0 from
generate_series(135500001, 135500000 + 800000 ) i;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series i (cost=0.00..12.50 rows=1000
width=4) (actual time=196.804..553.617 rows=800000 loops=1)
Total runtime: 11202.895 ms
(2 rows)

The real table has foreign keys. Don't it make a difference?

I can reinstall 8.3 and try it. Will it be necessary or interesting?
beta4 or RC1?

Regards, Clodoaldo Pinto Neto

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andrej Ricnik-Bay 2008-01-10 20:40:44 Members-choice award at LinuxQuestions.org
Previous Message Jeff Davis 2008-01-10 20:02:14 Re: 8.3beta bug or feature?