Re: Very big insert/join performance problem (bacula)

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Devin Ben-Hur <dbenhur(at)whitepages(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very big insert/join performance problem (bacula)
Date: 2009-07-16 00:43:19
Message-ID: C683C5B7.A2FC%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 7/15/09 4:56 PM, "Devin Ben-Hur" <dbenhur(at)whitepages(dot)com> wrote:

> Marc Cousin wrote:
>> This mail contains the asked plans :
>> Plan 1
>> around 1 million records to insert, seq_page_cost 1, random_page_cost 4
>
>> -> Hash (cost=425486.72..425486.72 rows=16746972 width=92) (actual
>> time=23184.196..23184.196 rows=16732049 loops=1)
>> -> Seq Scan on path (cost=0.00..425486.72 rows=16746972
>> width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
>
>> -> Hash (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual
>> time=210831.840..210831.840 rows=79094418 loops=1)
>> -> Seq Scan on filename (cost=0.00..1436976.15 rows=79104615
>> width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
>
> This doesn't address the cost driving plan question, but I think it's a
> bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
> while a seqscan of 79M 35-byte rows takes 149secs. It's about 4:1 row
> ratio, less than 2:1 byte ratio, but a 20:1 time ratio. Perhaps there's
> some terrible bloat on filename that's not present on path? If that seq
> scan time on filename were proportionate to path this plan would
> complete about two minutes faster (making it only 6 times slower instead
> of 9 :).

Bloat is possible. This can be checked with VACUUM VERBOSE on the table.
Postgres has a habit of getting its table files fragmented too under certain
use cases.
Additionally, some of the table pages may have been cached in one use case
and not in another.
>
> --
> -Devin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message ning 2009-07-16 00:53:59 Re: Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
Previous Message Scott Marlowe 2009-07-16 00:40:02 Re: Poor overall performance unless regular VACUUM FULL