Re: Speeding up query, Joining 55mil and 43mil records.

From: Sven Geisler <sgeisler(at)aeccom(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: nicky <nicky(at)valuecare(dot)nl>
Subject: Re: Speeding up query, Joining 55mil and 43mil records.
Date: 2006-06-21 15:37:07
Message-ID: 449967A3.3030801@aeccom.com
Lists: pgsql-performance

Hi Nicky,

I guess you should try to increase PostgreSQL's memory settings first.

work_mem = 65536

is a bit low for such large joins.

Did you get a chance to watch the directory
<PGDATA>/base/<DBOID>/pgsql_tmp to see how large the temporary files get
during this query? I'm sure there is a large file in there.

Anyhow, you can increase 'work_mem' to 1000000 (the value is in kB, so
roughly 1 GB). Please note that work_mem is allocated per backend
process, so you may run into problems if multiple large queries run at
the same time.
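As a minimal sketch (assuming you run the load from a single psql
session; on 8.1 the value is given in kB), you could raise it only for
that session instead of server-wide:

SET work_mem = 1000000;   -- about 1 GB for this session only; adjust to your RAM
-- run the INSERT ... SELECT here
RESET work_mem;           -- back to the value from postgresql.conf

That way other backends keep the smaller default from postgresql.conf.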
If you still see large files in that directory, you may move (symlink)
'pgsql_tmp' to a very fast file system.

You can also try increasing these settings:

checkpoint_segments = 256
checkpoint_timeout = 3600 # range 30-3600, in seconds
checkpoint_warning = 0 # 0 is off

Please read the PostgreSQL documentation about the drawbacks of these
settings, as well as of your existing setting 'fsync = off'.

Cheers
Sven.

nicky wrote:
> Hello People,
>
> I'm trying to solve a 'what i feel is a' performance/configuration/query
> error on my side. I'm fairly new to configuring PostgreSQL so, i might
> be completely wrong with my configuration.
>
> My database consists of 44 tables, about 20 GB. Two of those tables are
> 'big/huge'. Table src.src_faktuur_verricht contains 43 million records
> (9 GB) and table src.src_faktuur_verrsec contains 55 million records (6 GB).
>
> Below is the 'slow' query.
>
> INSERT INTO rpt.rpt_verrichting
> (verrichting_id
> ,verrichting_secid
> ,fout_status
> ,patientnr
> ,verrichtingsdatum
> ,locatie_code
> ,afdeling_code
> ,uitvoerder_code
> ,aanvrager_code
> ,verrichting_code
> ,dbcnr
> ,aantal_uitgevoerd
> ,kostenplaats_code
> ,vc_patientnr
> ,vc_verrichting_code
> ,vc_dbcnr
> )
> SELECT t1.id
> , t0.secid
> , t1.status
> , t1.patientnr
> , t1.datum
> , t1.locatie
> , t1.afdeling
> , t1.uitvoerder
> , t1.aanvrager
> , t0.code
> , t1.casenr
> , t0.aantal
> , t0.kostplaats
> , null
> , null
> , null
> FROM src.src_faktuur_verrsec t0 JOIN
> src.src_faktuur_verricht t1 ON
> t0.id = t1.id
> WHERE substr(t0.code,1,2) not in ('14','15','16','17')
> AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
> AND EXTRACT(YEAR from t1.datum) > 2004;
>
>
> Output from explain
>
> Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
>   Hash Cond: (("outer".id)::text = ("inner".id)::text)
>   ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90 rows=40902852 width=52)
>         Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
>   ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
>         ->  Bitmap Heap Scan on src_faktuur_verricht t1  (cost=62392.02..1188102.97 rows=8942863 width=80)
>               Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
>               ->  Bitmap Index Scan on src_faktuur_verricht_idx1  (cost=0.00..62392.02 rows=8942863 width=0)
>                     Index Cond: (date_part('year'::text, datum) > 2004::double precision)
>
>
> The database server runs PostgreSQL 8.1.4 on FreeBSD 6.1-STABLE with 2 GB
> of RAM. It has two SATA150 disks: one holds PostgreSQL and the rest of
> the operating system, and the other holds the pg_xlog directory.
>
> Changed lines from my postgresql.conf file
>
> shared_buffers = 8192
> temp_buffers = 4096
> work_mem = 65536
> maintenance_work_mem = 1048576
> max_fsm_pages = 40000
> fsync = off
> wal_buffers = 64
> effective_cache_size = 174848
>
> The query above takes around 42 minutes.
>
> However, I also have a wimpy desktop machine with 1 GB of RAM, running
> Windows with MS SQL Server 2000 (default installation): same database
> structure, same indexes, same query, etc., and there it takes 17 minutes.
> The big difference makes me think that I've made an error in my PostgreSQL
> configuration. I just can't seem to figure it out.
>
> Could someone perhaps give me some pointers or advice?
>
> Thanks in advance.
>
> Nicky
>
