From: | Valerie Schneider DSI/DEV <Valerie(dot)Schneider(at)meteo(dot)fr> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Cc: | Valerie(dot)Schneider(at)meteo(dot)fr |
Subject: | Tuning queries on large database |
Date: | 2004-08-04 12:44:43 |
Message-ID: | 200408041244.i74CihO19344@mu.meteo.fr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
Hi,
I have some problem of performance on a PG database, and I don't
know how to improve. I Have two questions : one about the storage
of data, one about tuning queries. If possible !
My job is to compare Oracle and Postgres. All our operational databases
have been running under Oracle for about fifteen years. Now I try to replace
Oracle by Postgres.
I have a test platform under linux (Dell server, 4 Gb RAM, bi-processor,
Linux Red Hat 9 (2.4.20-31.9)) with 2 databases, 1 with Oracle
(V8i or V9i it's quite the same), 1 with PG (7.4.2). Both databases
have the same structure, same content, about 100 Gb each. I developped
some benches, representative of our use of databases. My problem
is that I have tables (relations) with more than 100 millions rows,
and each row has about 160 fields and an average size 256 bytes.
For Oracle I have a SGA size of 500 Mb.
For PG I have a postgresql.conf as :
max_connections = 1500
shared_buffers = 30000
sort_mem = 50000
effective_cache_size = 200000
and default value for other parameters.
I have a table named "data" which looks like this :
bench=> \d data
Table "public.data"
Column | Type | Modifiers
------------+-----------------------------+-----------
num_poste | numeric(9,0) | not null
dat | timestamp without time zone | not null
datrecu | timestamp without time zone | not null
rr1 | numeric(5,1) |
qrr1 | numeric(2,0) | ...
... all numeric fields
...
Indexes:
"pk_data" primary key, btree (num_poste, dat)
"i_data_dat" btree (dat)
It contains 1000 different values of "num_poste" and for each one
125000 different values of "dat" (1 row per hour, 15 years).
I run a vacuum analyze of the table.
bench=> select * from tailledb ;
schema | relfilenode | table | index | reltuples | size
--------+-------------+------------------+------------+-------------+----------
public | 125615917 | data | | 1.25113e+08 | 72312040
public | 251139049 | data | i_data_dat | 1.25113e+08 | 2744400
public | 250870177 | data | pk_data | 1.25113e+08 | 4395480
My first remark is that the table takes a lot of place on disk, about
70 Gb, instead of 35 Gb with oracle.
125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea
not so bad for oracle. What about for PG ? How data is stored ?
The different queries of the bench are "simple" queries (no join,
sub-query, ...) and are using indexes (I "explained" each one to
be sure) :
Q1 select_court : access to about 700 rows : 1 "num_poste" and 1 month
(using PK : num_poste=p1 and dat between p2 and p3)
Q2 select_moy : access to about 7000 rows : 10 "num_poste" and 1 month
(using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
Q3 select_long : about 250 000 rows : 2 "num_poste"
(using PK : num_poste in (p1,p1+2))
Q4 select_tres_long : about 3 millions rows : 25 "num_poste"
(using PK : num_poste between p1 and p1 + 25)
The result is that for "short queries" (Q1 and Q2) it runs in a few
seconds on both Oracle and PG. The difference becomes important with
Q3 : 8 seconds with oracle
80 sec with PG
and too much with Q4 : 28s with oracle
17m20s with PG !
Of course when I run 100 or 1000 parallel queries such as Q3 or Q4,
it becomes a disaster !
I can't understand these results. The way to execute queries is the
same I think. I've read recommended articles on the PG site.
I tried with a table containing 30 millions rows, results are similar.
What can I do ?
Thanks for your help !
********************************************************************
* Les points de vue exprimes sont strictement personnels et *
* n'engagent pas la responsabilite de METEO-FRANCE. *
********************************************************************
* Valerie SCHNEIDER Tel : +33 (0)5 61 07 81 91 *
* METEO-FRANCE / DSI/DEV Fax : +33 (0)5 61 07 81 09 *
* 42, avenue G. Coriolis Email : Valerie(dot)Schneider(at)meteo(dot)fr *
* 31057 TOULOUSE Cedex - FRANCE http://www.meteo.fr *
********************************************************************
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2004-08-04 13:06:54 | Re: [PERFORM] Tuning queries on large database |
Previous Message | terry | 2004-08-04 12:07:55 | Re: trash talk |
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Serby | 2004-08-04 12:45:55 | The black art of postgresql.conf tweaking |
Previous Message | Ulrich Wisser | 2004-08-04 12:00:39 | How to know which queries are to be optimised? |