From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 10:25:00 |
Message-ID: | 3BDFD17C.C78B35E@tm.ee |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Tille, Andreas" wrote:
>
> Hello,
>
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL server heavily on postgres-general list. The thread starts with
> message
>
> http://fts.postgresql.org/db/mw/msg.html?mid=1035557
>
> Now I tried a snapshot of version 7.2 and got an increase of speed of
> about factor 2. But sorry this is really not enough. The very simple
> test I pointed to in my mail is even much to slow and the issue would
> probably spoil down the whole project which should be a complete open
> source solution and would perhaps and in any M$ stuff. I´ve got under
> heavy preasur from my employer who was talking about the nice world
> of MS .net (while he is using MS-SQL exclusively). To make the thing
> clear the issue is the gal database of infectious diseases in Germany
> runned by the Robert Koch-Institute. So the beast could be of some
> importance for increasing the acceptance of PostgreSQL and Open Source
> in the field of medicine which is generally known for the money which
> is involved in. So I really hope that some skilled programmers would
> be able to find a good way to solve the performance issue perhaps by
> just profiling the simple query
>
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> to the data set I put on
>
> http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
>
> If this should take less than half a second on a modern PC I could
> continue to try mo realistic queries.
I tried some more on optimizing the query on my work computer
(AMD ATHLON 850, 512MB, PostgreSQL 7.1.3 with default memory settings)
SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;
real 0m9.675s
create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);
----------------------------
set enable_seqscan = off;
SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;
Aggregate (cost=4497.30..4510.18 rows=258 width=16)
-> Group (cost=4497.30..4503.74 rows=2575 width=16)
-> Sort (cost=4497.30..4497.30 rows=2575 width=16)
-> Index Scan using i1 on hauptdaten_fall
(cost=0.00..4351.40 rows=2575 width=16)
real 0m7.131s
---------------------------
set enable_seqscan = off;
SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;
Aggregate (cost=4497.30..4510.18 rows=258 width=16)
-> Group (cost=4497.30..4503.74 rows=2575 width=16)
-> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40
rows=2575 width=16)
real 0m3.223s
-- same after doing
cluster i1 on Hauptdaten_Fall;
real 1.590 -- 1.600
select count(*) from Hauptdaten_Fall;
real 0m0.630s
---------------------------
The following query is marginally (about 0.1 sec) faster, though the
plan looks the same down to cost estimates.
SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;
Aggregate (cost=0.00..4370.72 rows=258 width=16)
-> Group (cost=0.00..4364.28 rows=2575 width=16)
-> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40
rows=2575 width=16)
real 0m1.438s - 1.506s
---------------------------
now I make the dataset bigger keeping the number of rows returned by
query the same
insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20, meldekategorie
from hauptdaten_fall ;
INSERT 0 257530
insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 40, meldekategorie
from hauptdaten_fall ;
INSERT 0 515060
ifsgtest=# select count(*) from hauptdaten_fall;
count
---------
1030120
(1 row)
cluster i1 on Hauptdaten_Fall;
vacuum analyze;
-- The query time is still the same 1.44 - 1.5 sec
SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;
Aggregate (cost=0.00..4370.72 rows=258 width=16)
-> Group (cost=0.00..4364.28 rows=2575 width=16)
-> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40
rows=2575 width=16)
real 0m1.438s - 1.506s
----------------------------
now back to original data distribution, just 4 times bigger
ifsgtest=# update hauptdaten_fall
ifsgtest-# set istaktuell = case when istaktuell % 20 = 0 then 20 else
10 end
ifsgtest-# ;
UPDATE 1030120
ifsgtest=# vacuum analyze;
VACUUM
SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;
real 0m6.077 -- 6.606s
and after clustering:
cluster i1 on Hauptdaten_Fall;
real 0m5.683 - 5.750s
so it's linear growth here
----------------------------
Hannu
From | Date | Subject | |
---|---|---|---|
Next Message | mlw | 2001-10-31 13:14:02 | Re: Serious performance problem |
Previous Message | Marc G. Fournier | 2001-10-31 09:58:13 | Re: pgsql-committers? |