From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Jean-Paul ARGUDO <jean-paul(dot)argudo(at)IDEALX(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, sdinot(at)IDEALX(dot)com, dbarth(at)IDEALX(dot)com
Subject: Re: Oracle vs PostgreSQL in real life : NEWS!!!
Date: 2002-03-02 22:21:56
Message-ID: 1015107716.3512.89.camel@rh72.home.ee
Lists: pgsql-hackers
On Fri, 2002-03-01 at 23:44, Jean-Paul ARGUDO wrote:
> > > Oracle on NT 4 : 45 minutes to go, 1200 tps (yes, one thousand two
> > > hundred tps)
> > >
> > > Linux Red Hat 7.2 with PostgreSQL 7.2 : hours to go (statistically,
> > > 45 hours), 80 tps (eighty tps).
>
> Well... Where to start?
>
> We work as a team of two. The other is a senior C/C++ coder. He
> mailed me a remark about the datatypes in the database. Here is what
> he sent me:
>
> Our database uses several datatypes; here is a count of columns of
> each distinct datatype across all tables:
>
> 197 numeric(x)
> 19 numeric(x,2)
> 2 varchar(x)
> 61 char(x)
> 36 datetime
>
> He asked me about numeric(x) and questioned me about how PG manages
> the NUMERIC type.
>
> I gave him a pointer to "numeric.c" in the PG sources.
>
> I analyzed this source and found that NUMERIC types are much more
> expensive than simple INTEGERs.
>
> I nearly fell on the floor.. :-( I was sure that, with the quality PG
> has, when NUMERIC(x) columns are declared they would be translated to
> INTEGER (int2, 4 or 8, whatever...).
Postgres does not do any silent type replacement based on a data type's
maximum length.
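The cost difference is visible even outside the backend: NUMERIC is arbitrary-precision arithmetic on digit arrays (that is what numeric.c implements), while INTEGER is native machine arithmetic. A rough pure-Python illustration, with decimal.Decimal standing in for NUMERIC (the exact ratio is machine-dependent, but the direction is not):

```python
import timeit
from decimal import Decimal

# Sum a 10,000-value "column" as native ints vs. arbitrary-precision
# decimals -- roughly the INTEGER vs. NUMERIC trade-off.
ints = list(range(10_000))
decs = [Decimal(v) for v in ints]

t_int = timeit.timeit(lambda: sum(ints), number=100)
t_dec = timeit.timeit(lambda: sum(decs, Decimal(0)), number=100)
print(f"int: {t_int:.4f}s  decimal: {t_dec:.4f}s  "
      f"decimal is {t_dec / t_int:.0f}x slower")
```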
> So, I made a pg_dump of the current database and did some Perl
> replacements of NUMERIC(x,0) to INTEGER.
>
> I loaded the database and launched the treatments: the results are
> REALLY IMPRESSIVE. Here is what I have:
>
> ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):
>
>                              Oracle    PG72 with NUMERICs   PG72 with INTEGERs
> --------------------------------------------------------------------------
> sample CONNECT BY query
> ported to PG                 350ms     750ms                569ms
> (thanks to OpenACS code!)
Did you rewrite your CONNECT BY queries as recursive functions, or did
you use varbit tree position pointers?
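For reference, the "tree position pointer" trick stores each node's materialized path, so the whole subtree that CONNECT BY walks row by row becomes a single indexable prefix match (a varbit or text LIKE 'prefix%' scan in SQL). A toy sketch of the same idea in Python, with made-up data:

```python
# Each row carries its materialized path ("tree position pointer");
# fetching a subtree is then a prefix match instead of a recursive walk.
rows = [
    ("world",   "001"),
    ("europe",  "001.001"),
    ("estonia", "001.001.001"),
    ("france",  "001.001.002"),
    ("asia",    "001.002"),
]

def subtree(rows, root_path):
    """All nodes at or under root_path -- the CONNECT BY PRIOR analogue."""
    return [name for name, path in rows
            if path == root_path or path.startswith(root_path + ".")]

print(subtree(rows, "001.001"))  # europe and its descendants
```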
> --------------------------------------------------------------------------
> sample "big"
> query with
> connect bys 3 min 30s 8 min 40s 5 min 1s
> and many
> sub-queries
Could you give more information on this query - I suspect it can be
made at least as fast as Oracle :)
> --------------------------------------------------------------------------
> Big Batch
> treatment 1300 queries/s 80 queries/s 250 queries/s
> queries
>
> PRO*C to 45 min to go ~4 to 6 DAYS not yet
> ECPG to go tested fully
>
> Ratio 1:1 1:21 not yet ..
> 21 times slower!
Did you run a concurrent vacuum for both PG results?
From my limited testing it seems that such a vacuum is absolutely needed
for big batches consisting mostly of updates.
And btw, 45 min * 21 = 15h45, not 4-6 days :)
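By "concurrent vacuum" I just mean a plain (non-FULL) VACUUM looping alongside the batch, which 7.2 allows without an exclusive lock. Something like this (table name, database and interval are made up - tune them to the workload):

```shell
# Vacuum the hot table every few seconds while the batch runs,
# so dead tuples from the updates are reclaimed continuously.
while true; do
    psql -q -d mydb -c 'VACUUM accounts;'
    sleep 4
done
```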
> --------------------------------------------------------------------------
> ((but this batch will yet be re-written in pure C + libpq + SPI,
> so we think we'll have even better results))
You probably will get better results :)
I re-ran my test (5000 transactions of 20 updates on a random unique key
between 1 and 768, with a concurrent vacuum running every 4 sec), moving
the inner loop of 20 random updates to the server, first without SPI
prepared statements and then using prepared statements.
Test hardware - Athlon 859, IDE, 512MB RAM
update of random row i=1..768
all queries sent from client: 2:02 = 820 updates/sec
[hannu(at)taru abench]$ time ./abench.py
real 2m1.522s
user 0m20.260s
sys 0m3.320s
[hannu(at)taru abench]$ time ./abench.py
real 2m2.320s
user 0m19.830s
sys 0m3.490s
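The client-side loop in abench.py is essentially this shape (a reconstruction, not the actual script; the table and column names are made up, and `execute` stands for whatever sends one SQL statement to the backend):

```python
import random

def run_bench(execute, transactions=5000, updates_per_txn=20, max_key=768):
    """Issue `transactions` transactions of `updates_per_txn` single-row
    UPDATEs each, on a random key in 1..max_key."""
    total = 0
    for _ in range(transactions):
        execute("BEGIN")
        for _ in range(updates_per_txn):
            i = random.randint(1, max_key)
            execute(f"UPDATE accounts SET abalance = abalance + 1 WHERE i = {i}")
            total += 1
        execute("COMMIT")
    return total
```

Moving that inner loop of 20 updates into a server-side function is what the next two results measure.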
using plpython without prepared statements 1:35 = 1052 updates/sec
[hannu(at)taru abench]$ time ./abenchplpy2.py
real 1m34.587s
user 0m1.280s
sys 0m0.400s
[hannu(at)taru abench]$ time ./abenchplpy2.py
real 1m36.919s
user 0m1.350s
sys 0m0.450s
using plpython with SPI prepared statements 1:06.30 = 1503 updates/sec
[hannu(at)taru abench]$ time ./abenchplpy.py
real 1m6.134s
user 0m1.400s
sys 0m0.720s
[hannu(at)taru abench]$ time ./abenchplpy.py
real 1m7.186s
user 0m1.580s
sys 0m0.570s
plpython non-functional with SPI prepared
statements - update where i=1024 (no matching row): 0:17.65 = 5666 non-updates/sec
[hannu(at)taru abench]$ time ./abenchplpy.py
real 0m17.650s
user 0m0.990s
sys 0m0.290s
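The prepared-statement variant caches the plan in plpython's SD session dictionary, so the UPDATE is planned once per backend instead of on every call. Roughly like this (a sketch with made-up table and column names, not my actual benchmark function):

```sql
CREATE FUNCTION update_batch(integer) RETURNS integer AS '
    import random
    # Plan the UPDATE once per backend and cache it in SD;
    # later calls reuse the saved plan instead of re-planning.
    if "plan" not in SD:
        SD["plan"] = plpy.prepare(
            "UPDATE accounts SET abalance = abalance + 1 WHERE i = $1",
            ["integer"])
    for n in xrange(args[0]):
        plpy.execute(SD["plan"], [random.randint(1, 768)])
    return args[0]
' LANGUAGE 'plpython';
```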
> So as you see, DATA TYPES are REALLY important, as I wrote in a
> techdocs article (I should have thought of this earlier)
Yes, they are.
But running a concurrent vacuum is _much_ more important when the number
of updates is much bigger than the number of records (thanks Tom!)
------------------
Hannu