From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Performance while loading data and indexing
Date: 2002-09-26 08:35:44
Message-ID: 3D931438.22010.133ADAFA@localhost
Lists: pgsql-general pgsql-hackers pgsql-performance
Hello all,
Some time back I posted a query about building a site with a 150GB database. In the last couple of weeks, lots of things were tested at my place, and there are some results and, again, some concerns.
This is a long post. Please be patient and read through. If we win this, I guess we have a good marketing/advocacy case here..;-)
First the problems (for those who do not read beyond the first page):
1) Database load time from a flat file using copy is very high.
2) Creating the index takes a huge amount of time.
3) Any suggestions for runtime tuning, as data loading and querying will be going on in parallel.
Now the details. Note that this is a test run only..
Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
RedHat7.2/PostgreSQL7.1.3
Database in flat file:
125,000,000 records of around 100 bytes each.
Flat file size 12GB
Load time: 14581 sec / ~8600 rows per sec / ~1 MB of data per sec.
Create unique composite index on 2 char fields and a timestamp field: 25226 sec.
Database size on disk: 26GB
Select query: 1.5 sec. for approx. 150 rows.
Important postgresql.conf settings
sort_mem = 12000
shared_buffers = 24000
fsync=true (Sad but true. Left untouched.. Will that make a difference on
SCSI?)
wal_buffers = 65536
wal_files = 64
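For the bulk-load phase specifically, a common approach (a sketch only, not a tested recommendation for this box) is to relax durability settings during the initial COPY and restore them afterwards. Note also that wal_buffers is counted in 8KB pages, so 65536 works out to 512MB of WAL buffers, which is likely far more than needed; the values below are illustrative, not tuned:

```
# postgresql.conf -- bulk-load phase only; restore normal settings afterwards
fsync = false          # risks losing the load on a crash, but you can re-run it
sort_mem = 65536       # in KB (~64MB) per sort; mainly helps CREATE INDEX after the load
shared_buffers = 24000 # in 8KB pages (~188MB)
wal_buffers = 1024     # in 8KB pages (8MB); 65536 pages = 512MB is probably excessive
```

Creating the index after the load completes, rather than loading into an indexed table, is usually the bigger win.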
Now the requirements
Initial flat data load: 250GB of data. This has gone up since my last post; it was 150GB earlier..
Ongoing inserts: 5000/sec.
Number of queries: 4800 queries/hour
Query response time: 10 sec.
Now questions.
1) Instead of copying from a single 12GB data file, would a parallel copy from, say, 5 files speed things up?
A couple of MB of data per sec. to disk is just not saturating it. It's a RAID 5 setup..
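A minimal sketch of what such a parallel load could look like: split the flat file into N chunks, then run one COPY session per chunk. The table name, chunk count, and file names below are made up for illustration, and a real 12GB file should be streamed rather than read into memory as done here:

```python
def split_flat_file(path, n_chunks, out_prefix="chunk"):
    """Split a flat data file into n_chunks files of roughly equal
    line counts, so each can be loaded by a separate COPY session."""
    with open(path) as f:
        # For a 12GB file you would stream line-by-line instead of readlines().
        lines = f.readlines()
    per = (len(lines) + n_chunks - 1) // n_chunks
    names = []
    for i in range(n_chunks):
        name = f"{out_prefix}_{i}.dat"
        with open(name, "w") as out:
            out.writelines(lines[i * per:(i + 1) * per])
        names.append(name)
    return names

# Each chunk would then be loaded concurrently, e.g. from a shell
# (table and database names are hypothetical):
#   psql -d mydb -c "\\copy big_table FROM 'chunk_0.dat'" &
#   psql -d mydb -c "\\copy big_table FROM 'chunk_1.dat'" &
#   ...
```

Whether this helps depends on where the bottleneck is: if COPY's per-row parsing is CPU-bound, parallel sessions on a 4-way Xeon should scale; if the sessions contend on WAL or on the same index, they may not.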
2) sort_mem = 12000, i.e. roughly 12MB (the unit is KB, so 12000 is ~12MB, not 94MB), sounds good enough to me. Does this need a further increase to improve create index performance?
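As a unit sanity check (assuming the 7.x conventions: sort_mem is in KB, while shared_buffers and wal_buffers are in 8KB pages), the settings above work out to:

```python
# PostgreSQL 7.x units: sort_mem in KB; shared_buffers / wal_buffers in 8KB pages.
sort_mem_kb = 12000
shared_buffers_pages = 24000
wal_buffers_pages = 65536

sort_mem_mb = sort_mem_kb / 1024                     # ~11.7 MB per sort operation
shared_buffers_mb = shared_buffers_pages * 8 / 1024  # ~187.5 MB of buffer cache
wal_buffers_mb = wal_buffers_pages * 8 / 1024        # 512 MB -- probably far more than needed
```

So the big external sort behind CREATE INDEX only gets ~12MB of memory with these settings, which may explain part of the 25226 sec index build.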
3) 5K concurrent inserts per sec. with an index on: will this need additional CPU power, like deploying it on dual RISC CPUs etc.?
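At that insert rate, committing one row at a time (one fsync per commit) is usually the first bottleneck, before CPU. A sketch of batching many inserts per transaction follows; sqlite3 is used here only as a self-contained stand-in, and with PostgreSQL you would do the same through a client library, or better, feed the batches through COPY (the table and batch size are illustrative):

```python
import sqlite3

def batched_insert(conn, rows, batch_size=1000):
    """Insert rows in batches, one transaction per batch, instead of one
    transaction per row -- at 5000 inserts/sec, per-row commits (and the
    fsync each one implies) dominate long before CPU does."""
    cur = conn.cursor()
    for i in range(0, len(rows), batch_size):
        cur.executemany("INSERT INTO t(a, b) VALUES (?, ?)",
                        rows[i:i + batch_size])
        conn.commit()  # one fsync per batch_size rows, not per row

# Stand-in demonstration with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a INTEGER, b TEXT)")
batched_insert(conn, [(i, "x") for i in range(5000)])
```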
4) Query performance is not a problem. Though it's 4.8K queries per hour, the expected response time from each query is 10 sec. But my guess is some serious CPU power will be chewed there too..
5) Will upgrading to 7.2.2/7.3 beta help?
All in all, in the test we didn't see the hardware saturated to its limits, so effectively we are not able to get postgresql to make full use of it. Just pushing up WAL and shared buffers does not seem to be the solution.
If you guys have any suggestions, let me know. I need them all..
MySQL is almost out because it has been creating its index for the last 17 hours. I don't think it will keep up with 5K inserts per sec. with an index. SAP DB is under evaluation too, but postgresql is the favourite as of now because it works. So I need to come up with solutions to the problems that will occur in the near future..
;-)
TIA..
Bye
Shridhar
--
Law of Procrastination: Procrastination avoids boredom; one never has the
feeling that there is nothing important to do.