From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random observations while testing with a 1,8B row table
Date: 2006-03-10 19:48:23
Message-ID: 4411D807.1040903@kaltenbrunner.cc
Lists: pgsql-hackers
Luke Lonergan wrote:
> Stefan,
>
> On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> wrote:
>
>
>>I will summarize some of the results just in case somebody is interested:
>
>
> I am!
heh - not surprised :-)
>
>
>>-> the table used has 5 integer columns, non-indexed during the loads
>>-> hardware is a Dual Opteron 280 with 4 cores @ 2.4GHz and 16GB RAM; data
>>is on a multipathed (busy) SAN with separate (RAID 10) arrays for WAL
>>and data.
>
>
> How many connections out of the machine? How many disks behind each LUN?
2 HBAs in the server, 2x2 possible paths to each LUN.
6 disks for the WAL and 12 disks for the data.
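
For reference, a minimal sketch of the table being described (the thread only
says five non-indexed integer columns, so the column names here are
assumptions; testtable is the name used further down):

CREATE TABLE testtable (a int, b int, c int, d int, e int);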
>
> So - about 20 bytes per row (5*4) unless those are int8, but on disk it's
> 108GB/1.8B = 60 bytes per row. I wonder what all that overhead is?
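
A rough way to check the per-row footprint being wondered about here, as a
sketch only (pg_relation_size is available from 8.1 on; the byte breakdown in
the comments is approximate for the pre-8.3 on-disk format):

-- heap bytes per row, using planner statistics for the row count
SELECT pg_relation_size('testtable')                       AS heap_bytes,
       reltuples::bigint                                   AS approx_rows,
       pg_relation_size('testtable') / NULLIF(reltuples, 0) AS bytes_per_row
FROM pg_class
WHERE relname = 'testtable';
-- roughly: ~28-byte tuple header (pre-8.3) + 4-byte line pointer
-- + 20 bytes of user data, plus alignment, page headers and free
-- space per 8kB page, which accounts for most of the ~60 bytes/row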
>
>
>>1. data loading - I'm using COPY with batches of 300M rows:
>>
>>*) with one COPY running it takes about 20 minutes/batch to load the data
>>(~250k rows/sec) and virtually no context switches.
>>
>>*) with two COPYs running concurrently it takes a bit less than 30
>>minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>>
>>*) with three COPYs it takes about 40 min/batch at 140k context
>>switches/sec (380k rows/sec overall)
>
>
> So, from 15 MB/s up to about 20 MB/s.
>
>
>
>>while there is quite a lot of IO going on, it looks like we are
>>still mostly CPU-bound for COPY.
>
>
> That's what we see almost always. In this case, if your I/O configuration is
> capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s,
> you will be CPU limited.
the IO system I use should be capable of doing that if pushed hard
enough :-)
>
> The 3x is approximate and based on observations; the reasoning underneath
> it is that Postgres writes the data several times: once to the WAL, then
> again to the heap files.
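
To make the loading numbers concrete, a sketch of one loader session (not the
exact commands used in the test; the data file path is a placeholder):

-- one COPY per session; two or three of these ran concurrently above
COPY testtable FROM '/path/to/batch_1.dat';
-- at ~250k rows/sec and ~60 bytes/row on disk that is ~15 MB/s of heap
-- data per run, ~20 MB/s overall with three loaders, so with WAL plus
-- heap writes the I/O system needs roughly 3x that (45-60 MB/s) before
-- COPY stops being CPU-bound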
>
>
>>2. updating all of the rows in the table:
>>
>>I updated all of the rows in the table with a simple UPDATE testtable
>>set a=a+1;
>>this took about 2.5 hours (~200k rows/sec)
>
>
> Ugh. This is where Bizgres MPP shines; I'll try to recreate your test and
> post results. This scales linearly in Bizgres MPP with the number of disks
> and CPUs available, but I would hope for much more than that.
interesting to know, but I'm still testing/playing with PostgreSQL here,
not Bizgres MPP ...
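
The statement in question, followed by a rough way to see the bloat such a
full-table UPDATE leaves behind (every updated row leaves a dead version in
the heap until it is vacuumed, so the table roughly doubles on disk):

UPDATE testtable SET a = a + 1;
-- old and new row versions now coexist in the heap
SELECT pg_relation_size('testtable') AS heap_bytes_after_update;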
>
>
>>3. vacuuming this table - it turned out that VACUUM FULL is completely
>>unusable on a table of this size (which I actually expected beforehand), not
>>only due to the locking involved but also due to a gigantic memory
>>requirement and unbelievable slowness.
>
>
> Simple vacuum should be enough IMO.
sure, that was mostly meant as an experiment. If I had to do this on a
production database I would most likely use CLUSTER to get the desired
effect (which in my case was purely getting back the disk space wasted by
dead tuples)
Stefan
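
A minimal sketch of the two reclaim options touched on above; the index name
is hypothetical (the test table has no indexes), and the CLUSTER form shown
is the 8.1-era syntax:

-- plain VACUUM marks dead space reusable but generally does not
-- return it to the operating system
VACUUM VERBOSE testtable;
-- CLUSTER rewrites the table and gives the space back, but needs an
-- index to order by and holds an exclusive lock while it runs
CREATE INDEX testtable_a_idx ON testtable (a);
CLUSTER testtable_a_idx ON testtable;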