From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random observations while testing with a 1,8B row table
Date: 2006-03-10 19:48:23
Message-ID: 4411D807.1040903@kaltenbrunner.cc
Lists: pgsql-hackers
Luke Lonergan wrote:
> Stefan,
>
> On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> wrote:
>
>
>>I will summarize some of the results just in case somebody is interested:
>
>
> I am!
heh - not surprised :-)
>
>
>>-> the table used has 5 integer columns, non-indexed during the loads
>>-> hardware is a Dual Opteron 280 with 4 cores @ 2.4GHz and 16GB RAM; data
>>is on a multipathed (busy) SAN with separate (RAID 10) arrays for WAL
>>and data.
>
>
> How many connections out of the machine? How many disks behind each LUN?
2 HBAs in the server, 2x2 possible paths to each LUN.
6 disks for the WAL and 12 disks for the data.
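
For reference, a minimal sketch of the table being described (the thread only
says five non-indexed integer columns, so the column names here are
assumptions; testtable is the name used further down):

CREATE TABLE testtable (a int, b int, c int, d int, e int);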
>
> So - about 20 bytes per row (5*4) unless those are int8, but on disk it's
> 108GB/1.8B = 60 bytes per row. I wonder what all that overhead is?
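
A rough way to check the per-row footprint being wondered about here, as a
sketch only (pg_relation_size is available from 8.1 on; the byte breakdown in
the comments is approximate for the pre-8.3 on-disk format):

-- heap bytes per row, using planner statistics for the row count
SELECT pg_relation_size('testtable')                       AS heap_bytes,
       reltuples::bigint                                   AS approx_rows,
       pg_relation_size('testtable') / NULLIF(reltuples, 0) AS bytes_per_row
FROM pg_class
WHERE relname = 'testtable';
-- roughly: ~28-byte tuple header (pre-8.3) + 4-byte line pointer
-- + 20 bytes of user data, plus alignment, page headers and free
-- space per 8kB page, which accounts for most of the ~60 bytes/row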
>
>
>>1. data loading - I'm using COPY with batches of 300M rows:
>>
>>*) with one COPY running it takes about 20 minutes/batch to load the data
>>(~250k rows/sec) and virtually no context switches.
>>
>>*) with two COPYs running concurrently it takes a bit less than 30
>>minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>>
>>*) with three COPYs it takes about 40 min/batch at 140k context
>>switches/sec (380k rows/sec overall)
>
>
> So, from 15 MB/s up to about 20 MB/s.
>
>
>
>>while there is quite a lot of IO going on, it looks like we are
>>still mostly CPU-bound for COPY.
>
>
> That's what we see almost always. In this case, if your I/O configuration is
> capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s,
> you will be CPU limited.
the IO system I use should be capable of doing that if pushed hard
enough :-)
>
> The 3x is approximate and based on observations; the reasoning underneath
> it is that Postgres writes the data several times: once to the WAL, then
> again to the heap files.
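
To make the loading numbers concrete, a sketch of one loader session (not the
exact commands used in the test; the data file path is a placeholder):

-- one COPY per session; two or three of these ran concurrently above
COPY testtable FROM '/path/to/batch_1.dat';
-- at ~250k rows/sec and ~60 bytes/row on disk that is ~15 MB/s of heap
-- data per run, ~20 MB/s overall with three loaders, so with WAL plus
-- heap writes the I/O system needs roughly 3x that (45-60 MB/s) before
-- COPY stops being CPU-bound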
>
>
>>2. updating all of the rows in the table:
>>
>>I updated all of the rows in the table with a simple UPDATE testtable
>>set a=a+1;
>>this took about 2.5 hours (~200k rows/sec)
>
>
> Ugh. This is where Bizgres MPP shines; I'll try to recreate your test and
> post results. This scales linearly in Bizgres MPP with the number of disks
> and CPUs available, but I would hope for much more than that.
interesting to know, but I'm still testing/playing with PostgreSQL here,
not Bizgres MPP ...
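
The statement in question, followed by a rough way to see the bloat such a
full-table UPDATE leaves behind (every updated row leaves a dead version in
the heap until it is vacuumed, so the table roughly doubles on disk):

UPDATE testtable SET a = a + 1;
-- old and new row versions now coexist in the heap
SELECT pg_relation_size('testtable') AS heap_bytes_after_update;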
>
>
>>3. vacuuming this table - it turned out that VACUUM FULL is completely
>>unusable on a table of this size (which I actually expected beforehand), not
>>only due to the locking involved but also due to a gigantic memory
>>requirement and unbelievable slowness.
>
>
> Simple vacuum should be enough IMO.
sure, that was mostly meant as an experiment. If I had to do this on a
production database I would most likely use CLUSTER to get the desired
effect (which in my case was purely getting back the disk space wasted by
dead tuples)
Stefan
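
A minimal sketch of the two reclaim options touched on above; the index name
is hypothetical (the test table has no indexes), and the CLUSTER form shown
is the 8.1-era syntax:

-- plain VACUUM marks dead space reusable but generally does not
-- return it to the operating system
VACUUM VERBOSE testtable;
-- CLUSTER rewrites the table and gives the space back, but needs an
-- index to order by and holds an exclusive lock while it runs
CREATE INDEX testtable_a_idx ON testtable (a);
CLUSTER testtable_a_idx ON testtable;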