Re: Hardware/OS recommendations for large databases (

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Joshua Marsh" <icub3d(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-18 15:25:52
Message-ID: A4D5EB2A-73BC-43F8-8B5A-36268193A047@fastcrypt.com
Lists: pgsql-performance

Luke,

Interesting numbers. I'm a little concerned about the use of blockdev
--setra 16384. If I understand this correctly, it assumes that the
table is contiguous on the disk, does it not?
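For anyone following along, the setting under discussion looks roughly like this (a sketch, not taken from the thread; /dev/sda is a placeholder for whatever device backs the data directory). Note that --setra counts 512-byte sectors and applies to the whole block device, not to any one table:

```shell
DEV=/dev/sda   # assumption: substitute the device backing your $PGDATA
# blockdev --setra takes a count of 512-byte sectors and applies to the
# entire device -- readahead is not a per-table setting.
echo $((16384 * 512))          # bytes requested by --setra 16384 -> 8388608
blockdev --getra "$DEV"        # show the current readahead (in sectors)
blockdev --setra 16384 "$DEV"  # set it to 16384 sectors (requires root)
```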

Dave
On 18-Nov-05, at 10:13 AM, Luke Lonergan wrote:

> Dave,
>
> On 11/18/05 5:00 AM, "Dave Cramer" <pg(at)fastcrypt(dot)com> wrote:
> >
> > Now there's an interesting line drawn in the sand. I presume you
> have
> > numbers to back this up ?
> >
> > This should draw some interesting posts.
>
> Part 2: The answer
>
> System A:
>> This system is running RedHat 3 Update 4, with a Fedora 2.6.10
>> Linux kernel.
>>
>> On a single table with 15 columns (the Bizgres IVP) at a size
>> double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements
>> takes 32 seconds to scan the table: that’s 66 MB/s. Not the
>> efficiency I’d like from the onboard SATA controller; I would
>> have expected to get 85% of the 100MB/s raw read performance.
>>
>> So that’s $1,200 / 66 MB/s (without adjusting for 2003 price
>> versus now) = 18.2 $/MB/s
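The $/MB/s figure above can be sanity-checked directly from the quoted numbers ($1,200 hardware cost, 66 MB/s scan rate):

```shell
# Price-per-bandwidth arithmetic for System A, using the email's figures.
awk 'BEGIN { printf "System A: %.1f $/MB/s\n", 1200/66 }'
```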
>>
>> Raw data:
>> [llonergan(at)kite4 IVP]$ cat scan.sh
>> #!/bin/bash
>>
>> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
>> [llonergan(at)kite4 IVP]$ cat sysout1
>> count
>> ----------
>> 10000000
>> (1 row)
>>
>>
>> real 0m32.565s
>> user 0m0.002s
>> sys 0m0.003s
>>
>> Size of the table data:
>> [llonergan(at)kite4 IVP]$ du -sk dgtestdb/base
>> 2121648 dgtestdb/base
>>
> System B:
>> This system is running an XFS filesystem, and has been tuned to
>> use very large (16MB) readahead. It’s running the Centos 4.1
>> distro, which uses a Linux 2.6.9 kernel.
>>
>> Same test as above, but with 17GB of data it takes 69.7 seconds to
>> scan (!) That’s 244.2MB/s, which is obviously double my earlier
>> point of 110-120MB/s. This system is running with a 16MB Linux
>> readahead setting, let’s try it with the default (I think) setting
>> of 256KB – AHA! Now we get 171.4 seconds or 99.3MB/s.
>>
>> So, using the tuned setting of “blockdev --setra 16384” we get
>> $6,000 / 244MB/s = 24.6 $/MB/s
>> If we use the default Linux setting it’s 2.5x worse.
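Those two claims check out against the quoted figures ($6,000 hardware, 244 MB/s tuned, 99.3 MB/s at the default readahead):

```shell
# Price-per-bandwidth at the tuned readahead setting, per the email.
awk 'BEGIN { printf "System B: %.1f $/MB/s\n", 6000/244 }'
# Speedup of tuned (244.2 MB/s) over the default setting (99.3 MB/s).
awk 'BEGIN { printf "tuned vs default: %.1fx\n", 244.2/99.3 }'
```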
>>
>> Raw data:
>> [llonergan(at)modena2 IVP]$ cat scan.sh
>> #!/bin/bash
>>
>> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
>> [llonergan(at)modena2 IVP]$ cat sysout3
>> count
>> ----------
>> 80000000
>> (1 row)
>>
>>
>> real 1m9.875s
>> user 0m0.000s
>> sys 0m0.004s
>> [llonergan(at)modena2 IVP]$ !du
>> du -sk dgtestdb/base
>> 17021260 dgtestdb/base
>
> Summary:
>
> <cough, cough> OK – you can get more I/O bandwidth out of the
> current I/O path for sequential scan if you tune the filesystem for
> large readahead. This is a cheap alternative to overhauling the
> executor to use asynch I/O.
>
> Still, there is a CPU limit here – this is not I/O bound, it is CPU
> limited as evidenced by the sensitivity to readahead settings. If
> the filesystem could do 1GB/s, you wouldn’t go any faster than
> 244MB/s.
>
> - Luke
