Re: Reading data in bulk - help?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Chris Huston <chuston(at)bangjafwac(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reading data in bulk - help?
Date: 2003-09-11 02:50:08
Message-ID: 200309101950.08960.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chris,

> 1) Memory - clumsily adjusted shared_buffer - tried three values: 64,
> 128, 256 with no discernible change in performance. Also adjusted,
> clumsily, effective_cache_size to 1000, 2000, 4000 - with no
> discernible change in performance. I looked at the Admin manual and
> googled around for how to set these values and I confess I'm clueless
> here. I have no idea how many kernel disk page buffers are used nor do
> I understand what the "shared memory buffers" are used for (although
> the postgresql.conf file hints that it's for communication between
> multiple connections). Any advice or pointers to articles/docs is
> appreciated.

You want values *much* higher than that. How much RAM do you have? See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

For example, if you have 512mb RAM, I'd crank up the shared buffers to 8000.
the sort_mem to 8mb, and the effective_cache_size to 24,000.

> 3) RAID - haven't tried it - but I'm guessing that the speed
> improvement from a RAID 5 may be on the order of 10x

Probably not ... more like 1.5x - 2.0x, but that's still a significant help,
yes? Also, the advantage will get better the more your data grows.

> - which I can
> likely get from using something like HDF.

HDF sucks for I/O speed. XServe will become a much more significant option
in the market when Apple can bring themselves to abandon HDF, and adopt XFS
or something. This is part of your problem.

> Since the data is unlikely to
> grow beyond 10-20gig, a fast drive and firewire ought to give me the
> performance I need.

Not sure about that. Is Firewire really faster for I/O than modern SCSI or
233mhz ATA? I don't do much Mac anymore, but I'd the impression that
Firewire was mainly for peripherals ....

What is important for your app in terms of speed is to get the data coming
from multiple drives over multiple channels. Were it a PC, I'd recommend a
motherboard with 4 IDE channels or Serial ATA, and spreading the data over 4
drives via RAID 0 or RAID 5, and adding dual processors. Then you could use
multiple postgres connections to read different parts of the table
simultaneously.

> I know experimentally that the current machine can
> sustain a 20MB/s transfer rate which is 20-30x the speed of these
> queries.

That is interesting. Adjust your PostgreSQL.conf and see what results you
get. It's possible that PostgreSQL is convinced that you have little or no
RAM because of your .conf settings, and is swapping stuff to temp file on
disk.

> 4) I'd previously commented out the output/writing steps from the app -
> to isolate read performance.

OK.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Browne 2003-09-11 03:07:12 Re: [osdldbt-general] Re: [GENERAL] how to get accurate
Previous Message Tom Lane 2003-09-11 02:44:17 Re: Attempt at work around of int4 query won't touch int8 index ...