Re: Massive performance issues

From: Matthew Sackman <matthew(at)lshift(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive performance issues
Date: 2005-09-01 21:13:59
Message-ID: 20050901211359.GD7131@pongo.lshift.net
Lists: pgsql-performance

On Thu, Sep 01, 2005 at 10:54:45PM +0200, Arjen van der Meijden wrote:
> On 1-9-2005 19:42, Matthew Sackman wrote:
> >Obviously, to me, this is a problem; I need these queries to complete in
> >under a second. Is this unreasonable? What can I do to make this "go
> >faster"? I've considered normalising the table, but I can't work out
> >whether the slowness is in dereferencing the pointers from the index
> >into the table or in scanning the index in the first place. And
> >normalising the table is going to cause much pain when inserting values,
> >and I'm not entirely sure I see why normalising it should cause a
> >massive performance improvement.
>
> In this case, I think normalising will give a major decrease in the
> on-disk size of this large table and its indexes. If that's the case,
> that in itself will speed up all I/O-bound queries quite a bit.

Well, that's the thing: on the queries where it decides to use the index,
it only reads at around 3MB/s and the CPU is maxed out, whereas when it
doesn't use the index, the disk is being read at 60MB/s. So when it
decides to use an index, I don't seem to be I/O bound at all. Or at least
that's the way it seems to me.
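In case it's useful, this is roughly how I've been comparing the two
cases (the column and value here are just an example rather than my
exact query):

    -- see which plan the planner picks, with actual row counts and timings
    EXPLAIN ANALYZE SELECT * FROM address WHERE locality_2 = 'Hull';

    -- force the planner off the index on the same query, for comparison
    SET enable_indexscan = off;
    EXPLAIN ANALYZE SELECT * FROM address WHERE locality_2 = 'Hull';
    RESET enable_indexscan;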

> locality_1, _2, city and county can probably be normalised away without
> much problem, but going from varchars to integers will probably save
> you quite a bit of (disk) space.

Sure, that's what I've been considering today.

> But it won't change the selectivity of the indexes, so you won't get
> more index scans instead of sequential scans, I suppose.
> I think it's not that hard to create a normalized set of tables from this
> data set (using insert into tablename select distinct ... from address
> and such, insert into address_new (..., city) select ... (select cityid
> from cities where city = address.city) from address).
> So it's at least relatively easy to figure out the performance
> improvement from normalizing the data set a bit.

Yeah, the initial creation isn't too painful, but adding rows into the
address table afterwards gets more painful. However, as I've said
elsewhere, the import isn't the critical path, so I can cope with that
pain, possibly coding around it in a stored procedure and triggers as
suggested.
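
Roughly what I'm picturing, both for the one-off conversion and for later
inserts, is something like the following. This is only a sketch: the
cities table, the get_cityid() helper and the column names other than
city are invented for illustration, and it assumes plpgsql is installed.

    -- one-off: build the lookup table from the existing data
    CREATE TABLE cities (
        cityid  serial PRIMARY KEY,
        city    varchar NOT NULL UNIQUE
    );
    INSERT INTO cities (city) SELECT DISTINCT city FROM address;

    -- one-off: populate the normalised table, mapping names to ids
    INSERT INTO address_new (street, postcode, cityid)
        SELECT street, postcode,
               (SELECT cityid FROM cities WHERE cities.city = address.city)
        FROM address;

    -- for later inserts: look up (or create) the id for a city name,
    -- callable from the import or from a BEFORE INSERT trigger
    CREATE FUNCTION get_cityid(varchar) RETURNS integer AS '
    DECLARE
        v_id integer;
    BEGIN
        SELECT cityid INTO v_id FROM cities WHERE city = $1;
        IF NOT FOUND THEN
            INSERT INTO cities (city) VALUES ($1);
            SELECT cityid INTO v_id FROM cities WHERE city = $1;
        END IF;
        RETURN v_id;
    END;
    ' LANGUAGE plpgsql;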

> If you want to improve your hardware, have a look at the Western Digital
> Raptor-series SATA disks; they are fast, SCSI-like SATA drives. You may
> also want to look at the amount of memory available, to allow caching
> this (entire) table.

Well, I've got 1GB of RAM, but from analysis of its use, a fair amount
isn't being used: about 50% is actually in use by applications, about
half of the rest is cache, and the rest isn't being used at all. Does
this have anything to do with the max_fsm_pages and max_fsm_relations
settings? I've pretty much not touched the configuration and it's the
standard Debian package.
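
For reference, these are the settings I'm referring to, plus the
cache-related ones, since you mention memory, as they appear in
postgresql.conf. The values below are only illustrative of an untouched
setup, not recommendations, and I believe shared_buffers and
effective_cache_size are counted in 8kB pages on this version:

    # excerpt from postgresql.conf (illustrative values only)
    shared_buffers = 1000          # PostgreSQL's own buffer cache, in 8kB pages
    effective_cache_size = 1000    # planner's estimate of the OS cache, in 8kB pages
    max_fsm_pages = 20000          # pages tracked in the free space map
    max_fsm_relations = 1000       # relations tracked in the free space map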

Matthew
