Quick Links

Re: Fast Inserts and Hardware Questions

From:	Richard H <dev(at)archonet(dot)com>
To:	Orion Henry <orion(at)trustcommerce(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Fast Inserts and Hardware Questions
Date:	2001-03-14 20:35:34
Message-ID:	20010314.20353400@client.archonet.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 3/14/01, 7:08:48 PM, Orion Henry <orion(at)trustcommerce(dot)com> wrote
regarding [GENERAL] Fast Inserts and Hardware Questions:

Please bear in mind that you are operating well above anything I do, so
I'm not speaking from experience.

> I am specing out a database for my company and I need some advice. The
> database in mind will consist of one table, with about 300 bytes per
> record. The table will be getting two or three million inserts a week
> and it would be nice if it could handle a sustained 30 to 50 a second.
> The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> can get away with less please let me know)

Three million inserts a week is about 5 per second, so you want to
sustain 10 times the average rate (sounds reasonable). Have you
considered separating inserts from reads? The inserts wouldn't even need
to go into a database initially, just log them to a file and feed them in
at a steady 10 per second (allows for downtime).

> The indexes will be
> int8 (primary key)
> int4 (group number)
> timestamp (creation date)
> int4 (customer id)

> The customers want to be able to query their data whenever and have it
> be snappy.

And they don't see why they need to pay so much for something so simple,
either ;-)

> So here is my question:
> * Is there an OS that is best suited for postgres. All things being
> equal I would like to run this on Linux.

Can't comment - I've only used PG on Linux. People who use the various
xxxBSDs swear by them (rather than at them) and Solaris has some
excellent support (at a cost)

Of course, you could try MS-Access on Win-ME ;-)

> * Is there an architecture best suited for this. Should I shell out the
> extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> Quad Alpha or Quad UltraSparc?

Again heresay, but Sun's hardware has excellent I/O throughput and it's
difficult to get people to abandon it in favour of Intel once they've
tried it.

> * Since most of what I am doing is inserts I will assume that the disk
> will be my bottleneck over anything else. Is this correct?

Yes.

> * Will the 7.1 WAL save me, when it comes to insert times?
> * I read something about disabling fsync() to speed up inserts. How
> much will this speed things up? I would consider it as I COULD rebuild
> lost data from my logs in the event of a system crash and one night a
> year of panic on my part is worth saving $100,000 in drive arrays.

Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
WAL stuff means you can do so safely. Basically PG should rebuild itself
from the WAL file in the event of system failure. Have to admit I've not
tried it myself yet.

> Oh, and if any of you SQL guru's are still reading I'll show you the
> queries I will be running to see if I can really ditch an index.

> select * from table where customer_id = ? and primary_key = ?::int8

Well if primary_key is one, you don't need customer_id.

> select * from table where customer_id = ? and group_number = ?
> select * from table where customer_id = ? and creation > ? and creation
> < ?

How many customers do you have? Are the customers going to want a
specific time period? If so, you could have one table per week (say) and
use a view to conceal the fact for the occasional query that crosses week
boundaries.

If these are actually going to by GROUP BY queries totalling figures,
perhaps try calculating totals beforehand.

If these are representative of your main queries, I'd be tempted to buy
10 cheap machines (+ a spare) and split the customers among the machines.
Proxy this stuff at the application level and they'll never know. For
your management stats you'd have to write a script to summarise stuff
from several machines, but that shouldn't be too difficult. The spare
machine can be over-specced and have the data from all the other ready
for a hot-swapover.

The beauty of multiple machines is it should scale well (so long as you
are sufficiently fascist about keeping the configs the same).

> Thanks for all your help,

> Orion Henry

Hope it was

- Richard Huxton

In response to

Fast Inserts and Hardware Questions at 2001-03-14 19:08:48 from Orion Henry

Responses

Re: Fast Inserts and Hardware Questions at 2001-03-15 00:34:09 from Orion Henry
Re: Fast Inserts and Hardware Questions at 2001-03-15 02:18:21 from Bruce Momjian

Browse pgsql-general by date

	From	Date	Subject
Next Message	keith	2001-03-14 20:44:24	Fw: Please Help
Previous Message	Jan Ploski	2001-03-14 20:26:46	Re: Compiling 7.1 JDBC driver