| From: | Orion Henry <orion(at)trustcommerce(dot)com> | 
|---|---|
| To: | dev(at)archonet(dot)com | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: Fast Inserts and Hardware Questions | 
| Date: | 2001-03-15 00:34:09 | 
| Message-ID: | 3AB00E01.6C4F7333@trustcommerce.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Richard H wrote:
> 
> On 3/14/01, 7:08:48 PM, Orion Henry <orion(at)trustcommerce(dot)com> wrote
> regarding [GENERAL] Fast Inserts and Hardware Questions:
> 
> Please bear in mind that you are operating well above anything I do, so
> I'm not speaking from experience.
I appricate the advice in anycase.
 
> > I am specing out a database for my company and I need some advice.  The
> > database in mind will consist of one table, with about 300 bytes per
> > record.  The table will be getting two or three million inserts a week
> > and it would be nice if it could handle a sustained 30 to 50 a second.
> > The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> > can get away with less please let me know)
> 
> Three million inserts a week is about 5 per second, so you want to
> sustain 10 times the average rate (sounds reasonable). Have you
> considered separating inserts from reads? The inserts wouldn't even need
> to go into a database initially, just log them to a file and feed them in
> at a steady 10 per second (allows for downtime).
Already done.  It's just that the less time between getting the data and
it being avaliable to the customers the better we look.
 
> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
> 
> > The customers want to be able to query their data whenever and have it
> > be snappy.
> 
> And they don't see why they need to pay so much for something so simple,
> either ;-)
Tell me about it ;)
 
> > So here is my question:
> > * Is there an OS that is best suited for postgres. All things being
> > equal I would like to run this on Linux.
> 
> Can't comment - I've only used PG on Linux. People who use the various
> xxxBSDs swear by them (rather than at them) and Solaris has some
> excellent support (at a cost)
>
> Of course, you could try MS-Access on Win-ME ;-)
Or... not. ;)
 
> > * Is there an architecture best suited for this.  Should I shell out the
> > extra cash for a 64 bit box over a 32bit one.  Quad Xeon or Quad Alpha?
> > Quad Alpha or Quad UltraSparc?
> 
> Again heresay, but Sun's hardware has excellent I/O throughput and it's
> difficult to get people to abandon it in favour of Intel once they've
> tried it.
What I was hoping to hear was something along the line of "An int8
index?  Dear God!  You better be using a 64 bit machine!"  or  "The only
way you will be able to handle a load like that will be with
UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level
optimizations for the Alpha processors, they are the fastest!"  From the
sounds of it what I need is lots of fast disks and to hell with the
CPU.  I might as well get an Athlon and a Network Appliance.
 
> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else.  Is this correct?
> 
> Yes.
> 
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts.  How
> > much will this speed things up?  I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
> 
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.
A tenfold increase in insert speed over what I have now would make me a
very happy man.  That would put me in the order of magnitude of speed I
need with the hardware I have now.
> > Oh, and if any of you SQL guru's are still reading I'll show you the
> > queries I will be running to see if I can really ditch an index.
> 
> > select * from table where customer_id = ? and primary_key = ?::int8
> 
> Well if primary_key is one, you don't need customer_id.
true.
 
> > select * from table where customer_id = ? and group_number = ?
The groups are less than 10 objects each so ther customer index here
does no good.
> > select * from table where customer_id = ? and creation > ? and creation < ?
This is where the customer index comes in handy.  I'll explain below.
 
> How many customers do you have? Are the customers going to want a
> specific time period? If so, you could have one table per week (say) and
> use a view to conceal the fact for the occasional query that crosses week
> boundaries.
> 
> If these are actually going to by GROUP BY queries totalling figures,
> perhaps try calculating totals beforehand.
> 
> If these are representative of your main queries, I'd be tempted to buy
> 10 cheap machines (+ a spare) and split the customers among the machines.
> Proxy this stuff at the application level and they'll never know. For
> your management stats you'd have to write a script to summarise stuff
> from several machines, but that shouldn't be too difficult. The spare
> machine can be over-specced and have the data from all the other ready
> for a hot-swapover.
> 
> The beauty of multiple machines is it should scale well (so long as you
> are sufficiently fascist about keeping the configs the same).
I have about 100 customers but one of them is repsonsible for generating
over half the database entries.  Without the customer index the queries
of the smaller customers who have a dozen entries are slow becasue the
database is sorting through millions of entries from other customers.  I
suppose the thing to do is to get the big customer their own box and
drop the customer index all together.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Orion Henry | 2001-03-15 00:42:31 | Re: Fast Inserts and Hardware Questions | 
| Previous Message | Gordon A. Runkle | 2001-03-15 00:23:27 | Re: Scalability |