From: | Orion Henry <orion(at)trustcommerce(dot)com> |
---|---|
To: | dev(at)archonet(dot)com |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Fast Inserts and Hardware Questions |
Date: | 2001-03-15 00:34:09 |
Message-ID: | 3AB00E01.6C4F7333@trustcommerce.com |
Lists: | pgsql-general |
Richard H wrote:
>
> On 3/14/01, 7:08:48 PM, Orion Henry <orion(at)trustcommerce(dot)com> wrote
> regarding [GENERAL] Fast Inserts and Hardware Questions:
>
> Please bear in mind that you are operating well above anything I do, so
> I'm not speaking from experience.
I appreciate the advice in any case.
> > I am specing out a database for my company and I need some advice. The
> > database in mind will consist of one table, with about 300 bytes per
> > record. The table will be getting two or three million inserts a week
> > and it would be nice if it could handle a sustained 30 to 50 a second.
> > The table will have (ACK!) 4 indexes (if anyone can explain to me how I
> > can get away with less please let me know)
>
> Three million inserts a week is about 5 per second, so you want to
> sustain 10 times the average rate (sounds reasonable). Have you
> considered separating inserts from reads? The inserts wouldn't even need
> to go into a database initially, just log them to a file and feed them in
> at a steady 10 per second (allows for downtime).
Already done. It's just that the less time between getting the data and
it being available to the customers, the better we look.
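Since the data is already being staged in a log file, one way to cut the per-row overhead when feeding it in is to bulk-load with COPY instead of individual INSERTs. A minimal sketch, assuming a hypothetical table named `transactions` and a staging file whose column layout matches the table:

```sql
-- Hypothetical bulk load: COPY reads the whole staging file in one
-- command, avoiding per-INSERT parse/plan/commit overhead.
-- Assumes "transactions" has columns in the same order as the file.
COPY transactions FROM '/var/spool/app/pending_inserts.txt';
```

COPY runs as a single command (one transaction), so a mid-file failure leaves the table unchanged and the staging file can simply be replayed.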
> > The indexes will be
> > int8 (primary key)
> > int4 (group number)
> > timestamp (creation date)
> > int4 (customer id)
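For reference, the schema described above might look like the following (table and column names are assumptions for illustration; the int8 primary key gets its index implicitly):

```sql
-- Hypothetical DDL matching the description: ~300 bytes per record,
-- four indexed columns.
CREATE TABLE transactions (
    id           int8 PRIMARY KEY,  -- primary key index created implicitly
    group_number int4,
    creation     timestamp,
    customer_id  int4,
    payload      text               -- rest of the ~300-byte record
);
CREATE INDEX transactions_group_idx    ON transactions (group_number);
CREATE INDEX transactions_creation_idx ON transactions (creation);
CREATE INDEX transactions_customer_idx ON transactions (customer_id);
```

Every insert has to update all four indexes, which is why fewer indexes means faster inserts.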
>
> > The customers want to be able to query their data whenever and have it
> > be snappy.
>
> And they don't see why they need to pay so much for something so simple,
> either ;-)
Tell me about it ;)
> > So here is my question:
> > * Is there an OS that is best suited for postgres. All things being
> > equal I would like to run this on Linux.
>
> Can't comment - I've only used PG on Linux. People who use the various
> xxxBSDs swear by them (rather than at them) and Solaris has some
> excellent support (at a cost)
>
> Of course, you could try MS-Access on Win-ME ;-)
Or... not. ;)
> > * Is there an architecture best suited for this. Should I shell out the
> > extra cash for a 64 bit box over a 32bit one. Quad Xeon or Quad Alpha?
> > Quad Alpha or Quad UltraSparc?
>
> Again hearsay, but Sun's hardware has excellent I/O throughput and it's
> difficult to get people to abandon it in favour of Intel once they've
> tried it.
What I was hoping to hear was something along the lines of "An int8
index? Dear God! You better be using a 64 bit machine!" or "The only
way you will be able to handle a load like that will be with
UltraSparc+Solaris!!" or "Some Alpha fan did tons of assembly level
optimizations for the Alpha processors, they are the fastest!" From the
sounds of it what I need is lots of fast disks and to hell with the
CPU. I might as well get an Athlon and a Network Appliance.
> > * Since most of what I am doing is inserts I will assume that the disk
> > will be my bottleneck over anything else. Is this correct?
>
> Yes.
>
> > * Will the 7.1 WAL save me, when it comes to insert times?
> > * I read something about disabling fsync() to speed up inserts. How
> > much will this speed things up? I would consider it as I COULD rebuild
> > lost data from my logs in the event of a system crash and one night a
> > year of panic on my part is worth saving $100,000 in drive arrays.
>
> Turning fsync off gave about a factor of 10 increase (!) before 7.1 - the
> WAL stuff means you can do so safely. Basically PG should rebuild itself
> from the WAL file in the event of system failure. Have to admit I've not
> tried it myself yet.
A tenfold increase in insert speed over what I have now would make me a
very happy man. That would put me in the order of magnitude of speed I
need with the hardware I have now.
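For what it's worth, the fsync toggle is a one-line change in 7.1's postgresql.conf (a sketch; worth double-checking the exact spelling against the 7.1 docs):

```
# postgresql.conf fragment: skip fsync() on every commit.
# With WAL in 7.1 this trades some crash-recovery time for a
# large gain in insert throughput.
fsync = false
```

Before 7.1 the equivalent was the postmaster's -F switch; with WAL, recovery replays the log on restart either way.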
> > Oh, and if any of you SQL guru's are still reading I'll show you the
> > queries I will be running to see if I can really ditch an index.
>
> > select * from table where customer_id = ? and primary_key = ?::int8
>
> Well if primary_key is one, you don't need customer_id.
true.
> > select * from table where customer_id = ? and group_number = ?
The groups are fewer than 10 objects each, so the customer index here
does no good.
> > select * from table where customer_id = ? and creation > ? and creation < ?
This is where the customer index comes in handy. I'll explain below.
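One middle ground for this query (a sketch, `transactions` being a hypothetical name): a composite index on (customer_id, creation) serves the range scan by itself and could stand in for the separate customer index:

```sql
-- Hypothetical composite index: entries for one customer are stored
-- adjacently in creation order, so the date-range query walks one
-- contiguous slice of the index.
CREATE INDEX transactions_cust_creation_idx
    ON transactions (customer_id, creation);

SELECT * FROM transactions
 WHERE customer_id = 42
   AND creation > '2001-03-01'
   AND creation < '2001-03-08';
```

The trade-off is the same as any index: one more structure to maintain on every insert.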
> How many customers do you have? Are the customers going to want a
> specific time period? If so, you could have one table per week (say) and
> use a view to conceal the fact for the occasional query that crosses week
> boundaries.
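The week-per-table idea could look something like this (hypothetical names; whether UNION is allowed in a view rule should be checked against the 7.1 release notes):

```sql
-- Hypothetical weekly split hidden behind a view for the occasional
-- cross-week query; single-week queries hit their table directly.
CREATE VIEW transactions AS
    SELECT * FROM transactions_2001w10
    UNION ALL
    SELECT * FROM transactions_2001w11;
```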
>
> If these are actually going to be GROUP BY queries totalling figures,
> perhaps try calculating totals beforehand.
>
> If these are representative of your main queries, I'd be tempted to buy
> 10 cheap machines (+ a spare) and split the customers among the machines.
> Proxy this stuff at the application level and they'll never know. For
> your management stats you'd have to write a script to summarise stuff
> from several machines, but that shouldn't be too difficult. The spare
> machine can be over-specced and have the data from all the other ready
> for a hot-swapover.
>
> The beauty of multiple machines is it should scale well (so long as you
> are sufficiently fascist about keeping the configs the same).
I have about 100 customers, but one of them is responsible for generating
over half the database entries. Without the customer index, the queries
of the smaller customers who have a dozen entries are slow because the
database is sorting through millions of entries from other customers. I
suppose the thing to do is to get the big customer their own box and
drop the customer index altogether.
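That split could even be staged in place first; a hedged sketch (names assumed, with the copy and delete kept in one transaction):

```sql
-- Hypothetical: peel the dominant customer off into its own table so
-- the shared table stays small enough to scan without a customer_id
-- index.
BEGIN;
CREATE TABLE transactions_bigcust AS
    SELECT * FROM transactions WHERE customer_id = 1;
DELETE FROM transactions WHERE customer_id = 1;
COMMIT;
```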