Re: optimizing daily data storage in Pg

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: P Kishor <punk(dot)kish(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: optimizing daily data storage in Pg
Date: 2010-07-23 06:33:46
Message-ID: AANLkTinq0c2f3Fgam3hsFNfbY39oKiMzFtK5CwTE_5tO@mail.gmail.com
Lists: pgsql-general

Does ordering the data in the table by day, cell_id help at all?
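
If not, it might be worth a try. As a sketch, assuming a table named
cells with an index on (day, cell_id) (both names invented here),
something like:

  CREATE INDEX cells_day_cell_idx ON cells (day, cell_id);
  CLUSTER cells USING cells_day_cell_idx;
  ANALYZE cells;

physically rewrites the table in index order. Note that CLUSTER takes
an exclusive lock and rewrites the whole table, and the ordering isn't
maintained for future inserts, so it's a maintenance-window operation.
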
How big ARE the indexes we're talking about? If the total size of all
the indexes you need for your queries runs into the hundreds of
gigabytes, going from 12 to 32 Gigs of RAM may be about as effective as
flapping your arms out the window of your car. If the
indexes that you need at one time add up to something in the 32 to
256Gig range then adding enough memory to hold most or all of them at
once would help and is doable, but the price goes up fast when you get
to the > 64Gig range.
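
You can get the index sizes right from the database; a quick sketch
(run it against your own db) that lists them biggest first:

  SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
  WHERE relkind = 'i'
  ORDER BY pg_relation_size(oid) DESC
  LIMIT 20;

Add up the ones your reporting queries actually use and compare that
total to RAM.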

If everything is read only, then pg may or may not be the best fit.
It sounds more like what you're doing is batch processing, or at least
batch processing friendly. If you could roll up your data, either in
the db or beforehand while preparing it, that might be a big win.
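
For example, if most reports only need per-cell monthly aggregates, a
pre-rolled summary table along these lines (table and column names are
made up, adjust to your schema) can cut the data the queries have to
touch by orders of magnitude:

  CREATE TABLE cell_monthly AS
  SELECT cell_id,
         date_trunc('month', day) AS month,
         count(*) AS n_obs,
         sum(obs_value) AS total_value
  FROM cells
  GROUP BY cell_id, date_trunc('month', day);

  CREATE INDEX cell_monthly_idx ON cell_monthly (cell_id, month);

Then point the reporting queries at cell_monthly instead of the raw
table.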

Is your machine IO bound when running these queries?

What does "iostat -xd 1" say about throughput and % utilization? I'm
assuming you're running an OS with sysstat available. If you're on
something else, then you'll need to research how to see your IO
workload on that OS.

htop is a nice way of seeing your wait states as well; with "Detailed
CPU time" enabled in its setup, IO wait shows up as its own segment in
the CPU bars.

If your machine is IO bound, and you've gotten enough ram to hold the
working set of your indexes, then you'll need more hard drives under
good controllers to make it faster. The bad news is that RAID
controllers and lots of hard drives can be expensive; the good news is
that reporting servers (which it sounds like this is) do a lot of
sequential access, and throwing more drives at the problem usually
gives big gains.

As far as partitioning goes, I think you either need to use fewer
partitions, or just use individual tables without using the parent
table to access them. It's a well known problem with partitioning
that past a few hundred or so child tables things get slow pretty
fast. Having 200 to 500 tables, maybe even 1,000, is workable, but past
that, no, not really.
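
For instance, monthly instead of daily children keeps the count sane
for decades, and when you know the time range you can query a child
directly and skip the parent entirely (sketch only, reusing the same
made-up cells table from above):

  CREATE TABLE cells_2010_07 (
    CHECK ( day >= DATE '2010-07-01' AND day < DATE '2010-08-01' )
  ) INHERITS (cells);

  SELECT cell_id, count(*)
  FROM cells_2010_07
  WHERE day BETWEEN '2010-07-01' AND '2010-07-07'
  GROUP BY cell_id;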

If you're not IO bound, then you'll likely need more CPU horsepower.
