Quick Links

Re: "large" spam tables and performance: postgres memory parameters

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Gary Warner <gar(at)cis(dot)uab(dot)edu>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: "large" spam tables and performance: postgres memory parameters
Date:	2010-01-08 02:02:58
Message-ID:	20100108020258.GG17756@tamriel.snowman.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

* Gary Warner (gar(at)cis(dot)uab(dot)edu) wrote:
> - if you have 8 pentium cores, 12GB of RAM and "infinite" diskspace, what sorts of memory settings would you have in your start up tables?

If the PG database is the only thing on the system, I'd probably go with
something like:

shared_buffers = 4GB
temp_buffers = 1GB
work_mem = 128M # Maybe adjust this during a session if you have
# big/complex queries
maintenance_work_mem = 256M
checkpoint_segments = 20 # Maybe more..
effective_cache_size = 8GB # Maybe more if you have a SAN which is doing
# cacheing for you too..

> My biggest question mark there really has to do with how many users I have and how that might alter the results.

Presuming what you mean by this is "how would the number of users change
the settings I'm suggesting above", I'd say "probably not much for the
number of users you're talking about.". Really, 12-25 users just isn't
all that many. You probably need a queueing system to handle requests
that are going to take a long time to complete (as in, don't expect the
user or their web browser to stick around while you run a query that
takes half an hour to complete...).

> (I'll let this thread run a bit, and then come back to ask questions about "vacuum analyze" and "partitioned tables" as a second and third round of questions.)

You should definitely be lookig to do partitioning based on the type of
data and the way you want to restrict the queries. I'm not convinced
you'd actually *need* to restrict the queries to recent things if you
partition correctly- you might restrict the total *range* to be
something small enough that it won't take too long.

It sounds like you have a number of systems, in which case you might
consider sharding if you get a large number of users (where large is a
whole lot bigger than 25...) or you find that users really do need
real-time results on very large ranges. It involves a fair bit of code
to do and do well though, so you really have to consider it carefully
and make sure it will help your important use cases (and not too badly
impact your other use cases) before going that route.

autovacuum is your friend.. though you might need to tune it.

Thanks,

Stephen

In response to

"large" spam tables and performance: postgres memory parameters at 2010-01-07 15:23:17 from Gary Warner

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Craig Ringer	2010-01-08 03:31:09	Re: Air-traffic benchmark
Previous Message	Scott Marlowe	2010-01-08 01:59:39	Re: "large" spam tables and performance: postgres memory parameters