Quick Links

Re: Optimizer not using index on 120M row table

From:	"scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc:	<pgsql-general(at)postgresql(dot)org>
Subject:	Re: Optimizer not using index on 120M row table
Date:	2003-04-30 14:52:08
Message-ID:	Pine.LNX.4.33.0304300819280.18259-100000@css120.ihs.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Tue, 29 Apr 2003, Jim C. Nasby wrote:

> On Tue, Apr 29, 2003 at 09:46:20AM -0600, scott.marlowe wrote:
> > There are a few settings that you pretty much have to change on bigger
> > hardware to get good performance.
> >
> > shared_buffers (1000 to 10000 is a good place to start.) measured in 8k
> > blocks.
> > effective_cache_size (size of OS file system and disk caches measured in
> > 8k blocks)
>
> Should effective_cache_size include the size of shared_buffers? Also,
> FreeBSD doesn't seem to want to use more than about 300M for disk
> caching, so I currently have shared_buffers set to 90000 or about 700M
> (the box has 2G, but pgsql currently has to share with Sybase). Are
> there any issues with setting shared_buffers so high? Anyone know how to
> make FBSD do more caching on it's own, or should I just let pgsql handle
> it?

No, it doesn't. From reading the docs it would appear that it's the
estimate of how much memory the kernel is using to cache disk access for
postgresql alone. I.e. if you're running an LDAP server that uses 32
Megs, an apache server using another 32 Megs or so, and postgresql set for
shared buffers of 32 Megs, then on a 256 Meg machine that shows 128 Megs
used for cache, it's likely that only portion is postgresql cache, around
96 Megs if all three apps do the same amount of disk access.

There may be some tuning parameters for BSD that will let it use more disk
cache, or it could there just isn't any more to cache.

There's no problem with cranking up shared_buffers, but you should always
test it and compare it to a few smaller settings and find the "knee" where
Postgresql stops getting faster. On my box that was around 128 Meg of
ram, with a slight gain to 256 Meg, so, since I had 900 Megs as disk cache
at the time I set it to 256 Meg. For my machine and my load, that's lots
and lots of shared_buffer cache. There are questions about performance
loss as the shared_buffer setting goes up due to the nature of
postgresql's buffering method, which is based on shared memory. It would
appear that most kernels are great at buffering huge amounts of data in a
dirt simple way, while the shared memory that postgresql uses may start
off faster (say at a few thousand buffers) but seems to reach a point of
diminishing returns in real tests. For me that was somewhere around 512
Meg of shared buffer memory. Note that it's not like Postgresql ground to
a halt and stopped working, it just stopped going faster, and started
getting a little slower.

It might still be a win for certain types of loads, just not mine. The
load my database runs is about 90% small lookups (select * from view where
id=123) and 5% medium reports (select * from view where date>1 month ago)
and about 5% gargantuan reports (select * from sales_history where year
like '199%'). The problem for me is that only the small lookup type apps
are changing a lot of data, and refetching all the time. The medium
reports may get run a dozen times in one sitting, but most of the time
it's a one shot. The gargantuan reports are run by batch files mostly, or
by single users who know it's ok that it takes a minute or two to get
their report and they won't hit reload six or seven times. :-)

The thing that slowed down the most at high buffer for me was the small
access applications, i.e. work flow, order processing type stuff. It
didn't benefit from a larger cache because it's working over a small
dataset, so any performance loss due to overhead was seen here first,
since there was no corresponding advantage to a large cache on a small
dataset.

The big reports were actually almost the same speed, sometimes still
faster at 512Meg than 256Meg, but that makes sense, since select * from
logs order by random() is gonna take up some space :-) especially if
you're joining it to get a username -> realname lookup. But all those big
things are faster if you can run them on a cron and have them ready in the
morning. And the machine doesn't care if it's 20% faster at 2:00am.

But shared_buffers isn't the dangerous setting, that's sort_mem
cranked up too high. Since sort_mem comes from plain old memory, not the
shared_buffers, a bunch of order by statements running in parallel can
make the kernel flush it's cache, then start swapping out programs if
things get real bad. Causing the kernel to flush cache means that you can
get into a round robin cache problem where everything the database asks
for was just here a minute ago, but we just flushed it to make room for
this last block here. And that happens for each block you're looking for.

Bad as that is, it's nothing compared to making your machine start
swapping, especially if what it's swapping are shared buffer blocks...
Which many OSes (Linux is one not sure about the BSDs) will happily do.

After that the danger in Linux is that you'll use up all the swap, all the
mem, and the kernel will start kill -9ing "rogue" processes.
Unfortunately, the "rogue process recognition algorhythm" in Linux is
kill that which is big and hungry. I.e. Postgresql backends.

So, I think the process of setting those settings for a machine is by it's
nature an interactive thing, you have to build it and then put it under
load and measure it's performance and tweak it a little at a time. The
real danger is in overshooting the optimum. The first sign is that things
aren't getting faster, and the next sign might be that the machine is
crawling in a swap storm. Mediocre but reliable is preferable to meteoric
(both in terms of speed and reliability.)

What I shoot for now is to have half of the memory in a box be kernel
cache, 1/4th or so be postgresql programs and cache, and the other 1/4th
to run anything else on the box. On my current machine that's 1.5 Gig
mem, 700-800M used for kernel disk cache, 256 Meg used for postgresql, and
about 400-500M used for everything else. Postgresql performance is really
not an issue unless we write a poorly designed query. And if we do, we
can't usually use up enough memory to cause problems.

In response to

Re: Optimizer not using index on 120M row table at 2003-04-29 23:19:33 from Jim C. Nasby

Browse pgsql-general by date

	From	Date	Subject
Next Message	Robert Fitzpatrick	2003-04-30 15:06:01	Function for field arrays
Previous Message	Tom Lane	2003-04-30 14:30:14	Re: pg_dumpall -g problem