Re: [PERFORMANCE] expanding to SAN: which portion best to move

From:	Greg Smith <greg(at)2ndQuadrant(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: [PERFORMANCE] expanding to SAN: which portion best to move
Date:	2011-05-24 18:48:52
Message-ID:	4DDBFD94.9010407@2ndQuadrant.com
Lists:	pgsql-general pgsql-performance

On 05/17/2011 05:47 AM, Craig Ringer wrote:
> This makes me wonder if Pg attempts to pre-fetch blocks of interest
> for areas where I/O needs can be known in advance, while there's still
> other works or other I/O to do. For example, pre-fetching for the next
> iteration of a nested loop while still executing the prior one. Is it
> even possible?

Well, remember that a nested loop isn't directly doing any I/O. It's
pulling rows from some lower level query node. So the useful question
to ask is "how can pre-fetch speed up the table access methods?" That
worked out like this:

Sequential Scan: prefetch logic here was added and measured as useful on
one system with terrible I/O. Everywhere else it was tried on Linux, the
kernel's read-ahead logic seems to make it redundant. It was punted as
too much complexity relative to the measured average gain. You can try
to tweak this on a per-file basis in an application, but the kernel has
almost as much information to make that decision usefully as the
database does.
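
As a rough illustration of that kind of per-file tweak, here is a
minimal sketch in Python, assuming a Unix-like system where
os.posix_fadvise is available. The function name read_sequentially and
the chunk size are made up for the example; this is not anything
PostgreSQL itself does. The hint is advisory only, and the kernel's own
read-ahead may already be doing the same job.

```python
import os


def read_sequentially(path, chunk_size=1 << 20):
    """Read a whole file after hinting sequential access.

    Returns the number of bytes read. The function name and chunk size
    are illustrative assumptions, not part of any PostgreSQL API.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory only: the kernel is free to ignore it.
        # offset=0 with len=0 means "the whole file".
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Whether this helps at all has to be measured on the system in question;
on Linux the default read-ahead often makes it a no-op in practice.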

Index Scan: It's hard to know what you're going to need in advance here
and pipeline the reads, so this hasn't really been explored yet.

Bitmap heap scan: Here, the exact list of blocks to fetch is known in
advance, they're random, and it's quite possible for the kernel to
schedule them more efficiently than fetching them one at a time in
order would. This was added as the effective_io_concurrency feature
(it's the only thing that feature impacts), which so far is only proven
to work on Linux. Any OS implementing the POSIX API it uses will also
get this, however; FreeBSD was the next likely candidate that might
benefit when I last looked around.

> I'm guessing not, because (AFAIK) Pg uses only synchronous blocking
> I/O, and with that there isn't really a way to pre-fetch w/o threads
> or helper processes. Linux (at least) supports buffered async I/O, so
> it'd be possible to submit such prefetch requests ... on modern Linux
> kernels. Portably doing so, though - not so much.

Linux supports the POSIX_FADV_WILLNEED advisory call, which is perfect
for suggesting what blocks will be accessed in the near future in the
bitmap heap scan case. That's how effective_io_concurrency works.
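
To make the idea concrete, here is a minimal sketch in Python of the
"will need" pattern, assuming a Unix-like system with os.posix_fadvise
and os.pread. It is not PostgreSQL's actual code: the function name
read_blocks_with_prefetch and the prefetch_depth parameter are
hypothetical, with prefetch_depth playing a role only loosely analogous
to effective_io_concurrency. The 8192-byte block size matches
PostgreSQL's default.

```python
import os

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def read_blocks_with_prefetch(fd, block_numbers, prefetch_depth=4):
    """Read blocks in the given order, hinting upcoming ones to the kernel.

    Before each read, POSIX_FADV_WILLNEED hints are issued for up to
    prefetch_depth blocks that come later in the list, so the kernel can
    start fetching them while the current block is being processed.
    Names here are illustrative assumptions, not a PostgreSQL API.
    """
    blocks = list(block_numbers)
    can_advise = hasattr(os, "posix_fadvise")
    results = []
    next_hint = 0  # index of the next block that still needs a hint
    for i, blkno in enumerate(blocks):
        # Keep the hint window up to prefetch_depth blocks ahead of the read.
        window_end = min(len(blocks), i + 1 + prefetch_depth)
        next_hint = max(next_hint, i + 1)
        while next_hint < window_end:
            if can_advise:
                os.posix_fadvise(
                    fd,
                    blocks[next_hint] * BLOCK_SIZE,
                    BLOCK_SIZE,
                    os.POSIX_FADV_WILLNEED,
                )
            next_hint += 1
        # The read itself is still an ordinary synchronous call.
        results.append(os.pread(fd, BLOCK_SIZE, blkno * BLOCK_SIZE))
    return results
```

The reads stay synchronous; only the hints run ahead. That is why this
approach fits a blocking I/O design without needing threads or helper
processes.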

Both Solaris and Linux also have async I/O mechanisms that could be used
instead. Greg Stark built a prototype and there's an obvious speed-up
there to be had. But the APIs for this aren't very standard, and it's
really hard to rearchitect the PostgreSQL buffer manager to operate in a
less synchronous way. Right now the plan is to hope that more kernels
usefully support the "will need" API, which meshes very well with how
PostgreSQL thinks about the problem. With so many of the bigger
PostgreSQL sites on Linux, that's worked out well so far.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
