Re: Patch: add timing of buffer I/O requests

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Greg Stark" <stark(at)mit(dot)edu>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add timing of buffer I/O requests
Date: 2011-11-28 15:29:46
Message-ID: 466f1deac38f415fd47e51756acf322d.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 28 November 2011, 15:40, Greg Stark wrote:
> On Nov 28, 2011 8:55 AM, "Greg Smith" <greg(at)2ndquadrant(dot)com> wrote:
>>
>> On 11/27/2011 04:39 PM, Ants Aasma wrote:
>>>
>>> On the AMD I saw about 3% performance drop with timing enabled. On the
>>> Intel machine I couldn't measure any statistically significant change.
>>
>>
>> Oh no, it's party pooper time again. Sorry I have to be the one to do
>> it this round. The real problem with this whole area is that we know
>> there are systems floating around where the amount of time taken to
>> grab timestamps like this is just terrible.
>
> I believe on most systems on modern Linux kernels gettimeofday and its ilk
> will be a vsyscall and nearly as fast as a regular function call.

AFAIK a vsyscall should be faster than a regular syscall. It does not need
to switch to kernel space at all; it "just" reads the data from a shared
page. The problem is that this is Linux-specific: FreeBSD, for example,
does not have vsyscall at all (it's actually one of the Linux-isms
mentioned here: http://wiki.freebsd.org/AvoidingLinuxisms).

There's also something called VDSO, which (among other things) uses
vsyscall if available, or otherwise the best implementation the platform
offers. So there are platforms that do not provide vsyscall, and in that
case it'd be just as slow as a regular syscall :(
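To see which case a given platform falls into, the per-call cost can be
measured directly. The following is only a rough sketch, not part of the
patch: the function name clock_call_cost_ns is made up for illustration, and
it assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t) ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: call gettimeofday() 'iterations' times in a tight
 * loop and return the average cost of one call in nanoseconds.
 */
double clock_call_cost_ns(long iterations)
{
    struct timespec start, end;
    struct timeval scratch;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        gettimeofday(&scratch, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double) (timespec_to_ns(&end) - timespec_to_ns(&start))
           / (double) iterations;
}
```

A result in the tens of nanoseconds suggests the call never enters the
kernel; a result closer to a microsecond suggests a real syscall.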

I wouldn't expect a patch that works fine on Linux but not on other
platforms to be accepted, unless there's a compile-time configure switch
(--with-timings) that would allow disabling the timing code.
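A minimal sketch of how such a switch might look in the code, assuming the
configure option ends up defining a macro. The macro name USE_IO_TIMING and
the function io_timing_now_ns are hypothetical and do not come from the
patch.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <stdint.h>
#include <time.h>

/*
 * Hypothetical: USE_IO_TIMING would be defined by a configure switch such
 * as --with-timings. Without it the function compiles to a constant, so
 * no timestamp call is made at all.
 */
int64_t io_timing_now_ns(void)
{
#ifdef USE_IO_TIMING
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
#else
    return 0;
#endif
}
```

Callers would subtract two such readings around the I/O request; with the
switch off, the difference is always zero and the cost is negligible.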

Another option would be to reimplement the vsyscall mechanism ourselves,
even on platforms that don't provide it. The principle is actually quite
simple: allocate a shared memory segment, store the current time there, and
update it whenever a clock interrupt happens. This is basically what Greg
suggested in one of the previous posts, where "regularly" means "on every
interrupt". Greg was worried about the precision, but this should be just
fine I guess. It's the precision you get on Linux, anyway ...
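The idea can be sketched in user space roughly as follows. This is only an
illustration under several assumptions: a process-local atomic variable
stands in for the shared memory segment, a SIGALRM interval timer stands in
for the clock interrupt, 64-bit atomics are assumed to be lock-free, and all
function names are made up.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Stand-in for the timestamp stored in shared memory. */
static _Atomic int64_t cached_now_ns;

/* The expensive path: ask the OS for the current time. */
static int64_t read_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Runs on every timer tick and refreshes the cached timestamp. */
static void on_tick(int signo)
{
    (void) signo;
    atomic_store(&cached_now_ns, read_clock_ns());
}

/* Start refreshing every tick_usec microseconds (0 < tick_usec < 1000000). */
int cached_clock_start(long tick_usec)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_tick;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    atomic_store(&cached_now_ns, read_clock_ns());

    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = tick_usec;
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_REAL, &timer, NULL);
}

/* Stop the periodic refresh; the last cached value stays readable. */
int cached_clock_stop(void)
{
    struct itimerval timer;

    memset(&timer, 0, sizeof(timer));
    return setitimer(ITIMER_REAL, &timer, NULL);
}

/* The cheap path: a plain memory read, no syscall. */
int64_t cached_clock_now_ns(void)
{
    return atomic_load(&cached_now_ns);
}
```

Readers never see a timestamp older than one tick, so the tick length is
the precision. With a 1 ms tick, cached_clock_start(1000) gives millisecond
resolution at the cost of one real clock read per tick instead of one per
I/O request.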

>> I recall a patch similar to this one was submitted by Greg Stark some
>> time ago. It used the info for different reasons--to try and figure out
>> whether reads were cached or not--but I believe it withered rather than
>> being implemented mainly because it ran into the same fundamental
>> roadblocks here. My memory could be wrong here, there were also concerns
>> about what the data would be used for.

The difficulty of distinguishing whether the reads were cached or not is
the price we pay for using the filesystem cache instead of managing our own.
I'm not sure this can be solved just by measuring the latency. With
spinning disks it's quite easy, because the differences are rather huge (and
it's not difficult to derive that even from a pgbench log). But with SSDs,
multiple tablespaces on different storage, etc. it gets much harder.
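To illustrate why a latency-only test is fragile, here is a minimal sketch
of the obvious approach. The threshold value and the function name are
purely illustrative assumptions, not measured numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: reads faster than this are guessed to be cached. */
#define CACHE_HIT_THRESHOLD_US 200

/* Guess whether a read was served from the filesystem cache. */
bool read_looks_cached(int64_t latency_us)
{
    return latency_us < CACHE_HIT_THRESHOLD_US;
}
```

With a spinning disk, where an uncached read takes several milliseconds, a
single cutoff like this separates the two cases cleanly. A read from an SSD
can fall below the same cutoff and be misclassified as cached, and
tablespaces on different storage would each need their own threshold.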

Tomas
