From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Neil Conway <neilc(at)samurai(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: adding support for posix_fadvise() |
Date: | 2003-11-03 14:38:23 |
Message-ID: | 13972.1067870303@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Neil Conway <neilc(at)samurai(dot)com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().
It's a little premature to be inventing APIs when you have no evidence
that this will make any useful performance difference. I'd recommend a
quick hack to get proof of concept before you bother with nice APIs.
> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation
"All"? You cannot affect the FDs being used by other backends. It's
fairly unclear to me what the posix_fadvise function is really going
to do for files that are being accessed by multiple processes. For
instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL
file, given that every other backend is going to have that same file
open? I would expect that rational kernel behavior would be to ignore
this advice unless it's set by the last backend to have the file open
--- but I'm not sure we can synchronize the closing of old WAL segments
well enough to know which backend is the last to close the file.
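
For reference, a minimal sketch of what issuing this advice looks like from a single process. The helper names advise_dontneed() and demo_dontneed() are hypothetical, not existing PostgreSQL functions, and the scratch file merely stands in for a WAL segment. The call applies only to the descriptor held by the calling process, and what the kernel does with the hint while other processes still have the file open is unspecified, which is exactly the open question above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that this process no longer needs the cached
 * pages of the file behind fd.  Returns 0 on success or an error number;
 * posix_fadvise() reports errors through its return value, not errno.
 */
int
advise_dontneed(int fd)
{
    /* offset 0 with len 0 covers the file from start to end */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

/* Scratch-file demo: write some data, flush it, then issue the advice. */
int
demo_dontneed(void)
{
    char path[] = "/tmp/fadvise_demo_XXXXXX";
    int  fd = mkstemp(path);
    int  rc;

    assert(fd >= 0);
    unlink(path);               /* the file goes away once fd is closed */

    if (write(fd, "wal", 3) != 3 || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    /* the data is flushed first so the pages are clean when advised */
    rc = advise_dontneed(fd);
    close(fd);
    return rc;
}
```

A return value of 0 only means the kernel accepted the hint; it says nothing about whether any pages were actually dropped.
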
A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress. Think about a
complex query that is doing both a seqscan and an indexscan on the same
relation (a self-join could easily do this). You'd really need to
change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to
get set usefully.
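
A rough sketch of what "changing this" could mean at the file level: one descriptor per access pattern instead of one per relation. The names open_with_advice() and demo_two_scans() are hypothetical, and the sketch assumes the kernel tracks these hints per open file description (which is how Linux behaves, as far as I know), so that two separate open() calls on the same file can carry different advice.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path read-only and attach an access-pattern
 * hint to the new descriptor.  Returns the descriptor, or -1 on failure.
 */
int
open_with_advice(const char *path, int advice)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (posix_fadvise(fd, 0, 0, advice) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Demo: one file, two descriptors, one for a seqscan and one for an
 * indexscan.  Returns 0 if both opens succeed with distinct descriptors.
 */
int
demo_two_scans(void)
{
    char path[] = "/tmp/scan_demo_XXXXXX";
    int  tmp = mkstemp(path);
    int  seq_fd, rnd_fd, ok;

    if (tmp < 0)
        return -1;
    close(tmp);

    seq_fd = open_with_advice(path, POSIX_FADV_SEQUENTIAL);
    rnd_fd = open_with_advice(path, POSIX_FADV_RANDOM);
    ok = (seq_fd >= 0 && rnd_fd >= 0 && seq_fd != rnd_fd);

    if (seq_fd >= 0)
        close(seq_fd);
    if (rnd_fd >= 0)
        close(rnd_fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

The cost is obvious: twice as many FDs per relation in the worst case, which matters given how many files a backend can have open.
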
In short I think you need to do some more thinking about what the scope
of the advice flags is going to be ...
> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?
Something Vadim had wanted to do for years is to decouple the smgr and
lower levels from the existing Relation cache, and have a low-level
notion of "open relation" that only requires having the "RelFileNode"
value to open it. This would allow eliminating the concept of blind
write, which would be a Very Good Thing. It would make sense to
associate the advice setting with such low-level relations. One
possible way to handle the multiple-scan issue is to make the desired
advice part of the low-level open() call, so that you actually have
different low-level relations for seq and random access to a relation.
Not sure if this works cleanly when you take into account issues like
smgrunlink, but it's something to think about.
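
To make the idea concrete, here is a toy sketch of a table of low-level relations keyed by both the file node and the advice. Everything in it is hypothetical: HypoRelFileNode is a stand-in for RelFileNode with illustrative field names, and the table holds no file descriptors at all. A real version would open the file and apply posix_fadvise() when a slot is first created, and would have to deal with smgrunlink and friends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RelFileNode; field names are illustrative. */
typedef struct
{
    unsigned int db_node;
    unsigned int rel_node;
} HypoRelFileNode;

typedef enum
{
    ADVICE_NORMAL,
    ADVICE_SEQUENTIAL,
    ADVICE_RANDOM
} HypoAdvice;

/* One low-level "open relation": the key is (node, advice). */
typedef struct
{
    int             in_use;
    HypoRelFileNode node;
    HypoAdvice      advice;
    int             refcount;
} LowLevelRel;

#define MAX_LOWREL 16

/* Must be zero-initialized before first use. */
typedef struct
{
    LowLevelRel slots[MAX_LOWREL];
} LowRelTable;

/*
 * Find or create the low-level relation for (node, advice).  The same
 * node opened with different advice yields a different entry.  Returns
 * NULL if the table is full.
 */
LowLevelRel *
lowrel_open(LowRelTable *tab, HypoRelFileNode node, HypoAdvice advice)
{
    LowLevelRel *free_slot = NULL;
    int          i;

    assert(tab != NULL);
    for (i = 0; i < MAX_LOWREL; i++)
    {
        LowLevelRel *r = &tab->slots[i];

        if (!r->in_use)
        {
            if (free_slot == NULL)
                free_slot = r;      /* remember the first empty slot */
            continue;
        }
        if (r->node.db_node == node.db_node &&
            r->node.rel_node == node.rel_node &&
            r->advice == advice)
        {
            r->refcount++;          /* existing entry: share it */
            return r;
        }
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->in_use = 1;
    free_slot->node = node;
    free_slot->advice = advice;
    free_slot->refcount = 1;
    return free_slot;
}
```

With this shape, a seqscan and an indexscan on the same relation get separate entries, while two seqscans share one. Note that nothing here needs a Relation, only the file node value.
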
regards, tom lane