Re: including PID or backend ID in relpath of temp rels

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: including PID or backend ID in relpath of temp rels
Date: 2010-05-04 18:24:02
Message-ID: AANLkTik2j42D7Aob0EVXO1xi5fzQHpER_Kk7MpcvIfeF@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Robert Haas wrote:

Hey, thanks for writing back! I just spent the last few hours
thinking about this and beating my head against the wall.

>> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
>> relations?  I think the only backend that can have an smgr reference
>> to a temprel other than the owning backend is bgwriter, and AFAICS
>> bgwriter will only have such a reference if it's responding to a
>> request by the owning backend to unlink the associated files, in which
>> case (I think) the owning backend will have no reference.
>
> Hmm, wasn't there a proposal to have the owning backend delete the files
> instead of asking the bgwriter to?

I did propose that upthread; it may have been proposed previously
also. This might be worth doing independently of the rest of the patch
(which I'm starting to fear is doomed, cue ominous soundtrack) since
it would reduce the chance of orphaning data files and possibly
simplify the logic also.
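A minimal sketch of what the owning backend might do if it removed its own files instead of asking bgwriter to. It assumes PostgreSQL's convention of splitting a relation fork into segment files named `path`, `path.1`, `path.2`, and so on. The function name `unlink_rel_segments` is hypothetical. The sketch ignores the other forks and says nothing about crash safety or about other backends' smgr references:

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: remove "<path>", "<path>.1", "<path>.2", ... until
 * one is missing. Returns the number of files removed, or -1 on any error
 * other than ENOENT. */
int unlink_rel_segments(const char *path)
{
    char seg[1024];
    int removed = 0;

    for (unsigned segno = 0;; segno++)
    {
        if (segno == 0)
            snprintf(seg, sizeof seg, "%s", path);
        else
            snprintf(seg, sizeof seg, "%s.%u", path, segno);

        if (unlink(seg) != 0)
        {
            if (errno == ENOENT)
                break;          /* first missing segment: assume no more */
            return -1;
        }
        removed++;
    }
    return removed;
}
```

Because the files are removed synchronously by the backend that created them, there is no separate request that could be lost. That is one way this approach could reduce orphaned data files.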

>> [dbsize.c] As with relcache.c, there's a problem if we're asked for
>> the size of a temporary relation that is not our own: we can't call
>> relpath() without knowing the ID of the owning backend, and there's no
way to acquire that information from pg_class.  I guess we could just
>> refuse to answer the question in that case, but that doesn't seem real
>> cool.  Or we could physically scan the directory for files that match
>> a suitably constructed wildcard, I suppose.
>
> I don't very much like the wildcard idea; but I don't think it's
> unreasonable to refuse to provide a file size.  If the owning backend
> has still got part of the table in local buffers, you'll get a
> misleading answer, so perhaps it's best to not give an answer at all.
>
> Maybe this problem could be solved if we could somehow force that
> backend to write out its local buffers; that would give us a nice
> solution to the dbsize problem.
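To make the dbsize problem concrete, here is a sketch that assumes a temp relation's file is named `t<backendId>_<relNode>` inside `base/<dbOid>/`. This naming scheme is hypothetical and chosen only for illustration. The sketch shows three things. The path cannot be built without the owner's backend ID. The wildcard alternative would have to match any backend ID. And the refuse-to-answer policy from the quoted reply could be expressed as a `-1` result. All names (`temp_relpath`, `matches_temp_relfile`, `rel_size_or_refuse`, `INVALID_BACKEND_ID`) are made up for this example:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define INVALID_BACKEND_ID (-1)   /* hypothetical: "not a temp relation" */

/* Build a relation's path; a temp relation needs its owner's backend ID. */
void temp_relpath(char *buf, size_t len, unsigned dbOid, int backendId,
                  unsigned relNode)
{
    if (backendId == INVALID_BACKEND_ID)
        snprintf(buf, len, "base/%u/%u", dbOid, relNode);
    else
        snprintf(buf, len, "base/%u/t%d_%u", dbOid, backendId, relNode);
}

/* Wildcard alternative: does a directory entry look like "t<digits>_<relNode>"? */
bool matches_temp_relfile(const char *name, unsigned relNode)
{
    const char *p = name;
    unsigned long node = 0;

    if (*p++ != 't' || !isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))       /* skip the unknown backend ID */
        p++;
    if (*p++ != '_' || !isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        node = node * 10 + (unsigned long) (*p++ - '0');
    return *p == '\0' && node == relNode;     /* first segment only, no ".N" */
}

/* Sum the sizes of all segment files, or return -1 ("refuse to answer")
 * when the relation is another backend's temp table. */
long long rel_size_or_refuse(const char *path, int ownerBackendId,
                             int myBackendId)
{
    char seg[1024];
    struct stat st;
    long long total = 0;

    if (ownerBackendId != INVALID_BACKEND_ID && ownerBackendId != myBackendId)
        return -1;
    for (unsigned segno = 0;; segno++)
    {
        if (segno == 0)
            snprintf(seg, sizeof seg, "%s", path);
        else
            snprintf(seg, sizeof seg, "%s.%u", path, segno);
        if (stat(seg, &st) != 0)
            break;                            /* first missing segment ends it */
        total += (long long) st.st_size;
    }
    return total;
}
```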

I'm sure we could add some kind of signaling mechanism that would tell
all backends to flush their local buffers, but I'm not too sure it
would help this case very much, because you likely wouldn't want to
wait for all the backends to complete that process before reporting
results.
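One hypothetical shape for such a signaling mechanism, sketched with C11 atomics. A requester bumps a shared generation counter, and each backend publishes the generation it has flushed up to. Because the requester only polls `all_flushed()`, it can report results without blocking until every backend has answered, which is the trade-off raised in the paragraph above. None of these names exist in PostgreSQL. The sketch leaves out actually delivering the signal and ignores counter wraparound:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 8                        /* hypothetical fixed size */

static atomic_uint flush_request_gen;         /* bumped by each request */
static atomic_uint flushed_gen[MAX_BACKENDS]; /* per-backend progress */

/* Requester: ask everyone to flush; returns the generation to check for. */
unsigned request_local_flush(void)
{
    return atomic_fetch_add(&flush_request_gen, 1) + 1;
}

/* Backend: read the generation before flushing, so a request that
 * arrives mid-flush is not mistakenly marked as satisfied. */
unsigned begin_local_flush(void)
{
    return atomic_load(&flush_request_gen);
}

/* Backend: after writing out its local buffers, publish what it saw. */
void finish_local_flush(int backend, unsigned seen_gen)
{
    atomic_store(&flushed_gen[backend], seen_gen);
}

/* Requester: non-blocking check instead of waiting for every backend. */
bool all_flushed(unsigned gen, int nbackends)
{
    for (int i = 0; i < nbackends; i++)
        if (atomic_load(&flushed_gen[i]) < gen)
            return false;
    return true;
}
```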

>> [syncscan.c] It seems we pursue this optimization even for temprels; I
>> can't think of why that would be useful in practice.  If it's useless
>> overhead, should we skip it?  This is really independent of this
>> project; just a side thought.
>
> Maybe recently used buffers are more likely to be in the OS page cache,
> so perhaps it's not good to disable it.

I don't get it. If the whole relation fits in the page cache, it
doesn't much matter where you start a seqscan. If it doesn't,
starting where the last one ended is anti-optimal.
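A toy model of the syncscan argument. It assumes the OS page cache behaves as a strict LRU holding `cache_pages` pages, and that a temp relation is scanned twice in a row by its owning backend with nothing else touching the cache. Real kernels do not implement strict LRU. `second_scan_hits` is a hypothetical helper that counts how many reads of the second scan would hit the cache for a given starting block:

```c
#include <stdlib.h>

/* Number of second-scan reads that hit the cache, or -1 on allocation failure. */
int second_scan_hits(int nblocks, int cache_pages, int second_start)
{
    /* last_use[b]: logical time of block b's most recent read; 0 = never read */
    long *last_use = calloc((size_t) nblocks, sizeof *last_use);
    long clock = 0;
    int hits = 0;

    if (last_use == NULL)
        return -1;

    for (int pass = 0; pass < 2; pass++)
    {
        int start = (pass == 0) ? 0 : second_start;

        for (int i = 0; i < nblocks; i++)
        {
            int b = (start + i) % nblocks;

            if (pass == 1 && last_use[b] != 0)
            {
                /* Under strict LRU, b is cached iff fewer than cache_pages
                 * distinct blocks were read more recently than b. */
                int newer = 0;

                for (int j = 0; j < nblocks; j++)
                    if (last_use[j] > last_use[b])
                        newer++;
                if (newer < cache_pages)
                    hits++;
            }
            last_use[b] = ++clock;
        }
    }
    free(last_use);
    return hits;
}
```

With 100 blocks and a 30-page cache, starting the second scan at block 0 gets no hits. Starting at block 99, where the first scan ended, gets 1. Starting at block 70, the oldest block still cached, gets 30. When `cache_pages >= nblocks`, every starting position hits on every read.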

...Robert
