Re: Cache relation sizes?

From: "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
To: "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>
Cc: 'Kyotaro HORIGUCHI' <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "thomas(dot)munro(at)enterprisedb(dot)com" <thomas(dot)munro(at)enterprisedb(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, "david(dot)rowley(at)2ndquadrant(dot)com" <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2019-02-06 08:56:52
Message-ID: 20190206085652.yfmh6d5nm475jkt2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2019-02-06 08:50:45 +0000, Jamison, Kirk wrote:
> On February 6, 2019, 08:25AM +0000, Kyotaro HORIGUCHI wrote:
>
> >At Wed, 6 Feb 2019 06:29:15 +0000, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> >> Although I haven't looked deeply at Thomas's patch yet, there's currently no place to store the size per relation in shared memory. You have to wait for the global metacache that Ideriha-san is addressing. Then, you can store the relation size in the RelationData structure in relcache.
>
> >Just one counter in the patch *seems* to give significant gain comparing to the complexity, given that lseek is so complex or it brings latency, especially on workloads where file is scarcely changed. Though I didn't run it on a test bench.
>
> > > > (2) Is the MdSharedData temporary or permanent in shared memory?
> > > Permanent in shared memory.
> > I'm not sure the duration of the 'permanent' there, but it disappears when server stops. Anyway it doesn't need to be permanent beyond a server restart.
>
>
> Thank you for the insights.
> I did a simple test in the previous email using simple syscall tracing,
> the patch significantly reduced the number of lseek syscall.
> (but that simple test might not be enough to describe the performance benefit)
>
> Regarding Tsunakawa-san's comment,
> in Thomas' patch, he made a place in shared memory that stores the
> relsize_change_counter, so I am thinking of utilizing the same,
> but for caching the relsize itself.
>
> Perhaps I should explain further the intention for the design.
>
> First step, to cache the file size in the shared memory. Similar to the
> intention or purpose of the patch written by Thomas, to reduce the
> number of lseek(SEEK_END) by caching the relation size without using
> additional locks. The difference is by caching a rel size on the shared
> memory itself. I wonder if there are problems that you can see with
> this approach.
>
> Eventually, the next step is to have a structure in shared memory
> that caches file addresses along with their sizes (Andres' idea of
> putting an open relations table in the shared memory). With a
> structure that group buffers into file relation units, we can get
> file size directly from shared memory, so the assumption is it would
> be faster whenever we truncate/extend our relations because we can
> track the offset of the changes in size and use range for managing
> the truncation, etc..
> The next step is a complex direction that needs serious discussion,
> but I am wondering if we can proceed with the first step for now if
> the idea and direction are valid.

Maybe I'm missing something here, but why is it actually necessary to
have the sizes in shared memory, if we're just talking about caching
sizes? It's pretty darn cheap to determine the filesize of a file that
has been recently stat()/lseek()/ed, and introducing per-file shared
data adds *substantial* complexity, because the amount of shared memory
needs to be pre-determined. The reason I want to put per-relation data
into shared memory is different, it's about putting the buffer mapping
into shared memory, and that, as a prerequisite, also need per-relation
data. And there's a limit of the number of relation sthat need to be
open (one per cached page at max), and space can be freed by evicting
pages.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2019-02-06 09:00:24 Re: Early WIP/PoC for inlining CTEs
Previous Message Jamison, Kirk 2019-02-06 08:50:45 RE: Cache relation sizes?