From: | Amin <amin(dot)fallahi(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How to find the number of cached pages for a relation? |
Date: | 2023-01-28 00:11:26 |
Message-ID: | CAF-KA889ceqFa-t=7NF+4KBZmBax_gxBV=TRu7kLV7+sV5egSQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you Andres.
If I want to do "a" ( Do one probe of the buffer mapping table for each
block of the relation. Thus O(#relation blocks)) what function calls can I
use, assuming I only have access to the relation id? How can I access and
scan the buffer mapping table?
On Fri, Jan 13, 2023 at 6:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2023-01-13 17:28:31 -0800, Amin wrote:
> > Before scanning a relation, in the planner stage, I want to make a call
> to
> > retrieve information about how many pages will be a hit for a specific
> > relation. The module pg_buffercache seems to be doing a similar thing.
> > Also, pg_statio_all_tables seems to be having that information, but it is
> > updated after execution. However, I want the information before
> execution.
> > Also not sure how pg_statio_all_tables is created and how I can access it
> > in the code.
>
> There's no cheap way to do that. Currently the only ways are to:
>
> a) Do one probe of the buffer mapping table for each block of the
> relation. Thus O(#relation blocks).
>
> b) Scan all of buffer headers, check which are for the relation. Thus
> O(#NBuffers)
>
> Neither of which are a good idea during planning.
>
>
> It might be a bit more realistic to get very rough estimates:
>
> You could compute the table's historic cache hit ratio from pgstats (i.e.
> use
> the data backing pg_statio_all_tables). Of course that's not going to be
> specific to your query (for index scans etc), and might have changed more
> recently. It'd also be completely wrong after a restart.
>
> If we had information about *recent* cache hit patterns for the relation,
> it'd
> be a lot better, but we don't have the infrastructure for that, and
> introducing it would increase the size of the stats entries noticably.
>
> Another way could be to probe the buffer mapping table for a small subset
> of
> the locks and infer the likelihood of other blocks being in shared buffers
> that way.
>
> A third way could be to track the cache hit for relations in backend local
> memory, likely in the relache entry. The big disadvantage would be that
> query
> plans would differ between connections and that connections would need to
> "warm up" to have good plans. But it'd handle restarts nicely.
>
> Greetings,
>
> Andres Freund
>
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2023-01-28 00:23:19 | Re: recovery modules |
Previous Message | Andres Freund | 2023-01-28 00:09:39 | Re: recovery modules |