From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: automatic analyze: readahead - add "IO read time" log message |
Date: | 2021-03-12 00:29:46 |
Message-ID: | c8d21a1b-63bb-1bb5-c6e8-b69f5d2fa7e5@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/12/21 1:11 AM, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
>> On 3/8/21 8:42 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
>>>> On 2/10/21 11:10 PM, Stephen Frost wrote:
>>>>> * Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
>>>>>> On 05/02/2021 23:22, Stephen Frost wrote:
>>>>>>> Unless there's anything else on this, I'll commit these sometime next
>>>>>>> week.
>>>>>>
>>>>>> One more thing: Instead of using 'effective_io_concurrency' GUC directly,
>>>>>> should call get_tablespace_maintenance_io_concurrency().
>>>>>
>>>>> Ah, yeah, of course.
>>>>>
>>>>> Updated patch attached.
>>>>
>>>> A couple minor comments:
>>>>
>>>> 1) I think the patch should be split into two parts, one adding the
>>>> track_io_timing, one adding the prefetching.
>>>
>>> This was already done..
>>
>> Not sure what you mean by "done"? I see the patch still does both
>> changes related to track_io_timing and prefetching.
>
> They are two patches..
>
> ➜ ~ grep Subject analyze_prefetch_v8.patch
> Subject: [PATCH 1/2] Improve logging of auto-vacuum and auto-analyze
> Subject: [PATCH 2/2] Use pre-fetching for ANALYZE
>
> The first doesn't have any prefetch-related things, the second doesn't
> have any track_io_timing things..
>
Oh! I didn't realize a single file can contain multiple separate
patches, so I saw a single file and assumed it's one patch. How do you
produce a single file, and is that better than just generating multiple
files using git format-patch?
>>>> 3) Is there a way to reduce the amount of #ifdef in acquire_sample_rows?
>>>
>>> Perhaps..
>>>
>>>> This makes the code rather hard to read, IMHO. It seems to me we can
>>>> move the code around a bit and merge some of the #ifdef blocks - see the
>>>> attached patch. Most of this is fairly trivial, with the exception of
>>>> moving PrefetchBuffer before table_scan_analyze_next_block - AFAIK this
>>>> does not materially change the behavior, but perhaps I'm wrong.
>>>
>>> but I don't particularly like doing the prefetch right before we run
>>> vacuum_delay_point() and potentially sleep.
>>
>> Why? Is that just a matter of personal preference (fair enough) or is
>> there a reason why that would be wrong/harmful?
>
> Telling the kernel "hey, we're about to need this, please go get it" and
> then immediately going to sleep just strikes me as a bit off. We should
> be trying to minimize the time between prefetch and actual request for
> the page. Of course, there's going to be some times that we will issue
> a prefetch and then come around again and end up hitting the limit and
> sleeping before we actually request the page and maybe it doesn't
> ultimately matter much but it just seems better to sleep first before
> issuing the prefetch to minimize the amount 'outstanding' when we do end
> up sleeping. One thing about prefetching is that the kernel is
> certainly within its rights to decide to drop the page before we
> actually go to read it, if it's under pressure and we're just sleeping.
>
I don't know, but this argument seems rather hand-wavy to me, TBH.
Firstly, we're prefetching quite far ahead anyway, so there's almost
always going to be a vacuum_delay_point() call between you issue the
prefetch and actually using the page. Secondly, vacuum_delay_point won't
really sleep for most of the calls (assuming the cost limit is not
unreasonably small).
The way I see it (a) it's unlikely a particular vacuum_delay_point()
call will sleep, while at the same time (b) it's quite probable one of
the calls between prefetching and using the page will sleep.
So I think the exact ordering of those two calls does not really matter,
at least not for this particular reason.
That being said, I don't feel strongly about this - if you prefer it
like this, so be it. I can live with two ifdefs instead of one.
>> I think e.g. prefetch_targblock could be moved to the next #ifdef, which
>> will eliminate the one-line ifdef.
>
> Sure, done in the attached.
>
> Thanks for the review! Unless there's other comments, I'll plan to push
> this over the weekend or early next week.
>
+1
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2021-03-12 00:31:43 | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Previous Message | Kyotaro Horiguchi | 2021-03-12 00:23:12 | Re: shared-memory based stats collector |