From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-02-04 01:44:46
Message-ID: CAA4eK1JCj6rpXewPYMrCLXFGPCbfP7X7HgvN_n9h7MGuJFVoyg@mail.gmail.com
Lists: pgsql-hackers
On Sat, Feb 4, 2017 at 5:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Yeah, I understand that point and I can see there is strong argument
>> to do that way, but let's wait and see what others including Robert
>> have to say about this point.
>
> It seems to me that you can make an argument for any point of view.
> In a parallel sequential scan, the smallest unit of work that can be
> given to one worker is one heap page; in a parallel index scan, it's
> one index page. By that logic, as Rahila says, we ought to do this
> based on the number of index pages. On the other hand, it's weird to
> use the same GUC to measure index pages at some times and heap pages
> at other times, and it could result in failing to engage parallelism
> where we really should do so, or using an excessively small number of
> workers. An index scan that hits 25 index pages could hit 1000 heap
> pages; if it's OK to use a parallel sequential scan for a table with
> 1000 heap pages, why is it not OK to use a parallel index scan to scan
> 1000 heap pages? I can't think of any reason.
>
I think one difference is that if we want to scan 1000 heap pages with
a parallel index scan, the cost of scanning the index is additional
compared to a parallel sequential scan.
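
To make that concrete with the page counts from the example upthread
(a purely illustrative back-of-the-envelope sketch, not cost-model
numbers):

    /* Purely illustrative page counts for the 1000-heap-page example. */
    #include <stdio.h>

    int
    main(void)
    {
        unsigned heap_pages = 1000;   /* pages both plans must read */
        unsigned index_pages = 25;    /* extra pages only the index scan reads */

        printf("parallel seq scan reads   ~%u pages\n", heap_pages);
        printf("parallel index scan reads ~%u pages (%u index + %u heap)\n",
               index_pages + heap_pages, index_pages, heap_pages);
        return 0;
    }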
> On balance, I'm somewhat inclined to think that we ought to base
> everything on heap pages, so that we're always measuring in the same
> units. That's what Dilip's patch for parallel bitmap heap scan does,
> and I think it's a reasonable choice. However, for parallel index
> scan, we might want to also cap the number of workers to, say,
> index_pages/10, just so we don't pick an index scan that's going to
> result in a very lopsided work distribution.
>
I guess in the above context you mean heap_pages or index_pages that
are expected to be *fetched* during the index scan.
Yet another thought is that for parallel index scans we use
index_pages_fetched, but either use a different GUC
(min_parallel_index_rel_size) with a relatively lower default value
(say min_parallel_relation_size/4 = 2MB), or directly use
min_parallel_relation_size/4 for parallel index scans.  A rough,
self-contained sketch of that kind of heuristic is below.
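
Something along these lines; all names and defaults here
(min_parallel_index_rel_size in particular) are only assumptions for
illustration, not what the patch does:

    /*
     * Sketch of sizing the worker count from heap pages (as for parallel
     * seq scan), with a separate, smaller threshold for the index and a
     * cap of index_pages / 10 as suggested above.
     */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    static BlockNumber min_parallel_relation_size = 1024;   /* 8MB in 8kB pages */
    static BlockNumber min_parallel_index_rel_size = 256;   /* 2MB, i.e. 1/4th */
    static int max_parallel_workers_per_gather = 4;

    static int
    compute_index_scan_workers(BlockNumber heap_pages_fetched,
                               BlockNumber index_pages_fetched)
    {
        int         parallel_workers;
        BlockNumber pages;

        /* Expected fetches too small on either side?  No parallelism. */
        if (heap_pages_fetched < min_parallel_relation_size ||
            index_pages_fetched < min_parallel_index_rel_size)
            return 0;

        /*
         * Size the worker count from heap pages, as for parallel seq scan:
         * one worker at the threshold, one more each time the page count
         * triples.
         */
        parallel_workers = 1;
        pages = heap_pages_fetched;
        while (pages >= min_parallel_relation_size * 3)
        {
            parallel_workers++;
            pages /= 3;
        }

        /*
         * Cap to index_pages / 10 so the index portion of the work isn't
         * spread too thinly across workers.
         */
        if (parallel_workers > (int) (index_pages_fetched / 10))
            parallel_workers = (int) (index_pages_fetched / 10);

        if (parallel_workers > max_parallel_workers_per_gather)
            parallel_workers = max_parallel_workers_per_gather;

        return parallel_workers;
    }

    int
    main(void)
    {
        /* e.g. 500 index pages expected to drive 20000 heap page fetches */
        printf("workers = %d\n", compute_index_scan_workers(20000, 500));
        return 0;
    }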
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com