From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com> |
Subject: | Re: Parallel Index Scans |
Date: | 2017-02-14 02:04:41 |
Message-ID: | CAA4eK1JfbBy9ftW+Ct5dCtFrAWVAMkHhEnMfUp-+Z1nPKFSgpw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Feb 13, 2017 at 5:47 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>> Why can't we rely on _bt_walk_left?
>>>> The reason is mentioned in comments, but let me try to explain with
>>>> some example. When you reach that point of code, it means that either
>>>> the current page (assume page number is 10) doesn't contain any
>>>> matching items or it is a half-dead page, both of which indicates that
>>>> we have to move to the previous page. Now, before checking if the
>>>> current page contains matching items, we signal parallel machinery
>>>> (via _bt_parallel_release) to allow workers to read the previous page
>>>> (assume previous page number is 9). So it is quite possible that
>>>> after deciding that current page (page number 10) doesn't contain any
>>>> matching tuples if we directly move to the previous page (in this case
>>>> it will be 9) by using _bt_walk_left, some other worker would have
>>>> read page 9. In short, if we directly use _bt_walk_left(), then we
>>>> are prone to returning some of the values twice as multiple workers
>>>> can read the same page.
>>> But ... the entire point of the seize-and-release stuff is to avoid
>>> this problem. You're suppose to seize the scan, read the current
>>> page, walk left, store the page you find in the scan, and then release
>>> the scan.
>> Exactly and that is what is done in the patch. Basically, if we found
>> that the current page is half-dead or it doesn't contain any matching
>> items, then release the current buffer, seize the scan, read the
>> current page, walk left and so on. I am slightly confused here
>> because it seems both of us agree on what is the right thing to do and
>> according to me that is how it is implemented. Are you just ensuring
>> about whether I have implemented as discussed or do you see a problem
>> with the way it is implemented?
>
> Well, before, I thought you said that relying entirely on
> _bt_walk_left couldn't work because then two people might end up
> running it at the same time, and that would cause problems. But if
> you can only run _bt_walk_left while you've got the scan seized, then
> that can't happen. Evidently I'm missing something here.
>
I think the comment at that place is not as clear as it should be. So
how about changing it as below:
Existing comment:
--------------------------
/*
* For parallel scans, get the last page scanned as it is quite
* possible that by the time we try to fetch previous page, other
* worker has also decided to scan that previous page. So we
* can't rely on _bt_walk_left call.
*/
Modified comment:
--------------------------
/*
* For parallel scans, it is quite possible that by the time we try to fetch
* the previous page, another worker has also decided to scan that
* previous page. So to avoid that we need to get the last page scanned
* from shared scan descriptor before calling _bt_walk_left.
*/
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2017-02-14 02:19:26 | Re: renaming pg_resetxlog to pg_resetwal has broken pg_upgrade. |
Previous Message | Haribabu Kommi | 2017-02-14 01:59:43 | Re: [WIP]Vertical Clustered Index (columnar store extension) |