From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD |
Date: | 2017-12-12 11:50:22 |
Message-ID: | CAA4eK1+27zVUa__1V9ysftw48Yn+QEKSikkLumPjbuRacmHaQQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Dec 12, 2017 at 4:00 PM, Kuntal Ghosh
<kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>>> Thanks for looking into it. I will see if we can write some test. In
>>> the meantime if possible, can you please request Patrick Hemmer to
>>> verify the attached patch?
>>
>> Our discussion was on the #postgresql Freenode channel. I pointed him
>> at this thread, but I'm not sure if he'll see my message or be able to
>> test.
> After discussing with Amit, I'm able to reproduce the scenario in a
> master-standby setup. The issue occurs when we perform parallel
> index(-only) scan on a BTP_HALF_DEAD -marked page. (If a page is
> marked as BTP_DELETED, it's already unlinked from the index).
>
> When a btree page is deleted during vacuum, it's first marked as
> BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
> in _bt_unlink_halfdead_page without releasing cleanup lock on the
> buffer. Hence, any scan node cannot lock the same buffer. So, the
> issue can't be reproduced on master.
>
> However, after replaying XLOG_BTREE_MARK_PAGE_HALFDEAD record, standby
> releases the lock on the same buffer. If we force parallelism, an
> index scan on the same page will cause hang the standby server.
> Following is a (unpleasant)way to reproduce the issue:
>
> In master (with autovacuum = off):
> 1. create table t1(a int primary key);
> 2. insert into t1 select * from generate_series(1,1000); --generates 3
> leaf nodes (block no 1,2,4) and 1 root node (block no 3)
> 3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by leaf 2
> 4. analyze t1; --update the stats
> 5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
> that the next vacuum will consider leaf 2 for page deletion
What do you mean by next vacuum, here autovacuum is off? Are you
missing any step which manually performs the vacuum?
> Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
> can't unlink the page.
>
> In standby,
> 1. force parallelism.
> 2. explain analyze select * from t1 where a>=1 and a<=1000; and the
> parallel workers hang at the above-discussed place!
>
> The attached patch fixes the issue. One comment on the same:
> + else if (scan->parallel_scan != NULL)
> + {
> + /* allow next page be processed by parallel worker */
> + _bt_parallel_release(scan, opaque->btpo_next);
> + }
>
> /* nope, keep going */
> if (scan->parallel_scan != NULL)
>
> IMHO, There is no need to add an extra if condition.
> _bt_parallel_release can be included in the next one.
>
I don't think so because it is quite possible that _bt_readpage has
already released it (consider the case where _bt_readpage returns
false).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Nikhil Sontakke | 2017-12-12 12:04:31 | Re: [HACKERS] logical decoding of two-phase transactions |
Previous Message | Ildus Kurbangaliev | 2017-12-12 11:41:42 | Re: [HACKERS] Custom compression methods |