From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2018-01-17 17:27:10 |
Message-ID: | CAH2-Wzm-Zfz-1aH3vtz35xJmOr+ihcN0mQ+WQJXQ87BGtR7mPg@mail.gmail.com |
Lists: | pgsql-hackers |

On Wed, Jan 17, 2018 at 5:47 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I could still reproduce it. I think the way you have fixed it has a
> race condition. In _bt_parallel_scan_and_sort(), the value of
> brokenhotchain is set after you signal the leader that the worker is
> done (by incrementing workersFinished). Now, the leader is free to
> decide based on the current shared state which can give the wrong
> value. Similarly, I think the value of havedead and reltuples can
> also be wrong.
>
> You neither seem to have fixed nor responded to the second problem
> mentioned in my email upthread [1]. To reiterate, the problem is that
> we can't assume that the workers we have launched will always start
> and finish. It is possible that postmaster fails to start the worker
> due to fork failure. In such conditions, tuplesort_leader_wait will
> hang indefinitely because it will wait for the workersFinished count
> to become equal to launched workers (+1, if leader participates) which
> will never happen. Am I missing something due to which this won't be
> a problem?

I think that both problems (the live _bt_parallel_scan_and_sort() bug,
as well as the general issue of needing to account for parallel
worker fork() failure) are likely solvable by not using
tuplesort_leader_wait(), and instead calling
WaitForParallelWorkersToFinish(), as you already suggested.

Separately, I will need to monitor the progress of that bugfix patch,
to make sure that what I add is comparable to what ultimately gets
committed for parallel query.
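
To make the ordering problem in _bt_parallel_scan_and_sort() concrete, here is a minimal standalone sketch using POSIX threads. It is not PostgreSQL code and not the patch itself: SharedState, WorkerArg, BuildResult, worker_main() and run_build() are all hypothetical names, and only the field names workers_finished, reltuples and brokenhotchain echo the ones discussed above. The point it illustrates is that a worker should publish its results before (here, under the same lock as) the increment that tells the leader it is done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 8

/* Hypothetical stand-in for the shared build state (not PostgreSQL's struct). */
typedef struct SharedState
{
    pthread_mutex_t mutex;
    pthread_cond_t  workers_done_cv;
    int             workers_finished;
    double          reltuples;
    bool            brokenhotchain;
} SharedState;

typedef struct WorkerArg
{
    SharedState *shared;
    double       reltuples;         /* tuples this worker "scanned" */
    bool         brokenhotchain;    /* did this worker see a broken HOT chain? */
} WorkerArg;

typedef struct BuildResult
{
    int     workers_finished;
    double  reltuples;
    bool    brokenhotchain;
} BuildResult;

static void *
worker_main(void *p)
{
    WorkerArg   *arg = p;
    SharedState *shared = arg->shared;

    pthread_mutex_lock(&shared->mutex);
    /* Publish this worker's results first ... */
    shared->reltuples += arg->reltuples;
    shared->brokenhotchain = shared->brokenhotchain || arg->brokenhotchain;
    /* ... and only then announce completion to the leader. */
    shared->workers_finished++;
    pthread_cond_signal(&shared->workers_done_cv);
    pthread_mutex_unlock(&shared->mutex);
    return NULL;
}

/* Launch nworkers; worker number broken_worker (or none if -1) reports a broken chain. */
static BuildResult
run_build(int nworkers, int broken_worker)
{
    SharedState shared;
    pthread_t   threads[MAX_WORKERS];
    WorkerArg   args[MAX_WORKERS];
    BuildResult result;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    pthread_mutex_init(&shared.mutex, NULL);
    pthread_cond_init(&shared.workers_done_cv, NULL);
    shared.workers_finished = 0;
    shared.reltuples = 0.0;
    shared.brokenhotchain = false;

    for (int i = 0; i < nworkers; i++)
    {
        int rc;

        args[i].shared = &shared;
        args[i].reltuples = 100.0;
        args[i].brokenhotchain = (i == broken_worker);
        rc = pthread_create(&threads[i], NULL, worker_main, &args[i]);
        /*
         * The sketch simply assumes every launch succeeds.  A launch that
         * fails is exactly the case where waiting on a counter never returns.
         */
        assert(rc == 0);
        (void) rc;
    }

    /* Leader: wait until every launched worker has published and signalled. */
    pthread_mutex_lock(&shared.mutex);
    while (shared.workers_finished < nworkers)
        pthread_cond_wait(&shared.workers_done_cv, &shared.mutex);
    result.workers_finished = shared.workers_finished;
    result.reltuples = shared.reltuples;
    result.brokenhotchain = shared.brokenhotchain;
    pthread_mutex_unlock(&shared.mutex);

    for (int i = 0; i < nworkers; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&shared.workers_done_cv);
    pthread_mutex_destroy(&shared.mutex);
    return result;
}
```

Calling run_build(4, 2) from a test program should report four finished workers and a broken chain. If the update to brokenhotchain were moved after the unlock, the leader could read the shared state before that update lands, which is the race Amit describes. The sketch does nothing for the second problem (a worker that never starts); a counter-based wait cannot detect that on its own.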
--
Peter Geoghegan