From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2017-01-31 05:15:45 |
Message-ID: | CAH2-Wzn=+MDmhOVHQ=M3TYxbCtPQsWStt0GT273XhADkiNoOvg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Attached is V7 of the patch.
>
> I am doing some testing. First, some superficial things from first pass:
>
> [Various minor cosmetic issues]
Oops.
> Just an observation: if you ask for a large number of workers, but
> only one can be launched, it will be constrained to a small fraction
> of maintenance_work_mem, but use only one worker. That's probably OK,
> and I don't see how to do anything about it unless you are prepared to
> make workers wait for an initial message from the leader to inform
> them how many were launched.
Actually, the leader-owned worker Tuplesort state will have the
appropriate amount, so you'd still need to have 2 participants (1
worker + leader-as-worker). And, sorting is much less sensitive to
having a bit less memory than hashing (at least when there isn't
dozens of runs to merge in the end, or multiple passes). So, I agree
that this isn't worth worrying about for a DDL statement.
> Should this 64KB minimum be mentioned in the documentation?
You mean user-visible documentation, and not just tuplesort.h? I don't
think that that's necessary. That's a ludicrously low amount of memory
for a worker to be limited to anyway. It will never come up with
remotely sensible use of the feature.
> + if (!btspool->isunique)
> + {
> + shm_toc_estimate_keys(&pcxt->estimator, 2);
> + }
>
> Project style: people always tell me to drop the curlies in cases like
> that. There are a few more examples in the patch.
I only do this when there is an "else" that must have curly braces,
too. There are plenty of examples of this from existing code, so I
think it's fine.
> + /* Wait for workers */
> + ConditionVariableSleep(&shared->workersFinishedCv,
> + WAIT_EVENT_PARALLEL_FINISH);
>
> I don't think we should reuse WAIT_EVENT_PARALLEL_FINISH in
> tuplesort_leader_wait and worker_wait. That belongs to
> WaitForParallelWorkersToFinish, so someone who see that in
> pg_stat_activity won't know which it is.
Noted.
> IIUC worker_wait() is only being used to keep the worker around so its
> files aren't deleted. Once buffile cleanup is changed to be
> ref-counted (in an on_dsm_detach hook?) then workers might as well
> exit sooner, freeing up a worker slot... do I have that right?
Yes. Or at least I think it's very likely that that will end up happening.
> Incidentally, barrier.c could probably be used for this
> synchronisation instead of these functions. I think
> _bt_begin_parallel would call BarrierInit(&shared->barrier,
> scantuplesortstates) and then after LaunchParallelWorkers() it'd call
> a new interface BarrierDetachN(&shared->barrier, scantuplesortstates -
> pcxt->nworkers_launched) to forget about workers that failed to
> launch. Then you could use BarrierWait where the leader waits for the
> workers to finish, and BarrierDetach where the workers are finished
> and want to exit.
I thought about doing that, actually, but I don't like creating
dependencies on some other uncommited patch, which is a moving target
(barrier stuff isn't committed yet). It makes life difficult for
reviewers. I put off adopting condition variables until they were
committed for the same reason -- it's was easy to do without them for
a time. I'll probably get around to it before too long, but feel no
urgency about it. Barriers will only allow me to make a modest net
removal of code, AFAIK.
Thanks
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2017-01-31 05:19:49 | Refactoring of replication commands using printsimple |
Previous Message | Vitaly Burovoy | 2017-01-31 05:13:15 | Re: macaddr 64 bit (EUI-64) datatype support |