From: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2017-11-14 09:41:39 |
Message-ID: | CAGPqQf17vxDXFi=u9yf=aDF_0LJ+NGmRKJ0fUJYVUtzseQm6Zw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks Peter and Thomas for the review comments.
On Wed, Nov 1, 2017 at 3:59 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> > Attaching the re based patch according to the v22 parallel-hash patch
> sets
>
> I took a quick look at this today, and noticed a few issues:
>
> * make_name() is used to name files in sharedtuplestore.c, which is
> what is passed to BufFileOpenShared() for parallel hash join. Your
> using your own logic for that within the equivalent logtape.c call to
> BufFileOpenShared(), presumably because make_name() wants to identify
> participants by PID rather than by an ordinal identifier number.
>
> I think that we need some kind of central registry for things that use
> shared buffiles. It could be that sharedtuplestore.c is further
> generalized to support this, or it could be that they both call
> something else that takes care of naming. It's not okay to have this
> left to random chance.
>
> You're going to have to ask Thomas about this. You should also use
> MAXPGPATH for the char buffer on the stack.
>
>
Used MAXPGPATH for the char buffer.
> * This logtape.c comment needs to be updated, as it's no longer true:
>
>
* successfully. In general, workers can take it that the leader will
> * reclaim space in files under their ownership, and so should not
> * reread from tape.
>
>
Done.
> * Robert hated the comment changes in the header of nbtsort.c. You
> might want to change it back, because he is likely to be the one that
> commits this.
>
>
* You should look for similar comments in tuplesort.c (IIRC a couple
> of places will need to be revised).
>
>
Pending.
> * tuplesort_begin_common() should actively reject a randomAccess
> parallel case using elog(ERROR).
>
>
Done.
> * tuplesort.h should note that randomAccess isn't supported, too.
>
>
Done.
> * What's this all about?:
>
> + /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
> + #define GetSharedBufFileSet(shared) \
> + ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))
>
> You can't just cast from one type to the other without regard for the
> underling size of the shared memory buffer, which is what this looks
> like to me. This only fails to crash because you're only abusing the
> last member in the tapes array for this purpose, and there happens to
> be enough shared memory slop that you get away with it. I'm pretty
> sure that ltsUnify() ends up clobbering the last/leader tape, which is
> a place where BufFileSet is also used, so this is just wrong. You
> should rethink the shmem structure a little bit.
>
>
Fixed this by adding a SharedFileSet directly into the Sharedsort struct.
Thanks Thomas Munro for the offline help here.
* There is still that FIXME comment within leader_takeover_tapes(). I
> believe that you should still have a leader tape (at least in local
> memory in the leader), even though you'll never be able to do anything
> with it, since randomAccess is no longer supported. You can remove the
> FIXME, and just note that you have a leader tape to be consistent with
> the serial case, though recognize that it's not useful. Note that even
> with randomAccess, we always had the leader tape, so it's not that
> different, really.
>
>
Done.
> I suppose it might make sense to make shared->tapes not have a leader
> tape. It hardly matters -- perhaps you should leave it there in order
> to keep the code simple, as you'll be keeping the leader tape in local
> memory, too. (But it still won't fly to continue to clobber it, of
> course -- you still need to find a dedicated place for BufFileSet in
> shared memory.)
>
>
Attaching the latest patch (v13) here and I will continue working on the
comment
improvement part for nbtsort.c and tuplesort.c. Also will perform more
testing
with the attached patch.
Patch is re-base of v25 patch set of Parallel hash.
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachment | Content-Type | Size |
---|---|---|
0001-Add-parallel-B-tree-index-build-sorting_v13.patch | text/x-patch | 150.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2017-11-14 09:51:47 | Re: [HACKERS] Proposal: Local indexes for partitioned table |
Previous Message | Moon Insung | 2017-11-14 09:36:30 | RE: [HACKERS][PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state |