Re: WIP: [[Parallel] Shared] Hash

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-02-15 23:22:47
Message-ID: CAEepm=0fStoz2Bgu=zD6efXDYt17cNo-a5a6tWwooh=xrBvZRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Out of archeological curiosity, I was digging around in the hash join
code and RCS history from Postgres 4.2[1], and I was astounded to
discover that it had a parallel executor for Sequent SMP systems and
was capable of parallel hash joins as of 1991. At first glance, it
seems to follow approximately the same design as I propose: share a
hash table and use a barrier to coordinate the switch from build phase
to probe phase and deal with later patches. It uses mmap to get space
and then works with relative pointers. See
src/backend/executor/n_hash.c and src/backend/executor/n_hashjoin.c.
Some of this might be described in Wei Hong's PhD thesis[2] which I
haven't had the pleasure of reading yet.

The parallel support is absent from the first commit in our repo
(1996), but there are some vestiges like RelativeAddr and ABSADDR used
to access the hash table (presumably needlessly) and also some
mentions of parallel machines in comments that survived up until
commit 26069a58 (1999).

[1] http://db.cs.berkeley.edu/postgres.html
[2] http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf

--
Thomas Munro
http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-02-15 23:30:30 Re: bytea_output vs make installcheck
Previous Message Jeff Janes 2017-02-15 23:05:28 Re: gitlab post-mortem: pg_basebackup waiting for checkpoint