Re: Experimenting with hash join prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimenting with hash join prefetch
Date: 2020-02-04 01:31:27
Message-ID: 20200204013127.llyrwlf7e636wm5w@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-02-04 01:48:49 +1300, Thomas Munro wrote:
> On Fri, Apr 12, 2019 at 4:23 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > ... if we also prefetch during
> > the build phase ...
>
> Here's an experimental patch to investigate just that part. I tried
> initiating a prefetch of the bucket just before we copy the tuple and
> then finally insert it, but it doesn't seem to be far enough apart (at
> least for small tuples), which is a shame because that'd be a one line
> patch. So I made a little insert queue that prefetches and defers the
> insertion until N tuples later, and then I managed to get between 10%
> and 20% speed-up for contrived tests like this:

How much of the benefit here comes from the prefetching, and how much
just from writing the code in a manner that allows for more out-of-order
execution? Since there are no longer any dependencies preventing
execution of the next queued tuple, I'd assume that accounts for a good
part of the improvement.
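As I understand the quoted experiment, the deferred-insert queue could be sketched roughly like this (hypothetical names and toy structures, not the actual patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the hash table's bucket chains; hypothetical names. */
typedef struct Node
{
	struct Node *next;
	int			key;
} Node;

#define PREFETCH_DISTANCE 4		/* the "N" from the quoted experiment */

typedef struct QueueSlot
{
	Node	   *tuple;
	uint32_t	bucketno;
} QueueSlot;

static QueueSlot queue[PREFETCH_DISTANCE];
static int	queue_pos;

/* Link a tuple into the head of its bucket chain. */
static void
do_insert(Node **bucket_array, Node *tuple, uint32_t bucketno)
{
	tuple->next = bucket_array[bucketno];
	bucket_array[bucketno] = tuple;
}

/*
 * Prefetch the new tuple's bucket header now, and insert the tuple queued
 * PREFETCH_DISTANCE calls earlier, whose bucket header should have arrived
 * in cache by the time we touch it.
 */
static void
queue_insert(Node **bucket_array, Node *tuple, uint32_t bucketno)
{
	QueueSlot  *slot = &queue[queue_pos];

	__builtin_prefetch(&bucket_array[bucketno]);

	if (slot->tuple != NULL)
		do_insert(bucket_array, slot->tuple, slot->bucketno);

	slot->tuple = tuple;
	slot->bucketno = bucketno;
	queue_pos = (queue_pos + 1) % PREFETCH_DISTANCE;
}

/* Drain any still-queued tuples at the end of the build phase. */
static void
queue_flush(Node **bucket_array)
{
	for (int i = 0; i < PREFETCH_DISTANCE; i++)
	{
		if (queue[i].tuple != NULL)
		{
			do_insert(bucket_array, queue[i].tuple, queue[i].bucketno);
			queue[i].tuple = NULL;
		}
	}
}
```

Note that the insert of tuple i is still data-dependent on its own bucket load; the queue only moves that load earlier in time.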

Code like this really should look something roughly like:

while (true):
    have_skew = False

    # get tuples
    for i in 0..batchsize:
        tuples[i] = ExecProcNode(outerNode)
        if tuples[i] == NULL:
            # have slowpath handle this
            break

    # compute their hash values
    for i in 0..batchsize:
        hashvalues[i] = ExecHashGetHashValue(tuples[i])

    # check whether they go into skew buckets
    if have_skew_table:
        for i in 0..batchsize:
            skewbuckets[i] = ExecHashGetSkewBucket(tuples[i], hashvalues[i])
            if skewbuckets[i] != INVALID_SKEW_BUCKET_NO:
                have_skew = True
        if have_skew:
            # handle everything here
            continue

    # assume there are no skew tuples going forward, all handled above

    # compute bucket/batch for all tuples
    have_into_batchfile = False
    for i in 0..batchsize:
        hashbuckets[i], batchnos[i] = ExecHashGetBucketAndBatch(hashvalues[i])
        if batchnos[i] != hashtable->curbatch:
            have_into_batchfile = True

    if have_into_batchfile:
        # slowpath
        continue

    # allocate all tuples
    for i in 0..batchsize:
        hjtuples[i] = alloc_mintuple(tuples[i])

    # and finally insert them
    for i in 0..batchsize:
        hjtuples[i].next = buckets[hashbuckets[i]]
        buckets[hashbuckets[i]] = hjtuples[i]

Obviously it's a bit more complicated in reality than this, but I do
think that's where we have to go to make crucial parts like this faster
(same with hash aggregation, and a few other places).

I would bet this helps significantly even without an explicit prefetch
instruction - but explicit prefetching might help further. It also
allows us to tolerate higher branch costs, because they are amortized
over a few tuples.
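The last two passes of the loop above could be sketched in C like this (a minimal self-contained illustration with toy structures and a toy hash, not the real nodeHash.c code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the real hash table in nodeHash.c. */
#define BATCHSIZE 16
#define NBUCKETS  1024			/* power of two */

typedef struct Tuple
{
	struct Tuple *next;
	uint32_t	key;
} Tuple;

static Tuple *buckets[NBUCKETS];

/* Toy hash; stands in for ExecHashGetHashValue(). */
static uint32_t
hash_key(uint32_t key)
{
	key ^= key >> 16;
	key *= 0x85ebca6bu;
	key ^= key >> 13;
	return key;
}

/*
 * Batched insertion in separate passes: compute all bucket numbers,
 * prefetch all bucket headers, then link the tuples in.  No pass depends
 * on the previous iteration's result, so the cache misses can overlap.
 */
static void
insert_batch(Tuple **tuples, int n)
{
	uint32_t	hashbuckets[BATCHSIZE];

	for (int i = 0; i < n; i++)
		hashbuckets[i] = hash_key(tuples[i]->key) & (NBUCKETS - 1);

	for (int i = 0; i < n; i++)
		__builtin_prefetch(&buckets[hashbuckets[i]]);

	for (int i = 0; i < n; i++)
	{
		tuples[i]->next = buckets[hashbuckets[i]];
		buckets[hashbuckets[i]] = tuples[i];
	}
}

/* Walk a bucket chain, for checking the result. */
static Tuple *
lookup(uint32_t key)
{
	Tuple	   *t = buckets[hash_key(key) & (NBUCKETS - 1)];

	for (; t != NULL; t = t->next)
		if (t->key == key)
			return t;
	return NULL;
}
```

The point isn't the prefetch call per se: even without it, splitting the work into independent per-pass loops gives the CPU several misses to work on at once instead of one serialized chain.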

> create unlogged table t as select generate_series(1, 100000000)::int i;
> select pg_prewarm('t');
> set work_mem='8GB';
>
> select count(*) from t t1 join t t2 using (i);
>
>             master     patched/N=1       patched/N=4
> workers=0   89.808s    80.556s (+11%)    76.864s (+16%)
> workers=2   27.010s    22.679s (+19%)    23.503s (+15%)
> workers=4   16.728s    14.896s (+12%)    14.167s (+18%)
>
> Just an early experiment, but I thought it looked promising enough to share.

Nice!

Greetings,

Andres Freund
