From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimenting with hash join prefetch
Date: 2020-02-04 01:31:27
Message-ID: 20200204013127.llyrwlf7e636wm5w@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-02-04 01:48:49 +1300, Thomas Munro wrote:
> On Fri, Apr 12, 2019 at 4:23 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > ... if we also prefetch during
> > the build phase ...
>
> Here's an experimental patch to investigate just that part. I tried
> initiating a prefetch of the bucket just before we copy the tuple and
> then finally insert it, but it doesn't seem to be far enough apart (at
> least for small tuples), which is a shame because that'd be a one line
> patch. So I made a little insert queue that prefetches and defers the
> insertion until N tuples later, and then I managed to get between 10%
> and 20% speed-up for contrived tests like this:
How much of the benefit here comes from the prefetching, and how much
just from writing the code in a manner that allows for more out-of-order
execution? Because there are no dependencies preventing execution of the
next queued tuple anymore, I'd assume that that is a good part of what
helps?
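For concreteness, here's a minimal self-contained C sketch of the
deferred-insert queue as I read it (all names invented, not taken from
your patch; __builtin_prefetch is the GCC/Clang builtin): prefetch the
bucket head when a tuple is queued, and only do the dependent list
insertion QUEUE_DEPTH tuples later.

    #include <stddef.h>

    typedef struct HashTuple
    {
        struct HashTuple *next;
        /* ... hash value and tuple payload ... */
    } HashTuple;

    #define QUEUE_DEPTH 4        /* = your N */

    typedef struct InsertQueue
    {
        HashTuple *tuple[QUEUE_DEPTH];
        size_t     bucketno[QUEUE_DEPTH];
        size_t     pos;          /* total number of tuples queued so far */
    } InsertQueue;

    static void
    queue_insert(InsertQueue *q, HashTuple **buckets,
                 HashTuple *tuple, size_t bucketno)
    {
        size_t slot = q->pos % QUEUE_DEPTH;

        /*
         * The entry in this slot was queued QUEUE_DEPTH tuples ago; by
         * now its prefetch has hopefully completed, so the dependent
         * load of the bucket head should be cheap.
         */
        if (q->pos >= QUEUE_DEPTH)
        {
            HashTuple *t = q->tuple[slot];

            t->next = buckets[q->bucketno[slot]];
            buckets[q->bucketno[slot]] = t;
        }

        /* start pulling in the bucket head this tuple will need */
        __builtin_prefetch(&buckets[bucketno]);

        q->tuple[slot] = tuple;
        q->bucketno[slot] = bucketno;
        q->pos++;
    }

At the end of the build phase the remaining QUEUE_DEPTH entries would of
course still need to be flushed the same way.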
Either way, code like this really should look roughly like:

    while (true):
        have_skew = False

        # get tuples
        for i in 0..batchsize:
            tuples[i] = ExecProcNode(outerNode)
            if tuples[i] == NULL:
                # have slowpath handle the partial batch
                break

        # compute their hash values
        for i in 0..batchsize:
            hashvalues[i] = ExecHashGetHashValue(tuples[i])

        # check whether any go into skew buckets
        if have_skew_table:
            for i in 0..batchsize:
                skewbuckets[i] = ExecHashGetSkewBucket(tuples[i], hashvalues[i])
                if skewbuckets[i] != INVALID_SKEW_BUCKET_NO:
                    have_skew = True
            if have_skew:
                # handle everything here
                continue

        # assume there are no skew tuples going forward, all handled above

        # compute bucket/batch for all tuples
        have_into_batchfile = False
        for i in 0..batchsize:
            (hashbuckets[i], batchnos[i]) = ExecHashGetBucketAndBatch(hashvalues[i])
            if batchnos[i] != hashtable->curbatch:
                have_into_batchfile = True
        if have_into_batchfile:
            # slowpath
            continue

        # allocate all tuples
        for i in 0..batchsize:
            hjtuples[i] = alloc_mintuple(tuples[i])

        # and finally insert them
        for i in 0..batchsize:
            hjtuples[i].next = buckets[hashbuckets[i]]
            buckets[hashbuckets[i]] = hjtuples[i]
Obviously it's a bit more complicated in reality than this, but I do
think that's where we have to go to make crucial parts like this faster
(same with hashaggs, and a few other places).
I would bet this helps significantly even if there's no prefetch
instruction - but using explicit prefetching might help further. It also
allows us to afford higher branch costs, because we can amortize them
over a few tuples.
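To sketch the explicit-prefetch part (again with invented, self-contained
types rather than the real executor structures), the final insertion pass
above could first issue prefetches for all bucket heads in the batch:

    #include <stddef.h>

    typedef struct HashTuple
    {
        struct HashTuple *next;
    } HashTuple;

    /*
     * Insert one batch of already-hashed tuples. Splitting the prefetch
     * pass from the dependent pointer writes lets several cache misses
     * be in flight at once, instead of serializing on each bucket load.
     */
    static void
    insert_batch(HashTuple **buckets, HashTuple **hjtuples,
                 const size_t *hashbuckets, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            __builtin_prefetch(&buckets[hashbuckets[i]]);

        for (size_t i = 0; i < n; i++)
        {
            hjtuples[i]->next = buckets[hashbuckets[i]];
            buckets[hashbuckets[i]] = hjtuples[i];
        }
    }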
> create unlogged table t as select generate_series(1, 100000000)::int i;
> select pg_prewarm('t');
> set work_mem='8GB';
>
> select count(*) from t t1 join t t2 using (i);
>
>              master       patched/N=1       patched/N=4
> workers=0    89.808s      80.556s (+11%)    76.864s (+16%)
> workers=2    27.010s      22.679s (+19%)    23.503s (+15%)
> workers=4    16.728s      14.896s (+12%)    14.167s (+18%)
>
> Just an early experiment, but I thought it looked promising enough to share.
Nice!
Greetings,
Andres Freund