Re: Horribly slow hash join

From: Marcos Martínez(R) <mmcosta(at)mundo-r(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgreSQL(dot)org
Subject: Re: Horribly slow hash join
Date: 2004-04-17 20:35:09
Message-ID: 001001c424bb$7fb83740$0564a8c0@lobeira
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I didn't follow the conversation from the begining, bu I imagine that you
could improve
performance using the value (work_units % (2^32) ) instead of work_units.
You could even make an index on this value. Like that, the HASH function
will work well. This is not a good solution, but ...

For example.

create index ind1 on table1 ( work_units % (2^32) );

create index ind1 on table2 ( work_units % (2^32) );

Select * from table1 join table2 on (table1.work_units % (2^32) ) =
(table2.work_units % (2^32) )

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C. Nasby" <jim(at)nasby(dot)net>
Cc: <pgsql-performance(at)postgreSQL(dot)org>
Sent: Saturday, April 17, 2004 6:08 PM
Subject: Re: [PERFORM] Horribly slow hash join

> [ resending because I fat-fingered the cc: to the list ]
>
> I see the problem: all the entries in your work_units column have the
> low 32 bits equal to zero.
>
> regression=# select distinct work_units % (2^32)::bigint from
Trank_work_overall;
> ?column?
> ----------
> 0
> (1 row)
>
> The hash function for int8 only takes the low word into account, so all
> of the entries end up on the same hash chain, resulting in worst-case
> behavior. This applies to both your hash join and hash aggregate cases.
>
> We could change the hash function, perhaps, but then we'd just have
> different cases where there's a problem ... hashing will always fail on
> *some* set of inputs. (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)
>
> I guess the real issue is why are you encoding work_units like that?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2004-04-17 23:04:39 Re: Horribly slow hash join
Previous Message Tom Lane 2004-04-17 16:08:15 Re: Horribly slow hash join