From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: bloom filter in Hash Joins with batches |
Date: | 2016-01-10 00:08:03 |
Message-ID: | CAM3SWZQkrQZTKvXkcGsBDginH+KODsuPXAhqhOW5zCKZBTtCTQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> So, this seems to bring reasonable speedup, as long as the selectivity is
> below 50%, and the data set is sufficiently large.
What about semijoins? Apparently they can use bloom filters
particularly effectively. Have you considered them as a special case?
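To make the semijoin question concrete, here is a minimal sketch in Python, not the patch's C code. In a semijoin only the existence of a match matters, so any outer row the filter rejects can be dropped before it touches the hash table (or, with multiple batches, presumably before it is written to a batch file). The names `BloomFilter`, `hash_semijoin`, the filter size, and the use of SHA-256 are all assumptions made for illustration; a plain `set` stands in for the real hash table.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: nhashes bit positions per key in an nbits-bit array."""

    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes          # at most 8 with the 32-byte digest below
        self.bits = bytearray(nbits // 8 + 1)

    def _positions(self, key):
        # Derive all positions from one SHA-256 digest, 4 bytes per position.
        digest = hashlib.sha256(repr(key).encode()).digest()
        for i in range(self.nhashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "little") % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def hash_semijoin(outer_rows, inner_rows, key):
    """Return (outer rows with at least one inner match, rows skipped by the filter)."""
    bloom = BloomFilter()
    table = set()                       # stand-in for the join's hash table
    for row in inner_rows:
        k = key(row)
        bloom.add(k)
        table.add(k)

    result, skipped = [], 0
    for row in outer_rows:
        k = key(row)
        if not bloom.might_contain(k):
            skipped += 1                # definite non-match: no table probe
            continue
        if k in table:                  # a filter hit still needs confirming
            result.append(row)          # emitted once, however many matches exist
    return result, skipped
```

The key function may return a tuple, so the same sketch also covers a join condition on several columns, for example `key=lambda r: (r[0], r[1])`.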
Also, have you considered Hash join conditions with multiple
attributes as a special case? I'm thinking of cases like this:
regression=# set enable_mergejoin = off;
SET
regression=# explain analyze select * from tenk1 o join tenk2 t on
    o.twenty = t.twenty and t.hundred = o.hundred;
                                QUERY PLAN
──────────────────────────────────────────────────────────────────────
 Hash Join  (cost=595.00..4103.00 rows=50000 width=488) (actual time=12.086..1026.194 rows=1000000 loops=1)
   Hash Cond: ((o.twenty = t.twenty) AND (o.hundred = t.hundred))
   ->  Seq Scan on tenk1 o  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.017..4.212 rows=10000 loops=1)
   ->  Hash  (cost=445.00..445.00 rows=10000 width=244) (actual time=12.023..12.023 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 2824kB
         ->  Seq Scan on tenk2 t  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.006..3.453 rows=10000 loops=1)
 Planning time: 0.567 ms
 Execution time: 1116.094 ms
(8 rows)
(Note that while the optimizer has a slight preference for a merge
join in this case, the hash join plan shown here is a bit faster on
my machine.)
--
Peter Geoghegan