From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Viktor Leis <leis(at)in(dot)tum(dot)de> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Experimental evaluation of PostgreSQL's query optimizer |
Date: | 2015-12-22 09:09:39 |
Message-ID: | CAFj8pRBDaZOyAQRymgW9YXZyhrXMeQMK00ME2aL5FNH-dJ9fxQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2015-12-22 9:28 GMT+01:00 Viktor Leis <leis(at)in(dot)tum(dot)de>:
> On 12/22/2015 02:40 AM, Craig Ringer wrote:
> > On 21 December 2015 at 23:57, Viktor Leis <leis(at)in(dot)tum(dot)de <mailto:
> leis(at)in(dot)tum(dot)de>> wrote:
> >
> >
> >
> > Please have a look at Figure 6 (page 6) in
> > http://www.vldb.org/pvldb/vol9/p204-leis.pdf Disabling nested loop
> > joins without index scan (going from (a) to (b)) results in great
> > improvements across the board. And even more importantly, it avoids
> > most of the cases where queries took unreasonably long and timed
> > out. Basically this amounts to the being able to run the query on
> > PostgreSQL or not.
> >
> >
> > For that data, yes. But you're ignoring other important cases. Small or
> even 1-element lookup tables can be one where a nestloop over a seqscan
> turns out to be by far the fastest way to do the job.
> > This can really add up if it's deep in a subplan that's excuted
> repeatedly, or if it's part of queries that get run very frequently on a
> busy OLTP system.
> Ok here's what I presume to be the extreme case: Joining a large table
> with a 1-entry table.
>
> create table r1 (a int not null);
> create table r2 (b int not null);
> insert into r1 select 1 from generate_series(1,1000000);
> insert into r2 values (1);
> analyze r1;
> analyze r2;
>
> set enable_mergejoin to off;
> set enable_nestloop to on;
> set enable_hashjoin to off;
> explain select count(*) from r1, r2 where r1.a = r2.b;
> \timing
> select count(*) from r1, r2 where r1.a = r2.b;
> \timing
>
> set enable_nestloop to off;
> set enable_hashjoin to on;
> explain select count(*) from r1, r2 where r1.a = r2.b;
> \timing
> select count(*) from r1, r2 where r1.a = r2.b;
> \timing
>
> I get 128.894ms vs. 183.724ms, i.e., a 43% slowdown for the hash
> join. However, let me stress that this is really the extreme case:
>
> - If the join has few matches (due to inserting a value different from
> 1 into r2), hash and nested loop join have pretty much the same
> performance.
>
> - If you add just one more row to r2, the hash join is faster by a
> similar margin.
>
> - Also if there is disk IO or network involved, I suspect that you
> will see no performance differences.
>
> There are many difficult tradeoffs in any query optimizer, but I do
> not think picking nested loops where a hash join can be used is one of
> those. To me this seems more like a self-inflicted wound.
>
this is oversimplification :( Probably correct in OLAP, but wrong in OLTP.
The seq scan enforced by hash join can be problematic.
Regards
Pavel
> --
> Viktor Leis
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2015-12-22 09:14:11 | Re: parallel joins, and better parallel explain |
Previous Message | Pavel Stehule | 2015-12-22 08:59:07 | Re: plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types |