Re: FW: performance issue with a 2.5gb joinded table

From: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To: Daniel Westermann <Daniel(dot)Westermann(at)lcsystems(dot)ch>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: FW: performance issue with a 2.5gb joinded table
Date: 2013-01-09 09:30:05
Message-ID: CAB=Je-H+Ci9JPSza+M1XBker2YbVSeOUE-N2BmBtRnNSTevb0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Daniel,

>>Somehow oracle seems to know that a right join is the better way to go.
In fact, PostgreSQL is just doing the same thing: it hashes smaller table
and scans the bigger one.

Could you please clarify how do you consume 25M rows?
It could be the difference of response times comes not from the PostgreSQL
itself, but from the client code.

Could you please add the following information?
1) Execution time of simple query that selects MAX of all the required
columns "select max(test1.slsales_batch) , max(test1.slsales_checksum),
...".
I mean not explain (analyze, buffers), but simple execution.
The purpose of MAX is to split overhead of consuming of the resultset from
the overhead of producing it.

2) explain (analyze, buffers) for the same query with maxes. That should
reveal the overhead of explain analyze itself.

3) The output of the following SQLPlus script (from Oracle):
set linesize 1000 pagesize 10000 trimout on trimspool on time on timing on
spool slow_query.lst
select /*+ gather_plan_statistics */ max(test1.slsales_batch) ,
max(test1.slsales_checksum), ..;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS
LAST'));
spool off

That would display detailed statistics on execution time similar to the
explain (analyze, buffers).

4) Could you please clarify how did you migrate test1 table?
I guess the order of rows in that table might affect overall execution time.
Sorted table would be more CPU cache friendly, thus giving speedup. (see
[1] for similar example).
As far as I understand, simple create table as select * from test1 order by
slsales_date_id, slsales_prod_id should improve cache locality.

[1]:
http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

--
Regards,
Vladimir Sitnikov

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alex Vinnik 2013-01-09 15:49:43 Re: Simple join doesn't use index
Previous Message Jeff Janes 2013-01-09 04:34:11 Re: Simple join doesn't use index