Re: Query optimization

From: David Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Jorge Arevalo <jorgearevalo(at)libregis(dot)org>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query optimization
Date: 2014-10-29 18:24:24
Message-ID: CAKFQuwZ+Rw+b=by+fOCCy-L0LqssvY9CvQdgeqm-5DFPtjrxmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

List preference is to inline post or, at worse, bottom post. Please do not
top post.

On Wed, Oct 29, 2014 at 11:06 AM, Jorge Arevalo <jorgearevalo(at)libregis(dot)org>
wrote:

> Hello David, many thanks for your responses,
>
> Sorry for not providing the content of the fill_table3_function, but it
> just executes 3 insert queries in 3 different tables. And I've checked the
> time consuming operation is in this query (by the way, there was a little
> mistake in the name of the fields of the inner select, I've corrected it)
>
> SELECT value1,value2,value3,value4,
> value5, hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13',
> 'field14'], ARRAY[field9, field10, field11, field12, field13, field14) as
> metadata, value7, (select array((select row(f1, f2) from table2 p where
> p.f3 = field7))) as values_array FROM table1
>
> This is the result of EXPLAIN ANALYZE
>
> QUERY
> PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using table1_pkey on table1 (cost=67846.38..395773.45
> rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127
> loops=1)
> InitPlan 2 (returns $1)
> -> Result (cost=67846.29..67846.29 rows=1 width=0) (actual
> time=7009.063..7009.065 rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Seq Scan on table2 p (cost=0.00..67846.29 rows=12689
> width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)
> Filter: (f3 = field7)
>
> So, there's a sequential scan over table2 (in the query to get
> values_array), instead of a index scan. Could it be because the SELECT
> returns more than approximately 5-10% of all rows in the table? (I've heard
> that, under those conditions, a sequential scan is faster than index scan,
> because the amount of I/O operations required for each row)
>

​What concerns me here is that the rows estimate on that table2 scan is
~13k while the actual count is ~ 2.5​M; you need to run ANALYZE on both
tables and see if your get similar results. Though given the need for
sequential scan regardless (see next comment) the estimate miss likely
doesn't affect actual performance or the plan that is chosen. But it is
still worth looking into.

>
> Anyway, if I understood well, I should try:
>
> - Avoiding that inner query by using a JOIN instead
>

I don't know...looking at your explain (and some reasoning) it looks as if
it is already doing that for you since there is only a single loop for the
InitPlan 1. This is a little beyond my comfort zone but you've now
provided a decent amount of information for others to speculate...though it
would help to enable various timings as well and try and run the full query
(with the function) in a development environment so that the entire routine
can be evaluated.​

> - Return a composite type instead of an array
>
>
​Worth looking into but impossible to recommend without knowing what your
make believe fields are and are used for. More style than performance
since I do not know the relative costs of building up an array and creating
a composite.


In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-10-29 18:53:18 Re: Query optimization
Previous Message Jorge Arevalo 2014-10-29 18:06:03 Re: Query optimization