Re: Query optimization

From: Jorge Arevalo <jorgearevalo(at)libregis(dot)org>
To: David Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query optimization
Date: 2014-10-29 19:14:01
Message-ID: CAMhtMNMA2_usWDiDCo89qJfb4iW43rOYTLjdqh7qS60w9=2ZpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Oct 29, 2014 at 8:05 PM, David Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
wrote:

> On Wed, Oct 29, 2014 at 11:53 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Jorge Arevalo <jorgearevalo(at)libregis(dot)org> writes:
>>
>> > This is the result of EXPLAIN ANALYZE
>>
>> > QUERY
>> > PLAN
>> >
>> -------------------------------------------------------------------------------------------------------------------------------------------------
>> > Index Scan using table1_pkey on table1 (cost=67846.38..395773.45
>> > rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127
>> > loops=1)
>> > InitPlan 2 (returns $1)
>> > -> Result (cost=67846.29..67846.29 rows=1 width=0) (actual
>> > time=7009.063..7009.065 rows=1 loops=1)
>> > InitPlan 1 (returns $0)
>> > -> Seq Scan on table2 p (cost=0.00..67846.29 rows=12689
>> > width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)
>> > Filter: (f3 = field7)
>>
>> Hm. If I'm reading that right, you're building an array containing
>> 2537787 entries, each of which is a composite datum containing two
>> columns of unmentioned datatypes. I suspect a big chunk of your
>> runtime is going into manipulating that array -- PG is not terribly
>> efficient with big arrays containing variable-width values.
>>
>>
The seq scan over table2 is for finding entries in table2 (which contains
2537787) that matches a condition using a column from table1 (entries that
match table1.field7 = table2.f3). But the array isn't going to contain all
the entries, Just a few of them.

I think the time is being used in scanning table2 for all the rows of
table1 (plus than 8 million).

> I'm also a bit confused as to why the planner is saying that the (SELECT
>> ARRAY(...)) bit is an InitPlan and not a SubPlan. That implies that
>> "field7" in the innermost WHERE clause is not a reference to table1 but a
>> reference to table2. Is that really what you meant? IOW, are you sure
>> this query is performing the right calculation in the first place?
>>
>>
> I thought the InitPlan was in place because the planner choose to execute
> the correlated subquery as a standalone query since it realizes that it is
> going to have to end up processing the entire table anyway due to the lack
> of a filter on the outer query. In effect executing "table1 JOIN (table2
> subquery) ON (f3 = field7)"​.
>
> David J.
>
>
Yes, for each row of table1, table2 is being scanned, to find all the
entries that satisfy table1.field7 = table2.f3. Sounds that a really heavy
task. I guess I should avoid it, right?

BTW, Tom, this is the query with all the parentheses/brackets

SELECT value1,value2,value3,value4, value5, hstore(ARRAY['field9',
'field10', 'field11', 'field12', 'field13', 'field14'], ARRAY[field9,
field10, field11, field12, field13, field14]) as metadata, value7, (select
array((select row(f1, f2) from table2 p where p.f3 = field7))) as
values_array FROM table1

Oh, and sorry for the top posting!

--
Jorge Arevalo
Freelance developer

http://about.me/jorgeas80

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-10-29 19:39:42 Re: Query optimization
Previous Message David Johnston 2014-10-29 19:05:41 Re: Query optimization