Re: Performance of query

From: Cindy Makarowsky <cindymakarowsky(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Misa Simic <misa(dot)simic(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of query
Date: 2013-03-25 18:55:25
Message-ID: CAM_v1L22q-0aM00XcA_Bt6Wizeg7bw3cGyxNWjQtoDcBr2ECGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I basically don't have any control over the generated select statement.
I'm using Mondrian and that is the select statement that gets passed to
Postgres. You're right that if you remove the count(id), the query is
faster but I can't do that since the select statement is being executed
from Mondrian.

On Mon, Mar 25, 2013 at 2:18 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Sat, Mar 23, 2013 at 3:27 PM, Misa Simic <misa(dot)simic(at)gmail(dot)com> wrote:
>
>> Hi Jeff,
>>
>> It seems my previous mail has not showed up in the list... copied/pasted
>> again belloew
>>
>> However, you said something important:
>>
>> "The join to the "state" table is not necessary. Between the foreign
>> key and the primary key, you know that every state exists, and that every
>> state exists only once. But, that will not solve your problem, as the join
>> to the state table is not where the time goes."
>>
>> I think it is something what planner could/should be "aware off"... and
>> discard the join
>>
>
> I thought that this was on the To Do list (
> http://wiki.postgresql.org/wiki/Todo) but if it is, I can't find it.
>
> I think the main concern was that it might add substantial planning time
> to all queries, even ones that would not benefit from it. I don't know if
> there is a way to address this concern, other then to implement it and see
> what happens.
>
> ...
>
>>
>> EXPLAIN ANALYZE
>> SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER
>> JOIN state USING (state)
>> GROUP BY busbase.state
>>
>
> In the original email, the table definition listed "id" twice, once with a
> not null constraint. If it is truly not null, then this count could be
> replaced with count(1), in which case the original index on (state) would
> be sufficient, the composite on (count, id) would not be necessary. (Yes,
> this is another thing the planner could, in theory, recognize on your
> behalf)
>
> Based on the use of column aliases which are less meaningful than the
> original column names were, I'm assuming that this is generated SQL that
> you have no control over?
>
>
>> on created composite index
>> CREATE INDEX comp_statidx2
>> ON busbase
>> USING btree
>> (state, id );
>>
>
>
>
>
>>
>>
>> we got:
>>
>> "GroupAggregate (cost=0.00..2610570.81 rows=51 width=3) (actual
>> time=98.923..51033.888 rows=51 loops=1)"
>> " -> Merge Join (cost=0.00..2310285.02 rows=60057056 width=3) (actual
>> time=38.424..41992.070 rows=60057057 loops=1)"
>> " Merge Cond: (state.state = busbase.state)"
>> " -> Index Only Scan using state_pkey on state (cost=0.00..13.02
>> rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)"
>> " Heap Fetches: 51"
>> " -> Index Only Scan using comp_statidx2 on busbase
>> (cost=0.00..1559558.68 rows=60057056 width=3) (actual
>> time=38.408..12883.575 rows=60057057 loops=1)"
>> " Heap Fetches: 0"
>> "Total runtime: 51045.648 ms"
>>
>>
> I don't understand why you are getting a merge join rather than a hash
> join. Nor why there is such a big difference between the actual time of
> the index only scan and of the merge join itself. I would think the two
> should be about equal. Perhaps I just don't understand the semantics of
> reported actual time for merge joins.
>
> During normal operations, how much of busbase is going to be all_visible
> at any given time? If that table sees high turnover, this plan might not
> work well on the production system.
>
>
> Cheers,
>
> Jeff
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Jaskiewicz 2013-03-25 20:08:39 Proof of concept: Evolving postgresql.conf using genetic algorithm
Previous Message Jeff Janes 2013-03-25 18:18:28 Re: Performance of query