Re: Bad plan on a huge table query

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Daniel Cristian Cruz <danielcristian(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Bad plan on a huge table query
Date: 2013-03-31 19:29:26
Message-ID: CAMkU=1w3LAxjq1SLsdCbhQEP4Y2jUyrQAuOBTvjuwaP8p6E16w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Mar 26, 2013 at 5:08 AM, Daniel Cristian Cruz <
danielcristian(at)gmail(dot)com> wrote:

> Well, I did it:
>
> explain (analyze, buffers)
> select count(*) from turma.aula_confirmacao where
> inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND
> CURRENT_TIMESTAMP; -- changed name because of a conflict in some queries
>
> http://explain.depesz.com/s/Fzr
>

That looks reasonable now. But what changed?

> And just to update, this is the actual query and plan:
>
> ...
>

> http://explain.depesz.com/s/YfXr
>

The part of this complex query that corresponds to that simple query now
looks reasonable, while before it did not. What changed in the mean time?
Whatever it is that changed (a good vacuum analyze?), you probably need to
do the same thing on idx_presenca_3 .

But I see that the difference is not that the estimate moved, but rather
that the number of actual rows moved, to be closer to the estimate. I
don't know what to make of that.

> I guess that, there is something with estudante_periodo, because there is
> 24% with only one row and 50% with 5 or less rows on it:
>

That could be, but I don't really know how to tackle that. So I'm starting
at the bottom level queries that are simple (i.e. just an index scan, no
complex joins or filters) and yet are poorly estimated, because those ones
are tractable. Maybe the horrible estimates at the top are just the
cumulation of bad estimates at the bottom (but maybe they are not--but
still it seems to make sense the tackle the easy ones first)

> After the refactoring, idx_aula_confirmacao_2 became
idx_aula_confirmacao_1:

>
> select * from pg_stats where tablename = 'idx_aula_confirmacao_1';
>

Alas, this information is not so interesting anymore, because now the query
is getting the estimate at this step pretty close. It is back when the
estimate was so horrible that this info would be interesting.

> -[ RECORD 1
> ]----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> schemaname | turma
> tablename | idx_aula_confirmacao_1
> attname | inicio_aula
> inherited | f
> null_frac | 0.996792
>

But, with the null_frac so high, a partial index for "where ... not null"
might be a good idea. I doubt it would make the query change the execution
plan, but it would at least take up less disk space and so be more
cacheable.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tore Halvorsen 2013-03-31 20:15:58 Problem with pg_basebackup and streaming replication. (9.2.3 / win64)
Previous Message Misa Simic 2013-03-31 17:24:27 Re: Money casting too liberal?