Re: After VACUUM, statistics become skewed

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert(dot)Farrugia(at)go(dot)com(dot)mt
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: After VACUUM, statistics become skewed
Date: 2003-05-29 14:26:43
Message-ID: 17482.1054218403@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Robert(dot)Farrugia(at)go(dot)com(dot)mt writes:
> The definitions of the indexes are:

> mo_200302_calling_idx uses (answertime::timestamp,
> callingnumber_type::char(1), callingnumber_value::varchar) in this order
> mo_200302_called_idx uses (answertime::timestamp,
> callednumber_type::char(1), callednumber_value::varchar) in this order

> If I am understanding you well, the planner may ignore the index if the OR
> clause uses the second or third column of a multi-column index, even
> though the first column is used. Is there a way to force the planner to
> use a particular index ?

In this particular case the problem is that the planner does not believe
the index is applicable. The only way to make it think that is to
change the query or the index definition so that the OR clause (i.e.,
the IN) contains references to the first index column. For example, you
could rewrite your query from its current effective form:

select ... where
answertime between '2003/2/3 00:00:00' and '2003/2/14 23:59:59'
and dialleddigits_value = '50043992'
and (callednumber_type = 'P' or callednumber_type = 'M')

to something like

select ... where
(answertime between '2003/2/3 00:00:00' and '2003/2/14 23:59:59'
and dialleddigits_value = '50043992'
and callednumber_type = 'P')
OR
(answertime between '2003/2/3 00:00:00' and '2003/2/14 23:59:59'
and dialleddigits_value = '50043992'
and callednumber_type = 'M')

which can be used for two indexscans on the first two columns of
mo_200302_called_idx. (BTW, if this is a frequent query, maybe you
should have an index that includes dialleddigits_value as the third
column?) You could factor out "and dialleddigits_value = '50043992'"
or not here, that doesn't matter; but the OR'd-together subclauses
have to all mention the leading term of the index, ie, answertime.

Plan B would be to leave the query alone and change the index column
order so that callednumber_type is first. This is probably not a great
idea though, since answertime is a much better first key for most
purposes.

It's moderately annoying that the planner is not bright enough to figure
out this transformation for itself. I suspect what is needed is a whole
fresh look at the OR-index processing algorithm; it seems excessively
complicated and yet it still doesn't do everything one would want.

regards, tom lane

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Robert Treat 2003-05-29 14:33:41 Re: Identifying databases.
Previous Message pginfo 2003-05-29 14:06:05 URGENT: where to find info about the pg structure