Re: Unexpected sequential scan on an indexed column

From: Eddy Escardo-Raffo <eescardo(at)kikini(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Unexpected sequential scan on an indexed column
Date: 2009-11-15 23:59:31
Message-ID: 4eaa4a5e0911151559y1cc43ee8md2d7fe14dfa6f2b1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks, Tom. I had discarded the possibility of data type mismatch already,
which was your first guess, but was wondering if the lopsided distribution
of location values would lead the planner to make a decision that is good on
average but bad for this particular query, as you point out in your second
guess.

I'll try populating the test users with a more evenly distributed location
field, which will be more realistic anyway, and see if that works out
better.

BTW, the -1 is not really a dummy value, but it's just a value that we have
been using in tests for "fake test location ID". I just started performance
measurement for my application and so far had measured performance with
every user being in the same default location and things seemed to be going
well, so I tried to switch a couple users to a different location and see
what happened, and that made performance drop significantly.
(even more detail: my queries also limit results to 10 approx, so DB quickly
found 10 rows that match location -1, but it took a while to discover there
weren't more than 2 rows with the other value).

Thanks!
Eddy

On Sun, Nov 15, 2009 at 3:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Eddy Escardo-Raffo <eescardo(at)kikini(dot)com> writes:
> > The table used in this query is called "users", and it has columns
> "userid"
> > (primary key) and "location".
> > The "location" column is indexed.
> > The users table has 1 million rows, and all rows have integer typed value
> > '-1' for "location" column, except for 2 rows that have the integer
> value
> > '76543'.
>
> Oh, after poking at it a bit more, I realize the problem: the planner
> doesn't want to use an indexscan because it assumes there's a
> significant probability that the search will be for -1 (in which case
> the indexscan would be slower than a seqscan, as indeed your results
> prove). Even though it could know in this particular case that the
> comparison value isn't -1, I doubt that teaching it that would help your
> real queries where it will probably be impossible to determine the
> comparison values in advance.
>
> I would suggest considering using NULL rather than inventing a dummy
> value for unknown locations. The estimation heuristics will play a
> lot nicer with that choice.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Eddy Escardo-Raffo 2009-11-16 01:06:26 Re: Unexpected sequential scan on an indexed column
Previous Message Tom Lane 2009-11-15 23:33:56 Re: Unexpected sequential scan on an indexed column