Re: index does not improve performance

From: Jason Earl <jason(dot)earl(at)simplot(dot)com>
To: Milos Prudek <milos(dot)prudek(at)tiscali(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: index does not improve performance
Date: 2002-01-31 20:17:00
Message-ID: 1012508221.24959.209.camel@npa01zz001
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Unfortunately Milos an index isn't likely to help on this type of a
query. It would appear that the value 'UDP' accounts for the vast
majority of the rows, and so an index loses most of its value. You see,
it actually takes *longer* to return queries using the indexes if a
significant portion of the table is being touched, because the database
has to check both the index and the tuple. One of the new features of
7.2 is better statitistics gathering so that PostgreSQL can opt
*against* using an index scan in precisely this case. You basically
want to return the entire table, so an indexscan only adds to the query
time.

For example, your query where you searched for 'udp' returned very
quickly, because PostgreSQL was able to use the index to verify that the
value 'udp' didn't exist. When you searched for 'UDP' PostgreSQL
probably used an index scan as well, and so your query took longer than
before you added the index (that's overhead of actually looking at the
index instead of just getting to business and scanning the table).

In short, if your query only matches a small percentage of the records
then an indexscan is a win. Otherwise it's just an extra step.

Jason

On Thu, 2002-01-31 at 12:25, Milos Prudek wrote:
> Hi all,
>
> I have a table with 253.380 records. It's a firewall log. I thought that
> creating an index will improve queries but results so far were
> disappointing.
>
> The table has about 20 columns, most of them of type "text". There's a
> text field "ip_type" that has one of three values: TCP, UDP, ICMP. I
> tried to do two selects with and without an index on "ip_type".
>
>
> Without index:
>
> "select count(*) from log where ip_type='udp';" takes 3.0 seconds (this
> query evaluates to zero rows).
>
> "select count(*) from log where ip_type='UDP';" takes 4.5 seconds (this
> query evaluates to 245.182 rows).
>
>
>
> With index:
>
> "select count(*) from log where ip_type='udp';" takes 0.0 seconds.
>
> "select count(*) from log where ip_type='UDP';" takes 5.0 seconds.
>
>
> It looks like creating an index degrades performance if the result set
> is similar to the size of the whole table (I had much better results
> when the condition was met by only two thousand records). Is this
> normal?
> --
> Milos Prudek

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gregory Wood 2002-01-31 20:18:29 Re: Drop Foreign Key
Previous Message Darren Ferguson 2002-01-31 20:14:41 Re: unique & update