Re: will the planner ever use an index when the condition is <> ?

From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: Roxanne Reid-Bennett <rox(at)tara-lu(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: will the planner ever use an index when the condition is <> ?
Date: 2011-12-17 16:24:15
Message-ID: CAP_rwwmg8ELD3YQgMyUzkXrrRy0GsgqdPZKYUFb-o6=X3YTgNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Normally there is no chance it could work,
because (a) the planner does not know all possible values of a column,
and (b) btree indexes cannot search on "not equal" operator.

BTW I've just made a case where - logically - it could work, but it
still does not:

create table nums ( num int4 not null, check(num=1 or num=2) );
insert into nums select case when random()<=0.99 then 1 else 2 end
from generate_series(1,1000000);
create index nums_idx on nums(num);
analyze nums;
set constraint_exclusion to 'on';
explain select * from nums where num<>1;
--planner could estimate selectivity as 1%, and use index with "=2"
filter basing on check constraint?

2011/12/17 Roxanne Reid-Bennett <rox(at)tara-lu(dot)com>:
> I have a query that used <> against an indexed column. In this
> case I can use the reverse and use in or = and get the performance
> I need... but "in general"... will the planner ever use an index when the
> related column is compared using <>?
>
> I feel like the answer is no, but wanted to ask.
>
> Roxanne
> Postgres Version 8.4.9 PostGIS version 1.5.2
>
>
>
> Context for question:
>
> I have the following query:
>
> select *
> from op_region opr, yield_segment_info ysi, data_location dl
> where opr.op_region_id in
>         (select distinct op_region_id
>          from yield_point
>          where yield > 0
>          and area > 0
>          and ST_GeometryType(location) <> 'ST_Point'
>         )
> and ysi.op_region_id = opr.op_region_id
> and dl.data_set_id = opr.data_set_id
>
> Yield_Point has 161,575,599 records
> where yield >0 and area > 0 has 161,263,193 records,
> where ST_GeometryType(location)<> 'ST_Point' has just 231 records
>
> yield_segment_info has 165,929 records
> op_region has 566,212 records
> data_location has 394,763
>
> All of these have a high volume of insert/delete's.
> The tables have recently been vacuum full'd and the indexes reindexed.
> [they are under the management of the autovacuum, but we forced a cleanup on
> the chance that things had degraded...]
>
> If I run an explain analyze:
>
> "Nested Loop
>    (cost=5068203.00..5068230.31 rows=3 width=225308)
>    (actual time=192571.730..193625.728 rows=236 loops=1)"
> "->Nested Loop
>      (cost=5068203.00..5068219.66 rows=1 width=57329)
>      (actual time=192522.573..192786.698 rows=230 loops=1)"
> "  ->Nested Loop
>        (cost=5068203.00..5068211.36 rows=1 width=57268)
>        (actual time=192509.822..192638.446 rows=230 loops=1)"
> "    ->HashAggregate
>           (cost=5068203.00..5068203.01 rows=1 width=4)
>           (actual time=192471.507..192471.682 rows=230 loops=1)"
> "       ->Seq Scan on yield_point
>              (cost=0.00..5068203.00 rows=1 width=4)
>              (actual time=602.174..192471.177 rows=230 loops=1)"
> "             Filter: ((yield > 0::double precision) AND
>                       (area > 0::double precision) AND
>                       (st_geometrytype(location) <> 'ST_Point'::text))"
> "    ->Index Scan using op_region_pkey on op_region opr
>           (cost=0.00..8.33 rows=1 width=57264)
>           (actual time=0.723..0.723 rows=1 loops=230)"
> "          Index Cond: (opr.op_region_id = yield_point.op_region_id)"
> "  ->Index Scan using yield_segment_info_key on yield_segment_info ysi
>       (cost=0.00..8.29 rows=1 width=65)
>       (actual time=0.643..0.643 rows=1 loops=230)"
> "      Index Cond: (ysi.op_region_id = opr.op_region_id)"
> "->Index Scan using data_location_data_set_idx on data_location dl
>    (cost=0.00..10.61 rows=3 width=167979)
>    (actual time=3.611..3.646 rows=1 loops=230)"
> "Index Cond: (dl.data_set_id = opr.data_set_id)"
> "Total runtime: 193625.955 ms"
>
> yield_point has the following indexes:
>      btree on ST_GeometryType(location)
>      gist on location
>      btree on op_region_id
>
> I've also tried an index on
>      ((yield > 0::double precision) AND (area > 0::double precision) AND
> (st_geometrytype(location) <> 'ST_Point'::text))
> ... it still goes for the sequential scan.
>
> But if I change it to st_geometrytype(location) = 'ST_Polygon' or
> even in ('ST_Polygon','ST_MultiPolygon')
>
> the planner uses the index.
>
> Roxanne
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2011-12-18 10:41:21 Re: will the planner ever use an index when the condition is <> ?
Previous Message Roxanne Reid-Bennett 2011-12-17 15:30:07 will the planner ever use an index when the condition is <> ?