Odd behavior with indices

From: joe meiring <josephmeiring(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Odd behavior with indices
Date: 2016-02-26 19:43:42
Message-ID: CAHd-1_4RUDmOLNOOJ6yKsZQEeHPKPHeAt7igOZLRiGK8WfF11Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Also available on S.O.:

http://stackoverflow.com/questions/35658238/postgres-odd-behavior-with-indices

I've got a datavalue table with ~200M rows or so, with indices on both
site_id and parameter_id. I need to execute queries like "return all sites
with data" and "return all parameters with data". The site table has only
200 rows or so, and the parameter table has only 100 or so rows.

The site query is fast and uses the index:

EXPLAIN ANALYZEselect *from sitewhere exists (
select 1 from datavalue
where datavalue.site_id = site.id limit 1);

Seq Scan on site (cost=0.00..64.47 rows=64 width=113) (actual
time=0.046..1.106 rows=89 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 39
SubPlan 1
-> Limit (cost=0.44..0.47 rows=1 width=0) (actual
time=0.008..0.008 rows=1 loops=128)
-> Index Only Scan using ix_datavalue_site_id on datavalue
(cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008
rows=1 loops=128)
Index Cond: (site_id = site.id)
Heap Fetches: 0
Planning time: 0.361 ms
Execution time: 1.149 ms

The same query for parameters is rather slow and does NOT use the index:

EXPLAIN ANALYZEselect *from parameterwhere exists (
select 1 from datavalue
where datavalue.parameter_id = parameter.id limit 1);

Seq Scan on parameter (cost=0.00..20.50 rows=15 width=2648) (actual
time=2895.972..21331.701 rows=15 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 6
SubPlan 1
-> Limit (cost=0.00..0.34 rows=1 width=0) (actual
time=1015.790..1015.790 rows=1 loops=21)
-> Seq Scan on datavalue (cost=0.00..502127.10
rows=1476987 width=0) (actual time=1015.786..1015.786 rows=1 loops=21)
Filter: (parameter_id = parameter.id)
Rows Removed by Filter: 7739355
Planning time: 0.123 ms
Execution time: 21331.736 ms

What the deuce is going on here? Alternatively, whats a good way to do this?

Any help/guidance appreciated!

Some of the table description:

\d datavalue

id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL,
value DOUBLE PRECISION NOT NULL,
site_id INTEGER NOT NULL,
parameter_id INTEGER NOT NULL,
deployment_id INTEGER,
instrument_id INTEGER,
invalid BOOLEAN,
Indexes:
"datavalue_pkey" PRIMARY KEY, btree (id)
"datavalue_datetime_utc_site_id_parameter_id_instrument_id_key"
UNIQUE CONSTRAINT, btree (datetime_utc, site_id, parameter_id,
instrument_id)
"ix_datavalue_instrument_id" btree (instrument_id)
"ix_datavalue_parameter_id" btree (parameter_id)
"ix_datavalue_site_id" btree (site_id)
"tmp_idx" btree (site_id, datetime_utc)
Foreign-key constraints:
"datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id)
REFERENCES instrument(id) ON UPDATE CASCADE ON DELETE CASCADE
"datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id)
REFERENCES parameter(id) ON UPDATE CASCADE ON DELETE CASCADE
"datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES
coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE
"datavalue_statistic_type_id_fkey"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David G. Johnston 2016-02-26 20:02:30 Re: Odd behavior with indices
Previous Message Wes Vaske (wvaske) 2016-02-24 15:44:59 Re: Filesystem and Disk Partitioning for New Server Setup