Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.

From: Emil Iggland <emil(dot)iggland(at)metrima(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Date: 2022-04-22 14:53:48
Message-ID: 4eacd5f7-1f8d-1e0d-ad9f-61f6c5ab706e@metrima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear mailing list

We are investigating a strange performance issue with our database.

Our use case is a sensor reading database where we have sensor location
(called channels), parameter settings (called valueseries) and reading
(called datavalues). Datavalues is partitioned per month.
Like this: channel <-(FK)- valueseries <-(FK)- datavalues

We have a query that returns the latest sensor reading. When we have no
readings in datavalues the query is SLOW, when we have 1 or more
readings in datavalues the query is FAST. (slow being ~1second, fast
~5ms) It isn't the slowness that is the main problem, but rather the odd
behaviour.

The query that is giving us issues is the following, channel 752433 has
NO values, 752431 has values.
(Channel 752433 only has valueseries 752434)

select * from datavalue
where dataview in ( select id from valueseries where channel =
%channel_idx%)
ORDER BY VALUETIMESTAMP DESC
FETCH FIRST ROW only;

Running explain analyze shows strange numbers, 52'000 rows are being
returned but there are no rows there.

For channel 752433
-> Index Scan Backward using
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03
datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual
time=0.008..32.831 rows=119601 loops=1)
-> Index Scan Backward using
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04
datavalue_7 (cost=0.29..4002.79 rows=52499 width=227) (actual
time=0.011..15.005 rows=52499 loops=1)

For channel 752431
-> Index Scan Backward using
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03
datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual
time=0.008..0.008 rows=1 loops=1)
-> Index Scan Backward using
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04
datavalue_7 (cost=0.29..4002.79 rows=52499 width=227) (actual
time=0.011..0.011 rows=1 loops=1)

Inserting even a single row changes the offending rows to the expected
values:

insert into maclient.datavalue (dataview, valuetimestamp, datavalue)
values (752434, '2022-03-01 00:00:00', 234);
-> Index Scan Backward using
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03
datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual
time=0.006..0.006 rows=1 loops=1)

Full explain analyze on https://paste.depesz.com/s/ZwJ
with buffers and track_io_timing: https://paste.depesz.com/s/Ss

Disabling indexscan (set enable_indexscan=false;) hides the problem, it
does not show up with a bitmap index scan.
Running autovacuum analyze doesn't seem to help, the results are the same.

SELECT version();
"PostgreSQL 14.2, compiled by Visual C++ build 1914, 64-bit"

Can anyone explain what is going on here.
* Why is the database returning 52'000+ rows when it should be returning 0?
* Is my query badly formulated?
* Is there something wrong with the indexes and I need to rebuild them?

We are stumped, and would greatly appreciate any input.

Regards
Emil

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2022-04-25 16:00:21 Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Previous Message Justin Pryzby 2022-04-21 11:52:36 Re: Query Planner not taking advantage of HASH PARTITION