Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.

From: Emil Iggland <emil(dot)iggland(at)metrima(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Date: 2022-04-26 07:41:52
Message-ID: d88d4654-3a13-e76d-269d-5e6bb780d4d2@metrima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> You've got the wrong column order (for this query anyway) in that
> index. It'd work a lot better if dataview were the first column;
I might be misunderstanding you, but I assume that you are suggesting an
index on (dataview, valuetimestamp).
We have that index, it is the primary key. For some reason it isn't
being selected.

I can understand that it has to go through the whole index, potentially
even the whole table, but I do not why it takes so long.

Even a query that should take equally long (probably longer) is
substantially faster:

explain (analyze, buffers)
select valuetimestamp from datavalue
where valuetimestamp <> '1965-01-07 05:50:59';

Completes in less than 500ms using a sequential scan,

...
-> Seq Scan on datavalue_2022_04 datavalue_7 (cost=0.00..1450.39
rows=56339 width=8) (actual time=0.013..5.988 rows=56109 loops=1)"
Filter: (valuetimestamp <> '1965-01-07 05:50:59'::timestamp without
time zone)
Buffers: shared hit=742 read=4
...
Planning Time: 0.781 ms
Execution Time: 394.408 ms

while the original query takes over 1 second.
...
-> Index Scan Backward using
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04
datavalue_7 (cost=0.29..4292.48 rows=56351 width=227) (actual
time=0.166..17.340 rows=56109 loops=1)
Buffers: shared hit=42013 read=278
...
Planning Time: 0.964 ms
Execution Time: 1291.509 ms

I do not understand how looking at every value in the index and
returning none be slower than looking at every table in the table and
returning none. If it takes 500ms to return every value in the table via
a sequential scan, then it should take less via an index scan.

In case we never solve it, and someone else runs into similiar problems,
we (hopefully temporarily) worked around it by reformulating the query
to use a lateral join:

EXPLAIN (analyze, buffers)
SELECT dv.* FROM valueseries vs
LEFT JOIN LATERAL (
SELECT * FROM datavalue dv WHERE dv.dataview = vs.id
ORDER BY VALUETIMESTAMP
FETCH FIRST 1 ROWS ONLY
) dv ON TRUE
where vs.channel = 752433

This causes it to use the correct index:
-> Index Scan using datavalue_2022_01_pkey on datavalue_2022_01 dv_4
(cost=0.42..2951.17 rows=1032 width=228) (actual time=0.034..0.034
rows=0 loops=1)
Index Cond: (dataview = vs.id)
Buffers: shared read=3
...
Planning Time: 1.169 ms
Execution Time: 0.524 ms

Regards
Emil

On 2022-04-25 18:00, Tom Lane wrote:
> Emil Iggland <emil(dot)iggland(at)metrima(dot)com> writes:
>> The query that is giving us issues is the following, channel 752433 has
>> NO values, 752431 has values.
>> (Channel 752433 only has valueseries 752434)
>
>> select * from datavalue
>> where dataview in ( select id from valueseries where channel =
>> %channel_idx%)
>> ORDER BY VALUETIMESTAMP DESC
>> FETCH FIRST ROW only;
>
>> Running explain analyze shows strange numbers, 52'000 rows are being
>> returned but there are no rows there.
>
>> For channel 752433
>> -> Index Scan Backward using
>> datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03
>> datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual
>> time=0.008..32.831 rows=119601 loops=1)
>
> You've got the wrong column order (for this query anyway) in that
> index. It'd work a lot better if dataview were the first column;
> or at least, it wouldn't tempt the planner to try this unstably-
> performing plan. It's trying to use the index ordering to satisfy
> the ORDER BY, which works great as long as it finds a dataview
> match in some reasonably recent index entry. Otherwise, it's
> going to crawl the whole index to discover that there's no match.
>
> regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2022-04-27 08:22:12 Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Previous Message Tom Lane 2022-04-25 16:00:21 Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.