Multicolumn B-Tree index - order by on 1st column and IN lookup for 2nd

From: Alexandru Lazarev <alexandru(dot)lazarev(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Multicolumn B-Tree index - order by on 1st column and IN lookup for 2nd
Date: 2017-05-25 18:29:53
Message-ID: CAL93h0GZtbpLpXYgEu3mQT_YeSFVr+J_j4m++JxzV0FLKa5gPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I created such table (similar to example from
http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability
)

CREATE TABLE scale_data (
section NUMERIC NOT NULL,
id1 NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
id2 NUMERIC NOT NULL -- a kind of Type);

Populate it with:

INSERT INTO scale_dataSELECT sections.sections,
sections.sections*10000 + gen.gen
, CEIL(RANDOM()*100)
FROM GENERATE_SERIES(1, 300) sections,
GENERATE_SERIES(1, 90000) gen
WHERE gen <= sections * 300;

It generated 13545000 records.

Composite index on it:

CREATE INDEX id1_id2_idx
ON public.scale_data
USING btree
(id1, id2);

And select#1:

select id2 from scale_data where id2 in (50)order by id1 desc
limit 500

Explain analyze:

"Limit (cost=0.56..1177.67 rows=500 width=11) (actual
time=0.046..5.124 rows=500 loops=1)"" -> Index Only Scan Backward
using id1_id2_idx on scale_data (cost=0.56..311588.74 rows=132353
width=11) (actual time=0.045..5.060 rows=500 loops=1)"" Index
Cond: (id2 = '50'::numeric)"" Heap Fetches: 0""Planning time:
0.103 ms""Execution time: 5.177 ms"

Select#2 --more values in IN - plan has changed

select id2 from scale_data where id2 in (50, 52)order by id1 desc
limit 500

Explain analyze#2:

"Limit (cost=0.56..857.20 rows=500 width=11) (actual
time=0.061..8.703 rows=500 loops=1)"" -> Index Only Scan Backward
using id1_id2_idx on scale_data (cost=0.56..445780.74 rows=260190
width=11) (actual time=0.059..8.648 rows=500 loops=1)"" Filter:
(id2 = ANY ('{50,52}'::numeric[]))"" Rows Removed by Filter:
25030"" Heap Fetches: 0""Planning time: 0.153 ms""Execution
time: 8.771 ms"

Why plan differs? Why in #1 it does show like *Index condition*, but in #2
*Filter* and number of index scanned cells. Doesn't sql#1 traverse index in
the same way like explain for sql#2 shows?

On real/production DB #2 works much slower, even if search by 2 keys
separately is fast

PG 9.5, CentOS 6.7

<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=icon>
Virus-free.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=link>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

Browse pgsql-performance by date

  From Date Subject
Next Message Ravi Tammineni 2017-05-25 19:48:43 Monitoring tool for Postgres Database
Previous Message Tomas Vondra 2017-05-25 15:38:13 Re: FW: Re: Query is running very slow......