When can postgresql use a partial (NOT NULL) index? Seems to depend on size of IN clause (even with enable seqscan = off)

From: Timothy Garnett <tgarnett(at)panjiva(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: When can postgresql use a partial (NOT NULL) index? Seems to depend on size of IN clause (even with enable seqscan = off)
Date: 2010-08-03 20:03:59
Message-ID: AANLkTinGsTaG6Su1Eyn6KUBfkfPYSvbnfK-kZA1k1xMV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

I'm debugging a performance issue that looks like it might actually be an
issue/limitation/parameter/bug in the query planner, but since I couldn't
find anything authoritative on when exactly postgresql is able to use
partial not null indexes I'm not sure that that's the case and I was hopping
someone could give some clarity around that or point to an option I could
tweak that would change this behavior. Anyways the table in question (with
names changed) is below. I'm running postgres 8.4.1

\d
Column | Type | Modifiers
--------------------------+----------+-----------
id | integer | not null
sid | integer |
bid | integer |
m | date | not null
k | integer |
cc | text |
f | integer |
d | smallint |
u | smallint |
f2 | integer |
cm | text |
Indexes:
"scm_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
"index_scm_on_bid" btree (bid) WITH (fillfactor=100) WHERE bid IS NOT
NULL

~35 million rows (about 15 million of which have null bid). There are about
1 million distinct bids (with selectivity ranging from 1 to ~100,000 rows).

The end user selects an arbitrary number of bid's then we run several
queries one of which I include explain analyze output below. For <= 100
bids the planner uses the index and completes in ~35ms, for 101+ bid's the
planner uses a sequence scan and completes in ~45 seconds (3 orders of
magnitude slower). My first thought was that there was a problem with the
statistics/estimation in the planner, but using "set enable seq_scan=off;"
still does not use the index when there's over 100 bid's in the IN clause.
Breaking the IN clause into 2 < 100 element groups does however rescue the
use of the index and the fast performance as does creating a new non-partial
index on bid (i.e. an index "index_scm_on_bid2" btree (bid) WITH
(fillfactor=100) will be used with over 100 bid's). My best guess is that
this is do to some limit on the number of in clause elements the query
planner will check to see if match a partial index before giving up and
assuming it doesn't (if there is such a limit I'd definitely like to make it
larger, at least for this big of a table...). Is this 100 limit documented
anywhere?

Anyways my workarounds are to either split the IN clause into multiple < 100
element groups or switch the index to be across all values rather then not
null. The reason I tend to use not null partial indexes is that I have
another similarly sized table with 20 separately indexed columns each of
which only has about 10000 non-null rows and the disk space savings from the
not null partial index there are huge.

# <= 100 bids
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044)))
order by m desc limit 100;
Limit (cost=66518.18..66518.43 rows=100 width=229) (actual
time=24.665..24.821 rows=100 loops=1)
-> Sort (cost=66518.18..66563.14 rows=17987 width=229) (actual
time=24.658..24.698 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=566.83..65830.73 rows=17987
width=229) (actual time=1.863..12.850 rows=16430 loops=1)
Recheck Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid (cost=0.00..562.33
rows=17987 width=0) (actual time=1.719..1.719 rows=16430 loops=1)
Index Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
Total runtime: 24.908 ms

# > 100 bids
# NOTE - this is the same even after running
# set enable_seqscan = off;
# HOWEVER - this will use an index scan and run fast if we create a non
partial index on bid
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045)))
order by m desc limit 100;
Limit (cost=3783824.03..3783824.28 rows=100 width=229) (actual
time=31229.140..31229.284 rows=100 loops=1)
-> Sort (cost=3783824.03..3783869.06 rows=18014 width=229) (actual
time=31229.137..31229.193 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 82kB
-> Seq Scan on scm (cost=0.00..3783135.55 rows=18014 width=229)
(actual time=97.582..31217.270 rows=16433 loops=1)
Filter: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045}'::integer[]))
Total runtime: 31229.377 ms

# Splitting the IN into <= 100 element pieces works
=>explain analyze SELECT * FROM "scm" WHERE ((bid in
(1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044)))
OR ((bid IN (1208045))) order by m desc limit 100;
Limit (cost=66664.94..66665.19 rows=100 width=229) (actual
time=24.749..24.902 rows=100 loops=1)
-> Sort (cost=66664.94..66709.98 rows=18014 width=229) (actual
time=24.747..24.795 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=577.99..65976.46 rows=18014
width=229) (actual time=1.867..12.888 rows=16433 loops=1)
Recheck Cond: ((bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
OR (bid = 1208045))
-> BitmapOr (cost=577.99..577.99 rows=18014 width=0)
(actual time=1.720..1.720 rows=0 loops=1)
-> Bitmap Index Scan on index_scm_on_bid
(cost=0.00..562.33 rows=17987 width=0) (actual time=1.715..1.715 rows=16430
loops=1)
Index Cond: (bid = ANY
('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid
(cost=0.00..6.66 rows=27 width=0) (actual time=0.003..0.003 rows=3 loops=1)
Index Cond: (bid = 1208045)
Total runtime: 24.998 ms

Thanks!
Tim

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2010-08-03 20:17:02 Re: optimal memory
Previous Message David Kerr 2010-08-03 20:02:16 Re: Question about Idle in TX