number of rows estimation for bit-AND operation

From: Slava Moudry <smoudry(at)4info(dot)net>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: number of rows estimation for bit-AND operation
Date: 2009-08-17 20:07:18
Message-ID: 622F69662CFE9F4182958973F99F3F1515102D3DD6@EXVMBX017-12.exch017.msoutlookonline.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
I am using int8 field to pack a number of error flags. This is very common technique for large tables to pack multiple flags in one integer field.

For most records - the mt_flags field is 0. Here is the statistics (taken from pgAdmin Statistics tab for mt_flags column):
Most common Values: {0,128,2,4,8)
Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)

What I notice that when bit-AND function is used - Postgres significantly underestimates the amount of rows:

explain analyze select count(*) from mt__20090801 where mt_flags&8=0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)
-> Seq Scan on mt__20090801 (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)
Filter: ((mt_flags & 8) = 0)
Total runtime: 2883.191 ms
(4 rows)

This is not an issue for the particular query above, but I noticed that due to that miscalculation in many cases Postgres chooses plan with Nested Loops for other queries. I can fix it by setting enable_nest_loops to off, but it's not something I should set for all queries.
Is there any way to help Postgres make a better estimation for number of rows returned by bit function?
Thanks,
-Slava Moudry, Senior DW Engineer. 4Info Inc.

P.S. table definition:

\d mt__20090801
Table "dw.mt__20090801"
Column | Type | Modifiers
--------------------------+-----------------------------+-----------
mt_id | bigint | not null
mt_ts | timestamp without time zone |
ad_cost | numeric(10,5) |
short_code | integer |
message_id | bigint | not null
mp_code | character(1) | not null
al_id | integer | not null
cust_id | integer |
device_id | integer | not null
broker_id | smallint |
partner_id | integer |
ad_id | integer |
keyword_id | integer |
sc_id | integer |
cp_id | integer |
src_alertlog_id | bigint |
src_query_id | bigint |
src_response_message_num | smallint |
src_gateway_message_id | bigint |
mt_flags | integer |
message_length | integer | not null
created_etl | timestamp without time zone |
Indexes:
"mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"
"mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"
Check constraints:
"mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time
zone)
Inherits: mt
Tablespace: "dw_tables3"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2009-08-17 23:43:16 Re: Memory reporting on CentOS Linux
Previous Message Jeremy Carroll 2009-08-17 17:24:36 Re: Memory reporting on CentOS Linux