Re: Why does query planner choose slower BitmapAnd ?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Seamus Abshere <seamus(at)abshere(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Why does query planner choose slower BitmapAnd ?
Date: 2016-02-22 20:54:04
Message-ID: 764C2D4F-EBE9-489A-8C42-8639758964CF@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 22 Feb 2016, at 16:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> (BTW, is that index really on just a boolean column? It seems
> unlikely that "phoneable" would be a sufficiently selective
> condition to justify having an index on it. I'd seriously consider
> dropping that index as another solution approach.)

On that train of thought, I would think that a person or company would only be phoneable if they have a phone number registered somewhere. That somewhere probably being in another table that's too far away from the current table to check it straight away - so this is an optimisation, right?

Where I see that going is as follows: A "contact" either has a phone number - in which case you'd probably rather get that phone number - or they don't, in which case a null value is often sufficient[1].
While a phone number certainly takes up more storage than a boolean, it wouldn't require an index (because it's available right there) nor the extra joins to look up the actual phone number. And if you'd still want to put an index on it, the null values won't be indexed, which takes a bit off the burden of the larger field size.

You _could_ also take a shortcut and use a variation of your current approach by storing null instead of false for phoneable, but then your index would contain nothing but true values which rather defeats the point of having an index.

Query-wise, I suspect that the number of "contacts" that have a phone number far outweighs the number that doesn't, in which case it's more efficient to query for those that don't have one (fewer index hits) and eliminate those from the results than the other way around. In my experience, both the NOT EXISTS and the LEFT JOIN + WHERE phoneable IS NULL tend to perform better.

A final variation on the above would be to have a conditional index on your PK for those "contacts" that are NOT phoneable. That's probably the shortest and quickest list to query. I'd still prefer that field to contain something a bit more meaningful though...

Well, enough of my rambling!

Ad 1. It is possible that you cater for the possibility that you don't know whether a "contact" has a phone number or not, in which case null would probably be the wrong choice for "no phone number" because then you wouldn't be able to distinguish between "no phone number" and "I don't know".

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Nicklas Avén 2016-02-22 21:30:04 ERROR: cannot convert relation containing dropped columns to view
Previous Message Chris Mair 2016-02-22 20:39:31 Re: decoding BLOB's