From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hash index initial size is too large given NULLs or partial indexes |
Date: | 2019-03-08 18:27:38 |
Message-ID: | b0c1be5e-94ce-0496-fe96-b61be7a3067c@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/8/19 7:14 PM, Jeff Janes wrote:
> Referring to this thread:
>
> https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices
>
> When a hash index is created on a populated table, it estimates the
> number of buckets to start out with based on the number of tuples
> returned by estimate_rel_size. But this number ignores both the fact
> that NULLs are not stored in hash indexes, and that partial indexes
> exist. This can lead to much too large hash indexes. Doing a re-index
> just repeats the logic, so doesn't fix anything. Fill factor also can't
> fix it, as you are not allowed to increase that beyond 100.
>
Hmmm :-(
> This goes back to when the pre-sizing was implemented in 2008
> (c9a1cc694abef737548a2a). It seems to be an oversight, rather than
> something that was considered.
>
> Is this a bug that should be fixed? Or if getting a more accurate
> estimate is not possible or not worthwhile, add a code comment about that?
>
I'd agree this smells like a bug (or perhaps two). The sizing probably
should consider both null_frac and selectivity of the index predicate.
When those two are redundant (i.e. when there's IS NOT NULL condition on
indexed column), this will result in under-estimate. That means the
index build will do a an extra split, but that's probably better than
having permanently bloated index.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2019-03-08 18:34:17 | Re: PostgreSQL vs SQL/XML Standards |
Previous Message | Jeff Janes | 2019-03-08 18:14:11 | Hash index initial size is too large given NULLs or partial indexes |