Re: 10.1: hash index size exploding on vacuum full analyze

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: AP <pgsql(at)inml(dot)weebeastie(dot)net>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: 10.1: hash index size exploding on vacuum full analyze
Date: 2017-11-17 06:28:57
Message-ID: CAE9k0PnrF-zOeEZvDLrZb_o3CrA84qw956ZxXZd2464P+86kmw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Nov 17, 2017 at 7:58 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Nov 16, 2017 at 10:00 AM, AP <pgsql(at)inml(dot)weebeastie(dot)net> wrote:
>> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql(at)inml(dot)weebeastie(dot)net> wrote:
>>> > I've some tables that'll never grow so I decided to replace a big index
>>> > with one with a fillfactor of 100. That went well. The index shrunk to
>>> > 280GB. I then did a vacuum full analyze on the table to get rid of any
>>> > cruft (as the table will be static for a long time and then only deletes
>>> > will happen) and the index exploded to 701GB. When it was created with
>>> > fillfactor 90 (organically by filling the table) the index was 309GB.
>>>
>>> Sounds quite strange. I think vacuum leads to a greater number of
>>> splits than when the original data was loaded. By any chance do
>>> you have a copy of both the indexes (before vacuum full and after
>>> vacuum full)? Can you check and share the output of
>>> pgstattuple->pgstathashindex() and pageinspect->hash_metapage_info()?
>>> I wanted to confirm whether the bloat is due to additional splits.
>>
>> I'll see what I can do. Currently vacuuming the table without the index
>> so that I can then do a create index concurrently and get back my 280GB
>> index (it's how I got it in the first place). Namely:
>>
>
> One possible theory could be that the calculation of the initial
> buckets required for the index has overestimated the number of
> buckets. I think this is possible because we choose the initial
> number of buckets based on the number of tuples, but while inserting
> the values we might have created more overflow buckets rather than
> using the newly created primary buckets. The chances of such a
> misestimation are higher when there are duplicate values. Now, if
> that is true, then one should see the same size of the index (as you
> have seen after vacuum full ..) when you create an index on the table
> with the same values in the index columns.
>

Amit, I think what you are trying to say here is that the estimate of
the number of hash buckets required is calculated from the number of
tuples in the base table, but during this calculation we have no way
of knowing whether the table contains a large proportion of duplicate
values. If it does contain many duplicates, then during index
insertion we would start adding overflow pages, and many of the hash
index buckets allocated up front (i.e. during hash index size
estimation) would remain unused. If that is the case, then I think a
hash index would not be the right choice here. However, this might not
be exactly related to what AP has reported.
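
For reference, the diagnostics Amit asked for can be gathered with
something like the following (a sketch; 'idx_hash' is a placeholder
for the actual index name, and it needs the pgstattuple and
pageinspect extensions installed):

-- per-index breakdown of bucket/overflow/bitmap/unused pages
select * from pgstathashindex('idx_hash');

-- metapage contents (block 0 of a hash index holds the metapage)
select * from hash_metapage_info(get_raw_page('idx_hash', 0));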
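
And a minimal way to test the duplicates theory would be something
like this (a sketch with hypothetical table/index names; compare the
two reported sizes):

create table t_dup (v int);
-- 1M rows but only 100 distinct values, so heavy duplication
insert into t_dup select i % 100 from generate_series(1, 1000000) i;

create index t_dup_hash on t_dup using hash (v);
select pg_relation_size('t_dup_hash');  -- size after initial build

vacuum full analyze t_dup;
select pg_relation_size('t_dup_hash');  -- size after the rebuild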

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
