Re: 10.1: hash index size exploding on vacuum full analyze

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: AP <pgsql(at)inml(dot)weebeastie(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: 10.1: hash index size exploding on vacuum full analyze
Date: 2017-11-17 02:28:13
Message-ID: CAA4eK1K2ynh=_gDHYydVXHvG+Lk=xyY-Pb9n86QjbtY768jm-A@mail.gmail.com
Lists: pgsql-bugs

On Thu, Nov 16, 2017 at 10:00 AM, AP <pgsql(at)inml(dot)weebeastie(dot)net> wrote:
> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql(at)inml(dot)weebeastie(dot)net> wrote:
>> > I've some tables that'll never grow so I decided to replace a big index
>> > with one with a fillfactor of 100. That went well. The index shrunk to
>> > 280GB. I then did a vacuum full analyze on the table to get rid of any
>> > cruft (as the table will be static for a long time and then only deletes
>> > will happen) and the index exploded to 701GB. When it was created with
>> > fillfactor 90 (organically by filling the table) the index was 309GB.
>>
>> Sounds quite strange. I think during vacuum it leads to more number
>> of splits than when the original data was loaded. By any chance do
>> you have a copy of both the indexes (before vacuum full and after
>> vacuum full)? Can you once check and share the output of
>> pgstattuple-->pgstathashindex() and pageinspect->hash_metapage_info()?
>> I wanted to confirm if the bloat is due to additional splits.
>
> I'll see what I can do. Currently vacuuming the table without the index
> so that I can then do a create index concurrently and get back my 280GB
> index (it's how I got it in the first place). Namely:
>

One possible theory is that the calculation of the initial number of
buckets for the index has overestimated how many are needed. I think
this is possible because we choose the initial number of buckets based
on the number of tuples, but while inserting the values we might
actually have created more overflow pages rather than using the newly
created primary buckets. The chances of such a misestimation are
higher when there are duplicate values. If that is true, then you
should see the same index size (as you have seen after vacuum full)
whenever you create a new index on a table with the same values in the
index columns.
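
To make the theory concrete, here is a rough toy model in Python. It is
only a sketch and not the actual PostgreSQL sizing code: the helper
names, the multiplicative hash, the power-of-two rounding and the
figure of 100 tuples per page are all assumptions picked for
illustration. It sizes the primary buckets from the tuple count alone
and then counts how many of them stay empty and how many overflow
pages are needed, once for unique keys and once for heavily duplicated
keys.

```python
import math
from collections import Counter

TUPLES_PER_PAGE = 100  # assumed page capacity, not a PostgreSQL constant


def initial_buckets(num_tuples, tuples_per_page=TUPLES_PER_PAGE):
    """Toy estimate: buckets sized from the tuple count only,
    rounded up to a power of two (an assumption of this sketch)."""
    needed = max(2, math.ceil(num_tuples / tuples_per_page))
    return 1 << (needed - 1).bit_length()


def toy_hash(key):
    """Hypothetical 32-bit multiplicative hash for integer keys."""
    return (key * 2654435761) & 0xFFFFFFFF


def simulate(keys, tuples_per_page=TUPLES_PER_PAGE):
    """Return bucket and page counts for a freshly built toy index."""
    n_buckets = initial_buckets(len(keys), tuples_per_page)
    # Equal keys share a hash value, so they always land in one bucket.
    per_bucket = Counter(toy_hash(k) % n_buckets for k in keys)
    # Each bucket has one primary page; the rest spills into overflow pages.
    overflow = sum(
        max(0, math.ceil(n / tuples_per_page) - 1) for n in per_bucket.values()
    )
    return {
        "buckets": n_buckets,
        "empty_buckets": n_buckets - len(per_bucket),
        "overflow_pages": overflow,
        "total_pages": n_buckets + overflow,
    }


unique_keys = list(range(10_000))                  # every value distinct
duplicate_keys = [i % 10 for i in range(10_000)]   # only 10 distinct values

print("unique:    ", simulate(unique_keys))
print("duplicates:", simulate(duplicate_keys))
```

In this toy model both inputs get the same number of primary buckets,
because only the tuple count is consulted. With duplicates most of
those buckets stay empty while the few that are used grow chains of
overflow pages, so the total page count ends up higher for the same
number of tuples.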

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
