Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

From: Tom Dearman <tom(dot)dearman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Date: 2021-07-16 16:19:24
Message-ID: 8563E084-A6B1-41E6-BCAB-B7D31E75C981@gmail.com
Lists: pgsql-general

Other indexes do bloat, but their percentage bloat is much lower, presumably because this is a partial index whose filter column changes very often: there are maybe 100 genuinely ‘live’ rows in a table of 300 million, yet every row has at some point been in a state that put it in the index. In some of our partitions we might have 2000 old rows that hang around for a long time plus another 100 or so ‘real’ partial index entries, so 2200 in total, against 300 million rows. That is far less than 1%.
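A rough way to see why a few long-lived rows matter more than their share of the table suggests: PostgreSQL's B-tree VACUUM removes dead entries, but it can only reclaim a leaf page once that page is completely empty; it does not merge sparse pages. The sketch below is a hypothetical model, not PostgreSQL's actual algorithm. It assumes entries were added in key order, an illustrative 300 entries per leaf page, and long-lived entries spread evenly across everything that was ever indexed (the thread does not say what the indexed key actually is). It counts the leaf pages the survivors keep alive, compared with the pages they would need if packed densely.

```python
def leaf_pages(total_entries: int, live_entries: int, capacity: int) -> tuple[int, int]:
    """Return (pinned_pages, ideal_pages) for a simplified B-tree leaf level.

    Hypothetical model, not PostgreSQL's real page layout:
    - entries were appended in key order, `capacity` per leaf page, so the
      entry at position p sits on leaf page p // capacity;
    - `live_entries` (> 0) survivors are spread evenly over `total_entries`;
    - all other entries were deleted and vacuumed, and only a page with no
      live entries left can be reclaimed.
    """
    positions = (i * total_entries // live_entries for i in range(live_entries))
    # Distinct leaf pages that still hold at least one live entry.
    pinned_pages = len({p // capacity for p in positions})
    # Pages the same live entries would need if they were packed densely.
    ideal_pages = (live_entries + capacity - 1) // capacity
    return pinned_pages, ideal_pages


# Numbers from the paragraph above: ~2,200 long-lived entries among the
# ~300 million rows that have all passed through IN_PROGRESS at some point.
print(leaf_pages(300_000_000, 2_200, 300))  # (2200, 8)

# "Every hundredth value persists": 10,000 survivors out of 1,000,000.
print(leaf_pages(1_000_000, 10_000, 300))   # (3334, 34)
```

Under these assumptions each scattered survivor pins a page of its own, so the index stays hundreds of times larger than its live contents, which matches the low-density pattern Tom Lane describes in the quoted message.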

> On 16 Jul 2021, at 16:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Tom Dearman <tom(dot)dearman(at)gmail(dot)com> writes:
>> We have changed autovacuum so that it runs more frequently (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on the status is that in a table of 300 million entries, only about 100 or so would have status=‘IN_PROGRESS’, so we think this should be a nice, small index, and many of our queries look up rows with a WHERE clause of status=‘IN_PROGRESS’. In theory it works well, but we get a lot of index bloat because there is a lot of churn on the status value, i.e. each row starts as IN_PROGRESS and then moves to one of 4 possible completed statuses.
>
> Is it really the case that only this index is bloating? In principle, an
> update on a row of the table should result in new entries in every index
> of the table. A partial index, since its filter may mean no index entry
> is stored at all, should in theory have less bloat than other
> indexes.
>
> If that's not what you're seeing, there must be something about the data
> being stored in that index (not the partial-index filter condition) that
> results in a lot of low-occupancy index pages over time. You didn't say
> anything about what the data payload is. But we've seen bloat problems in
> indexes where, say, every tenth or hundredth value in the index ordering
> would persist for a long time while the ones in between get deleted
> quickly. That leads to low-density indexes that VACUUM can't do anything
> about.
>
> regards, tom lane
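The quoted message mentions lowering autovacuum_vacuum_scale_factor to 0.035. PostgreSQL documents the autovacuum trigger as autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples dead tuples, counted over the whole table rather than per index. The sketch below assumes a single table (or partition) of 300 million rows, as in the thread, and the default autovacuum_vacuum_threshold of 50.

```python
def autovacuum_trigger(reltuples: int, scale_factor: float, base_threshold: int = 50) -> float:
    """Dead tuples a table must accumulate before autovacuum vacuums it.

    Follows the documented formula
        threshold = autovacuum_vacuum_threshold
                    + autovacuum_vacuum_scale_factor * reltuples
    where 50 is the default autovacuum_vacuum_threshold. Per-table storage
    parameters, if set, would replace these values.
    """
    return base_threshold + scale_factor * reltuples


# Assumed table (or partition) size taken from the thread: 300 million rows.
rows = 300_000_000
print(f"default 0.2: {autovacuum_trigger(rows, 0.2):,.0f}")    # default 0.2: 60,000,050
print(f"tuned 0.035: {autovacuum_trigger(rows, 0.035):,.0f}")  # tuned 0.035: 10,500,050
```

Even the tuned setting waits for roughly 10.5 million dead tuples. If each completed row leaves one dead entry behind in the partial index, that index can hold vastly more dead entries than its ~100 live ones between runs.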
