From: | Tom Dearman <tom(dot)dearman(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Michael Lewis <mlewis(at)entrata(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-general General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Frequetly updated partial index leads to bloat on index for Postresql 11 |
Date: | 2021-07-16 16:19:24 |
Message-ID: | 8563E084-A6B1-41E6-BCAB-B7D31E75C981@gmail.com |
Lists: | pgsql-general |
Other indexes do bloat, but by a much smaller percentage. Presumably that is because this is a partial index whose predicate column changes constantly, i.e. there are maybe 100 genuinely ‘live’ rows in a table of 300 million, and every row has at some point passed through a state in which it was in the index. In some of our partitions we might have 2000 old rows that hang around for a long time plus another 100 or so ‘real’ partial index entries, so 2200 in total, but with 300 million rows that is far less than 1%.
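To put rough numbers on this, here is a minimal sketch of the arithmetic. It assumes the 300 million figure is the tuple count that autovacuum sees for the table, and it uses PostgreSQL's documented trigger formula (base threshold plus scale factor times tuple count) with the default base threshold of 50 and the scale factor quoted below. The function names are only illustrative.

```python
# Figures taken from the message; function names are hypothetical.
TABLE_ROWS = 300_000_000
INDEX_ENTRIES = 2_200          # total quoted for a busy partition
SCALE_FACTOR = 0.035           # autovacuum_vacuum_scale_factor
BASE_THRESHOLD = 50            # default autovacuum_vacuum_threshold


def indexed_fraction(entries: int, rows: int) -> float:
    """Share of table rows that should be present in the partial index."""
    return entries / rows


def autovacuum_trigger(rows: int, scale_factor: float,
                       base_threshold: int = BASE_THRESHOLD) -> float:
    """Dead tuples that must accumulate before autovacuum vacuums the table."""
    return base_threshold + scale_factor * rows


if __name__ == "__main__":
    frac = indexed_fraction(INDEX_ENTRIES, TABLE_ROWS)
    trigger = autovacuum_trigger(TABLE_ROWS, SCALE_FACTOR)
    print(f"indexed fraction: {frac:.6%}")
    print(f"dead tuples before autovacuum: {round(trigger):,}")
```

Under these assumptions the index should cover well under a thousandth of a percent of the rows, while roughly ten million dead tuples can build up between vacuums, so dead entries can outnumber live ones in the index by several orders of magnitude.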
> On 16 Jul 2021, at 16:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Tom Dearman <tom(dot)dearman(at)gmail(dot)com> writes:
>> We have changed autovacuum so that it runs more frequently (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on the status is that in a table of 300 million entries only about 100 or so would have status='IN_PROGRESS', so we think this should be a nice small index, and many of our queries look up rows with a where clause of status='IN_PROGRESS'. In theory it works well, but we get a lot of index bloat because there is a lot of churn on the status value, i.e. each row starts as IN_PROGRESS and then goes to one of 4 possible completed statuses.
>
> Is it really the case that only this index is bloating? In principle, an
> update on a row of the table should result in new entries in every index
> of the table. Because its filter can exclude a row entirely, so that no
> index entry is stored at all, a partial index should in theory bloat
> less than the other indexes.
>
> If that's not what you're seeing, there must be something about the data
> being stored in that index (not the partial-index filter condition) that
> results in a lot of low-occupancy index pages over time. You didn't say
> anything about what the data payload is. But we've seen bloat problems in
> indexes where, say, every tenth or hundredth value in the index ordering
> would persist for a long time while the ones in between get deleted
> quickly. That leads to low-density indexes that VACUUM can't do anything
> about.
>
> regards, tom lane
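
The sparse-survivor pattern described in the quoted reply can be illustrated with a toy model. This is a sketch under simplifying assumptions, not PostgreSQL's actual B-tree code: keys arrive in increasing order, each leaf page holds a fixed number of entries, and a page can be reclaimed only when every entry on it has been removed. The function name `simulate` and the page capacity of 100 are made up for illustration.

```python
def simulate(total_keys: int, page_capacity: int, keep_every: int):
    """Toy leaf level: return (pages_before, pages_after, avg_fill).

    Every `keep_every`-th key persists; all other keys are deleted.
    """
    # Keys arrive in increasing order, so page i holds keys
    # [i * page_capacity, (i + 1) * page_capacity).
    pages_before = -(-total_keys // page_capacity)  # ceiling division

    survivors_per_page = {}
    for key in range(0, total_keys, keep_every):
        page = key // page_capacity
        survivors_per_page[page] = survivors_per_page.get(page, 0) + 1

    # Toy rule: only pages with no surviving entry are reclaimed.
    pages_after = len(survivors_per_page)
    survivors = sum(survivors_per_page.values())
    avg_fill = survivors / (pages_after * page_capacity)
    return pages_before, pages_after, avg_fill


if __name__ == "__main__":
    for keep_every in (10, 100, 1000):
        before, after, fill = simulate(100_000, 100, keep_every)
        print(f"keep every {keep_every}: {before} -> {after} pages, "
              f"average fill {fill:.0%}")
```

In this model, keeping every hundredth key pins one entry on every page, so no page is freed and the index stays at its full size while holding 1% of its original entries. Keeping every thousandth key lets nine pages in ten empty out completely, so the index shrinks, but the pages that remain are still almost empty.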