Re: autovac issue with large number of tables

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <nasbyj(at)amazon(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovac issue with large number of tables
Date: 2020-12-01 07:32:20
Message-ID: 299fb989-2764-b874-e1d3-7626fc6d6a11@oss.nttdata.com
Lists: pgsql-hackers

On 2020/12/01 16:23, Masahiko Sawada wrote:
> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>>
>>>
>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>
>>>>> Hi, thanks for your comments.
>>>>>
>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
>>>>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
>>>>>>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
>>>>>>>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats
>>>>>>>>>>>>>>> data. First probe the existing stats data, and if it shows the table to
>>>>>>>>>>>>>>> be already vacuumed, return immediately. If not, *then* force a stats
>>>>>>>>>>>>>>> re-read, and check a second time.
>>>>>>>>>>>>>> Does the above mean that the second and subsequent calls to
>>>>>>>>>>>>>> table_recheck_autovac() will be improved to first check using the
>>>>>>>>>>>>>> previously refreshed statistics?
>>>>>>>>>>>>>> I think that certainly works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
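>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (To confirm my understanding, the flow would be roughly as below.
>>>>>>>>>>>>>> This is only an untested sketch; the helper name and its arguments
>>>>>>>>>>>>>> are assumptions for illustration.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     /* First probe: check against the stats snapshot we already have. */
>>>>>>>>>>>>>>     if (!relation_needs_vacanalyze_from_stats(relid))
>>>>>>>>>>>>>>         return NULL;    /* another worker has already vacuumed it */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     /* Second probe: force a re-read of the stats file, then recheck. */
>>>>>>>>>>>>>>     autovac_refresh_stats();
>>>>>>>>>>>>>>     if (!relation_needs_vacanalyze_from_stats(relid))
>>>>>>>>>>>>>>         return NULL;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     /* The table still needs work: build and return the autovacuum task. */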
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce
>>>>>>>>>>>>> what was probably a very similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This problem seems more likely to occur when you have a large number
>>>>>>>>>>>>> of tables, i.e., a large amount of stats, and many small tables need
>>>>>>>>>>>>> VACUUM at the same time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC.
>>>>>>>>>>>>> This patch sets a flag in the table_recheck_autovac function so that the
>>>>>>>>>>>>> next check uses the existing stats if, on a check done after the stats
>>>>>>>>>>>>> were refreshed, VACUUM (or ANALYZE) had already been done by another worker.
>>>>>>>>>>>>> If a table still requires VACUUM after the refresh, the next check will
>>>>>>>>>>>>> refresh the stats again instead of using the existing statistics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did a simple test with HEAD and HEAD + this PoC patch.
>>>>>>>>>>>>> The tests were conducted in two cases.
>>>>>>>>>>>>> (I changed a few configuration parameters; see the attached scripts.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Normal VACUUM case
>>>>>>>>>>>>> - SET autovacuum = off
>>>>>>>>>>>>> - CREATE tables with 100 rows each
>>>>>>>>>>>>> - DELETE 90 rows from each table
>>>>>>>>>>>>> - SET autovacuum = on and restart PostgreSQL
>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Anti-wraparound VACUUM case
>>>>>>>>>>>>> - CREATE blank tables
>>>>>>>>>>>>> - SELECT from all of these tables (to generate stats)
>>>>>>>>>>>>> - SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
>>>>>>>>>>>>> - Consume a lot of XIDs by using txid_current()
>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed
>>>>>>>>>>>>>
>>>>>>>>>>>>> For each test case, the following results were obtained by changing the
>>>>>>>>>>>>> autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10,
>>>>>>>>>>>>> and by changing the number of tables to 1000, 5000, 10000 and 20000.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Due to the poor VM environment (2 vCPU / 4 GB), the results are a little
>>>>>>>>>>>>> unstable, but I think they are enough to show the trend.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===========================================================================
>>>>>>>>>>>>> [1. Normal VACUUM case]
>>>>>>>>>>>>> tables:1000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:5000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:10000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:20000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2. Anti-wraparound VACUUM case]
>>>>>>>>>>>>> tables:1000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:5000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:10000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec
>>>>>>>>>>>>>
>>>>>>>>>>>>> tables:20000
>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
>>>>>>>>>>>>> ===========================================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the cases without the patch, worker scalability decreased as the
>>>>>>>>>>>>> number of tables increased; in fact, the more workers there were, the
>>>>>>>>>>>>> longer it took to complete VACUUM of all the tables.
>>>>>>>>>>>>> The cases with the patch show good scalability with respect to the
>>>>>>>>>>>>> number of workers.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like a good performance improvement even without the patch for
>>>>>>>>>>>> the shared-memory-based stats collector.
>>>>>>>>
>>>>>>>> Sounds great!
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value,
>>>>>>>>>>>>> hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM
>>>>>>>>>>>>> in all patterns, with or without the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading
>>>>>>>>>>>>> of large amounts of stats.
>>>>>>>>>>>>> However, this patch is effective in its own right, and since only a few
>>>>>>>>>>>>> parts need to be modified, I think it could be applied to current (and
>>>>>>>>>>>>> preferably also pre-v13) PostgreSQL.
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> + /* We might be better to refresh stats */
>>>>>>>>>>>> + use_existing_stats = false;
>>>>>>>>>>>> }
>>>>>>>>>>>> + else
>>>>>>>>>>>> + {
>>>>>>>>>>>>
>>>>>>>>>>>> - heap_freetuple(classTup);
>>>>>>>>>>>> + heap_freetuple(classTup);
>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to use exiting stats */
>>>>>>>>>>>> + use_existing_stats = true;
>>>>>>>>>>>> + }
>>>>>>>>>>>>
>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the
>>>>>>>>>>>> next check if it finds out that the table still needs to be vacuumed.
>>>>>>>>>>>> But I guess that's not necessarily true because the next table might be
>>>>>>>>>>>> vacuumed already. So I think we might want to always use the existing
>>>>>>>>>>>> stats for the first check. What do you think?
>>>>>>>>>>> Thanks for your comment.
>>>>>>>>>>>
>>>>>>>>>>> If we assume a case where some workers vacuum large tables
>>>>>>>>>>> and a single worker vacuums small tables, the processing
>>>>>>>>>>> performance of the single worker will be slightly lower if the
>>>>>>>>>>> existing statistics are checked every time.
>>>>>>>>>>>
>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, but
>>>>>>>>>>> the performance was slightly worse in cases with a small number of workers.
>>>>>>>>
>>>>>>>> Do you have this benchmark result?
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> (Checking the existing stats is lightweight, but at high frequency
>>>>>>>>>>> it affects processing performance.)
>>>>>>>>>>> Therefore, only after refreshing the statistics do we determine whether
>>>>>>>>>>> autovac should use the existing statistics next time.
>>>>>>>>>>
>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are
>>>>>>>>>> a few workers, checking the existing stats is unlikely to return true
>>>>>>>>>> (no need to vacuum). So the cost of existing stats check ends up being
>>>>>>>>>> overhead. Not sure how slow always checking the existing stats was,
>>>>>>>>>> but given that the shared memory based stats collector patch could
>>>>>>>>>> improve the performance of refreshing stats, it might be better not to
>>>>>>>>>> check the existing stats frequently like the patch does. Anyway, I
>>>>>>>>>> think it’s better to evaluate the performance improvement with other
>>>>>>>>>> cases too.
>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases.
>>>>>>>>> In addition, if the shared-memory-based stats patch is applied, we won't
>>>>>>>>> need to reload a huge stats file, so we will just have to check the stats
>>>>>>>>> in shared memory every time.
>>>>>>>>> Perhaps the logic of table_recheck_autovac could be simpler.
>>>>>>>>>
>>>>>>>>>>> BTW, I found some typos in comments, so I attached a fixed version.
>>>>>>>>
>>>>>>>> The patch adds some duplicated codes into table_recheck_autovac().
>>>>>>>> It's better to make the common function performing them and make
>>>>>>>> table_recheck_autovac() call that common function, to simplify the code.
>>>>>>> Thanks for your comment.
>>>>>>> Hmm... I've factored out the duplicated part.
>>>>>>> Attached is the patch.
>>>>>>> Could you confirm that it matches your expectation?
>>>>>>
>>>>>> Yes, thanks for updating the patch! Here are some more review comments.
>>>>>>
>>>>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid);
>>>>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
>>>>>>
>>>>>> When using the existing stats, ISTM that these are not necessary and
>>>>>> we can reuse "shared" and "dbentry" obtained before. Right?
>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions
>>>>> read the information from the local hash table without re-reading the
>>>>> stats file, so the process is very light.
>>>>> Therefore, I think it is better to keep the current logic, to keep the
>>>>> code simple.
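>>>>>
>>>>> (For reference, IIUC the relevant path is just:
>>>>>
>>>>>     pgstat_fetch_stat_dbentry(dbid)
>>>>>       -> backend_read_statsfile()   /* returns at once if the snapshot exists */
>>>>>       -> hash_search(pgStatDBHash, &dbid, HASH_FIND, NULL)
>>>>>
>>>>> so repeated calls without a snapshot reset are cheap.)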
>>>>>
>>>>>>
>>>>>> + /* We might be better to refresh stats */
>>>>>> + use_existing_stats = false;
>>>>>>
>>>>>> I think that we should add more comments about why it's better to
>>>>>> refresh the stats in this case.
>>>>>>
>>>>>> + /* The relid has already vacuumed, so we might be better to use existing stats */
>>>>>> + use_existing_stats = true;
>>>>>>
>>>>>> I think that we should add more comments about why it's better to
>>>>>> reuse the stats in this case.
>>>>> I added comments.
>>>>>
>>>>> Attached is the patch.
>>>>>
>>>>
>>>> Thank you for updating the patch. Here are some small comments on the
>>>> latest (v4) patch.
>>>>
>>>> + * So if the last time we checked a table that was already vacuumed after
>>>> + * refres stats, check the current statistics before refreshing it.
>>>> + */
>>>>
>>>> s/refres/refresh/
>> Thanks! Fixed.
>> Attached is the patch.
>>
>>>>
>>>> -----
>>>> +/* Counter to determine if statistics should be refreshed */
>>>> +static bool use_existing_stats = false;
>>>> +
>>>>
>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac().
>>>>
>>>> -----
>>>> While testing the performance, I realized that the statistics are
>>>> reset every time one table is vacuumed, leading to re-reading the stats
>>>> file even if 'use_existing_stats' is true. Note that vacuum()
>>>> eventually calls AtEOXact_PgStat(), which calls
>>>> pgstat_clear_snapshot().
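>>>>
>>>> (For reference, the path I mean is roughly:
>>>>
>>>>     vacuum()        /* commits a transaction per table */
>>>>       -> CommitTransactionCommand()
>>>>            -> AtEOXact_PgStat()
>>>>                 -> pgstat_clear_snapshot()   /* frees the local snapshot */
>>>>
>>>> so the next pgstat_fetch_stat_*() call re-reads the stats file even
>>>> when 'use_existing_stats' is true.)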
>>>
>>> Good catch!
>>>
>>>
>>>> I believe that's why the performance of the method of always checking
>>>> the existing stats wasn't as good as expected.
>>>> So if we save the statistics somewhere and use them for rechecking, the
>>>> results of the performance benchmark will differ between these two
>>>> methods.
>> Thanks for your checks.
>> But if a worker did vacuum(), that means the worker had determined in
>> table_recheck_autovac() that a vacuum was needed. So use_existing_stats
>> is set to false, and next time the stats are refreshed.
>> Therefore I think the current patch is fine, as we want to avoid
>> unnecessarily refreshing the statistics before the actual vacuum(),
>> right?
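>>
>> (In other words, the flag handling is roughly:
>>
>>     /* after refreshing (or reusing) the stats and rechecking: */
>>     if (table still needs vacuum)
>>         use_existing_stats = false;   /* we will vacuum; refresh next time */
>>     else
>>         use_existing_stats = true;    /* reuse the snapshot next time */
>> )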
>
> Yes, you're right.
>
> When I benchmarked the performance of the method of always checking
> the existing stats, I edited your patch so that it sets use_existing_stats
> = true even if the second check is false (i.e., vacuum is needed).
> And the result I got was worse than expected, especially in the case of
> a few autovacuum workers. But that doesn't evaluate the performance of
> the method properly, as the stats snapshot is cleared on every
> vacuum. Given you had similar results, I guess you used a similar way
> when evaluating it, is that right? If so, it's better to fix this issue
> and see how the performance benchmark results will differ.
>
> For example, the results of the test case with 10000 tables and 1
> autovacuum worker that I reported before were:
>
> 10000 tables:
> autovac_workers 1: 158s, 157s, 290s
>
> But after fixing that issue in the third method (always checking the
> existing stats), the results are:

Could you tell me how you fixed that issue? Did you copy the stats
somewhere, as you suggested, or skip pgstat_clear_snapshot(), as
I suggested?
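
Just to be clear, by the latter I mean something like the following
untested sketch (the flag name here is made up for illustration):

    /* in pgstat.c */
    static bool hold_stats_snapshot = false;

    void
    pgstat_clear_snapshot(void)
    {
        /* Keep the local stats snapshot while autovacuum asks us to. */
        if (hold_stats_snapshot)
            return;

        /* ... existing logic that frees the snapshot ... */
    }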

Kasahara-san seems not to like the latter idea because it might
cause bad side effects. So should we use the former idea?

>
> 10000 tables:
> autovac_workers 1: 157s, 157s, 160s

Looks like a good number!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
