From: | Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Heavily modified big table bloat even in auto vacuum is running |
Date: | 2013-12-12 06:54:51 |
Message-ID: | 8977CB36860C5843884E0A18D8747B0372BF34D5@szxeml558-mbs.china.huawei.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 06 December 2013 11:57 Amit Kapila wrote:
> On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> > On 29 November 2013 12:00 Amit Kapila wrote:
> >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> >> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> >> Few questions about your latest patch:
> >> a. Is there any reason why you are doing estimation of dead tuples
> >> only for Autovacuum and not for Vacuum.
> >
> > No, changed.
> >
> >> /* clear and get the new stats for calculating proper dead tuples */
> >> pgstat_clear_snapshot(); tabentry =
> >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> >> b. In the above code, to get latest data you are first clearing
> >> snapshot and then calling pgstat function. It will inturn perform
> I/O
> >> (read of stats file) and send/receive message from stats
> >> collector to ensure it can read latest data. I think it will add
> overhead
> >> to Vacuum, especially if 'nkeep' calculated in function
> >> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> >> observed
> >> that value of keep can serve the purpose.
> >>
> >> Can you please once try the test on 'nkeep' approach patch.
> >
> > Using the nkeep and snapshot approach, I ran the test for 40 mins
> with
> > a high analyze_threshold and results are below.
> >
> > Auto vacuum count Bloat size
> > Master 11 220MB
> > Patched_nkeep 14 215MB
> > Patched_snapshot 18 198MB
> >
> > Both the approaches are showing good improvement in the test.
> > Updated patches, test script and configuration is attached in the
> mail.
>
> I think updating dead tuple count using nkeep is good idea as similar
> thing is done for Analyze as well in acquire_sample_rows().
> One minor point, I think it is better to log dead tuples is below error
> message:
> ereport(LOG,
> (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
> "pages: %d removed, %d remain\n"
> "tuples: %.0f removed, %.0f remain\n"
>
> "tuples: %.0f removed, %.0f remain, %.0f dead\n"
>
>
> About your test, how to collect the data by running this script, are
> you manually stopping it after 40 mins, because I ran it for more than
> an hour, the final result didn't came.
> As I mentioned you last time, please simplify your test, for other
> person in its current form, it is difficult to make meaning out of it.
> Write comments on top of it in steps form to explain what exactly it is
> doing and how to take data using it (for example, do I need to wait,
> till script ends; how long this test can take to complete).
A simplified test and updated patch by taking care the above comment are attached in the mail.
I am not able to reduce the test duration but changed as the test automatically exists after 45 mins run.
Please check vacuum_test.sh file more details for running the test.
Auto vacuum count Bloat size
Master 15 220MB
Patched_nkeep 18 213MB
Please let me know your suggestions.
Regards,
Hari babu.
Attachment | Content-Type | Size |
---|---|---|
vacuum_fix_v7_nkeep.patch | application/octet-stream | 5.0 KB |
test_script.tar.gz | application/x-gzip | 8.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2013-12-12 07:02:41 | Re: pgbench with large scale factor |
Previous Message | knizhnik | 2013-12-12 06:53:00 | Re: In-Memory Columnar Store |