From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD |
Date: | 2011-06-03 19:16:29 |
Message-ID: | 201106031916.p53JGTC27199@momjian.us |
Lists: | pgsql-hackers |
Heikki Linnakangas wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
> > On closer inspection, I realized that we have
> > deliberately put in this hook to ensure that we use visibility maps
> > only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> > sequential pages to take advantage of possible OS seq scan
> > optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.
>
> > My statistical skills are limited, but wouldn't that mean that for a
> > fairly well distributed write activity across a large table, if there
> > are even 3-4% update/deletes, we would most likely hit a
> > not-all-visible page for every 32 pages scanned? That would mean that
> > almost the entire relation will be scanned even if the visibility map
> > tells us that only 3-4% of pages require scanning? And the probability
> > will increase with the increase in the percentage of updated/deleted
> > tuples. Given that the likelihood of anyone calling VACUUM (manually
> > or through autovac settings) on a table which has less than 3-4%
> > updates/deletes is very low, I am worried that we might be losing all
> > advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the
> whole table if you need to access more than a few percent of the pages,
> because sequential I/O is so much faster than random I/O. The visibility
> map really only helps if all the updates are limited to some part of the
> table. For example, if only recent records are updated frequently,
> and old ones are almost never touched.
I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of the page if we know it
doesn't need vacuuming? If so, do we need to?
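
To put rough numbers on the 32-page argument quoted above, here is a small sketch in Python. It is only a simplified model, not the code in vacuumlazy.c: it assumes not-all-visible pages are spread independently and uniformly over the table, and that a run of all-visible pages is skipped only when it is at least SKIP_PAGES_THRESHOLD (32) pages long. The function names are made up for illustration.

```python
import random

SKIP_PAGES_THRESHOLD = 32  # run length discussed in the thread


def prob_run_all_visible(p_dirty, threshold=SKIP_PAGES_THRESHOLD):
    """Chance that `threshold` consecutive pages are all all-visible,
    assuming each page is independently not-all-visible with probability p_dirty."""
    return (1.0 - p_dirty) ** threshold


def pages_read(all_visible, threshold=SKIP_PAGES_THRESHOLD):
    """Count pages read under the simplified rule: a run of consecutive
    all-visible pages is skipped only if it is at least `threshold` long;
    shorter runs and every not-all-visible page are read."""
    read = 0
    run = 0  # length of the current run of all-visible pages
    for visible in all_visible:
        if visible:
            run += 1
        else:
            if run < threshold:
                read += run  # run too short to skip, so it was read
            read += 1        # the not-all-visible page itself
            run = 0
    if run < threshold:
        read += run          # trailing run at the end of the table
    return read


def simulate(n_pages, p_dirty, seed=0):
    """Fraction of a randomly generated table that would be read."""
    rng = random.Random(seed)
    table = [rng.random() >= p_dirty for _ in range(n_pages)]
    return pages_read(table) / n_pages


if __name__ == "__main__":
    for p in (0.01, 0.03, 0.04, 0.10):
        print(p, round(prob_run_all_visible(p), 3), round(simulate(100_000, p), 3))
```

Under these assumptions, a given window of 32 pages is entirely all-visible with probability 0.97^32, roughly 0.38, when 3% of pages are not all-visible, so most windows do contain a page that forces a read. When the updates are clustered in one part of the table, as in Heikki's example, the long all-visible runs elsewhere are skipped regardless.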
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +