From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD |
Date: | 2011-05-27 15:13:42 |
Message-ID: | BANLkTinOg-q-r7Zfvsdgs+h7OruK5nAV3Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
>>
>> On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.
Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip one page somewhere ? Would it be better that we
instead do a full scan every once in a while instead of relying on a
not-so-well-understood heuristic ?
>
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ? That would mean that
>> almost entire relation will be scanned even if the visibility map
>> tells us that only 3-4% pages require scanning ? And the probability
>> will increase with the increase in the percentage of updated/deleted
>> tuples. Given that the likelihood of anyone calling VACUUM (manually
>> or through autovac settings) on a table which has less than 3-4%
>> updates/deletes is very low, I am worried that might be loosing all
>> advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O. The visibility map really
> only helps if all the updates are limited to some part of the table.
The vacuum scan is not a complete random scan. So I am not sure how
effective a complete seq scan be. May be we need to run some tests to
measure that too before we choose one over the other.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2011-05-27 15:19:18 | Re: dblink crash on PPC |
Previous Message | Greg Stark | 2011-05-27 15:10:19 | Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD |