Re: Vacuum: allow usage of more than 1GB of work mem

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-14 15:17:48
Message-ID: CA+Tgmobq1gO2TWygYf9mLVw_5ZKy8OGuxnAxT6efG_tS0xn4Vg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 14, 2016 at 8:16 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Ah, thanks. So MaxHeapTuplesPerPage sets the upper boundary for the per page
> bitmap size. That's about 36 bytes for an 8K page. IOW if on average there
> are 6 or more dead tuples per page, the bitmap will outperform the current
> representation, assuming max allocation for the bitmap. If we can use additional
> estimates to restrict the size to a somewhat more conservative value and then
> keep an overflow area, then probably the break-even happens even earlier than
> that. I hope this gives us a good starting point, but let me know if you
> think it's still a wrong approach to pursue.

Well, it's certainly a bigger change. I think the big concern is that
the amount of memory now becomes fixed based on the table size. So
one problem is that you have to figure out what you're going to do if
the bitmap doesn't fit in maintenance_work_mem. A related problem is
that it might fit but use more memory than before, which could cause
problems for some people. Now on the other hand it could also use
less memory for some people, and that would be good.
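
For what it's worth, here is a quick back-of-the-envelope check of the
break-even point Pavan mentions. This is only a sketch with the relevant
constants hard-coded (8KB pages, so MaxHeapTuplesPerPage = 291, and 6-byte
ItemPointers), not something pulled from the headers:

#include <stdio.h>

int
main(void)
{
    /* Hard-coded for illustration: 8KB pages, default build. */
    const int   max_heap_tuples_per_page = 291;  /* MaxHeapTuplesPerPage */
    const double bytes_per_tid = 6.0;            /* sizeof(ItemPointerData) */
    const double bitmap_bytes_per_page = max_heap_tuples_per_page / 8.0;

    /* The bitmap is smaller once the per-page dead-tuple count passes this. */
    printf("bitmap: %.1f bytes/page, break-even at %.1f dead tuples/page\n",
           bitmap_bytes_per_page, bitmap_bytes_per_page / bytes_per_tid);
    return 0;
}

So the crossover really is right around 6 dead tuples per page, as Pavan
says: above that density the bitmap uses less memory, below it more.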

I am kind of doubtful about this whole line of investigation because
we're basically trying pretty hard to fix something that I'm not sure
is broken. I do agree that, all other things being equal, the TID
lookups will probably be faster with a bitmap than with a binary
search, but maybe not if the table is large and the number of dead
TIDs is small, because cache efficiency is pretty important. But even
if it's always faster, does TID lookup speed even really matter to
overall VACUUM performance? Claudio's early results suggest that it
might, but maybe that's just a question of some optimization that
hasn't been done yet.

I'm fairly sure that our number one priority should be to minimize the
number of cases where we need to do multiple scans of the indexes to
stay within maintenance_work_mem. If we're satisfied we've met that
goal, then within that we should try to make VACUUM as fast as
possible with as little memory usage as possible. I'm not 100% sure I
know how to get there, or how much work it's worth expending. In
theory we could even start with the list of TIDs and switch to the
bitmap if the TID list becomes larger than the bitmap would have been,
but I don't know if it's worth the effort.

/me thinks a bit.

Actually, I think that probably *is* worthwhile, specifically because
it might let us avoid multiple index scans in cases where we currently
require them. Right now, our default maintenance_work_mem value is
64MB, which is enough to hold a little over ten million tuples. It's
also large enough to hold a bitmap for a 14GB table. So right now if
you deleted, say, 100 tuples per page, you would end up with one index
vacuum cycle for every ~100,000 pages = 800MB of heap, whereas switching to
the bitmap representation for such cases would require only one index
vacuum cycle for every 14GB, more than an order of magnitude
improvement!
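
To make that arithmetic explicit, here is a small worked example (again
just a sketch with hard-coded constants: 8KB pages, 6-byte ItemPointers,
MaxHeapTuplesPerPage = 291; the figures in the paragraph above are these
numbers, rounded):

#include <stdio.h>

int
main(void)
{
    const double mwm = 64.0 * 1024 * 1024;       /* maintenance_work_mem */
    const double bytes_per_tid = 6.0;            /* sizeof(ItemPointerData) */
    const double bitmap_bytes_per_page = 291 / 8.0;
    const double pages_per_gb = 1024.0 * 1024 * 1024 / 8192;

    /*
     * TID-array representation: how many dead TIDs fit in 64MB, and how
     * much heap that covers at 100 dead tuples per page.
     */
    double      ntids = mwm / bytes_per_tid;
    double      pages_per_pass = ntids / 100;

    /* Bitmap representation: how much heap one 64MB bitmap covers. */
    double      bitmap_gb = mwm / (bitmap_bytes_per_page * pages_per_gb);

    printf("TID array: %.1fM dead TIDs; at 100 dead/page, one index pass per %.0f pages (%.0f MB of heap)\n",
           ntids / 1e6, pages_per_pass, pages_per_pass * 8192 / (1024 * 1024));
    printf("bitmap: one index pass per %.1f GB of heap\n", bitmap_gb);
    return 0;
}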

On the other hand, if we switch to the bitmap as the ONLY possible
representation, we will lose badly when there are scattered updates -
e.g. 1 deleted tuple every 10 pages. So it seems like we probably
want to have both options. One tricky part is figuring out how we
switch between them when memory gets tight; we have to avoid bursting
above our memory limit while making the switch. And even if our
memory limit is very high, we want to avoid using memory gratuitously;
I think we should try to grow memory usage incrementally with either
representation.

For instance, one idea to grow memory usage incrementally would be to
store dead tuple information separately for each 1GB segment of the
relation. So we have an array of dead-tuple-representation objects,
one for every 1GB of the relation. If there are no dead tuples in a
given 1GB segment, then this pointer can just be NULL. Otherwise, it
can point to either the bitmap representation (which will take ~4.5MB)
or it can point to an array of TIDs (which will take 6 bytes/TID).
That could handle an awfully wide variety of usage patterns
efficiently; it's basically never worse than what we're doing today,
and when the dead tuple density is high for any portion of the
relation it's a lot better.
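
Just to make the shape of that concrete, here is a rough sketch of the
data structure I have in mind. The names are made up for illustration
(nothing like this exists in the tree), and it assumes 8KB pages:

#include <stdbool.h>
#include <stdint.h>

#define SEG_PAGES        131072     /* 8KB heap pages per 1GB segment */
#define BITS_PER_PAGE    291        /* MaxHeapTuplesPerPage for 8KB pages */
#define SEG_BITMAP_BYTES ((SEG_PAGES * BITS_PER_PAGE + 7) / 8)  /* ~4.5MB */

typedef struct DeadTid              /* stand-in for ItemPointerData */
{
    uint32_t    block;              /* block number, relative to the segment */
    uint16_t    offset;             /* line pointer number, 1-based */
} DeadTid;

typedef struct DeadTupleSegment
{
    /* Exactly one of the two representations is in use at a time. */
    uint8_t    *bitmap;             /* SEG_BITMAP_BYTES bytes, or NULL */
    DeadTid    *tids;               /* sorted array, or NULL */
    int         ntids;
} DeadTupleSegment;

typedef struct DeadTupleStore
{
    int                 nsegments;  /* heap size, in 1GB segments */
    DeadTupleSegment  **segments;   /* NULL entry = no dead tuples there */
} DeadTupleStore;

/* The job lazy_tid_reaped() does today: is this TID slated for removal? */
bool
dead_tuple_lookup(const DeadTupleStore *store, uint32_t block, uint16_t offset)
{
    const DeadTupleSegment *seg = store->segments[block / SEG_PAGES];
    uint32_t    segblock = block % SEG_PAGES;

    if (seg == NULL)
        return false;               /* no dead tuples anywhere in this 1GB */

    if (seg->bitmap != NULL)
    {
        /* dense segment: one bit per possible line pointer */
        uint64_t    bit = (uint64_t) segblock * BITS_PER_PAGE + (offset - 1);

        return (seg->bitmap[bit / 8] >> (bit % 8)) & 1;
    }

    /* sparse segment: binary search the sorted TID array */
    int         lo = 0,
                hi = seg->ntids - 1;

    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;
        DeadTid     t = seg->tids[mid];

        if (t.block == segblock && t.offset == offset)
            return true;
        if (t.block < segblock ||
            (t.block == segblock && t.offset < offset))
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}

The idea would be that a segment starts out with a TID array and gets
promoted to a bitmap once the array reaches the bitmap's size, so the
transient extra memory while making the switch is bounded by one
segment's bitmap (~4.5MB) rather than by anything proportional to the
whole table.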

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
