From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-16 03:39:10
Message-ID: CABOikdN1O7PEUK+TFAAHxQGG3HTuGmXY5B7mcW=G8MgEHrojpw@mail.gmail.com
Lists: pgsql-hackers
On Fri, Sep 16, 2016 at 12:24 AM, Claudio Freire <klaussfreire(at)gmail(dot)com>
wrote:
> On Thu, Sep 15, 2016 at 3:48 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > For example, we always allocate the TID array as large as we can fit into
> > m_w_m, but maybe we don't need to wait with switching to the bitmap until
> > filling the whole array - we could wait as long as the bitmap fits into
> the
> > remaining part of the array, build it there and then copy it to the
> > beginning (and use the bitmap from that point).
>
> The bitmap can be created like that, but grow from the end of the
> segment backwards.
>
> So the scan can proceed until the bitmap fills the whole segment
> (filling backwards), no copy required.
>
I thought about those approaches when I suggested starting with half of
m_w_m: fill in TIDs from one end and, upon reaching the halfway point,
convert them into a bitmap (assuming the stats tell us a bitmap gives
significant savings), but fill it from the other end. Then reset the TID
array and start filling it up again. That guarantees that we can always
work within the available limit.
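
For what it's worth, a minimal sketch of that bookkeeping, with purely
illustrative names; it assumes a single m_w_m-sized allocation and that a
whole-table bitmap fits in half of it, which real code would have to verify
from the relation size:

#include <stdint.h>
#include <string.h>

#define MAX_TUPLES_PER_PAGE 291   /* MaxHeapTuplesPerPage for 8KB pages */

typedef struct
{
    uint32_t block;
    uint16_t offset;              /* 1-based, as in ItemPointerData */
} DeadTid;

typedef struct
{
    char  *mem;                   /* one m_w_m-sized allocation */
    size_t size;                  /* m_w_m in bytes */
    size_t ntids;                 /* TIDs stored at the front */
    int    have_bitmap;           /* has the back half been initialized? */
} DeadStore;

static void
record_dead(DeadStore *st, DeadTid tid)
{
    DeadTid *tids = (DeadTid *) st->mem;

    tids[st->ntids++] = tid;

    /*
     * TID array reached the halfway point: fold it into a bitmap kept in
     * the other half, then reset the array and keep filling.
     */
    if ((st->ntids + 1) * sizeof(DeadTid) > st->size / 2)
    {
        uint8_t *bitmap = (uint8_t *) (st->mem + st->size / 2);

        if (!st->have_bitmap)
        {
            memset(bitmap, 0, st->size / 2);
            st->have_bitmap = 1;
        }
        for (size_t i = 0; i < st->ntids; i++)
        {
            size_t bit = (size_t) tids[i].block * MAX_TUPLES_PER_PAGE
                         + (tids[i].offset - 1);

            bitmap[bit / 8] |= (uint8_t) (1u << (bit % 8));
        }
        st->ntids = 0;            /* array space is reusable again */
    }
}

Lookups during the index-cleanup phase would then have to consult both the
bitmap and whatever still sits in the TID array.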
But I actually wonder if we are over-engineering things and overestimating
the cost of memmove etc. How about this simpler approach (a code sketch
follows the list):
1. Divide the table into fixed-size chunks, as Robert suggested; say 1GB
each.
2. Allocate a pointer array with one pointer per chunk. For a 1TB table,
that's about 8192 bytes.
3. Allocate a bitmap that can hold MaxHeapTuplesPerPage * chunk size in
pages. For an 8192-byte block size and a 1GB chunk, that's about 4.6MB.
*Note*: I'm suggesting a bitmap here because, provisioning for the worst
case, a fixed-size TID array would cost us 200MB+, whereas the bitmap is
just 4.6MB.
4. Collect dead tuples in that 1GB chunk. Also collect stats so that we
know the most compact representation.
5. At the end of the 1GB scan, if no dead tuples were found, set the chunk
pointer to NULL, move to the next chunk and restart step 4. If dead tuples
were found, then check:
 a. Whether the bitmap can be compressed further by using fewer bits per
page. If so, allocate a new bitmap and compress it.
 b. Whether a TID array would be a more compact representation. If so,
allocate a TID array of the right size and convert the bitmap into an
array.
 c. Set the chunk pointer to whichever representation we choose (adding
headers etc. so we can interpret the representation).
6. Continue until we consume all of m_w_m or reach the end of the table.
If we consume all of m_w_m, do a round of index cleanup and restart.
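
To make that concrete, here is a purely illustrative C sketch of the
per-chunk storage; all names, the malloc-based allocation and the exact
header layout are my assumptions, not code from any patch (and step 5a,
narrowing the bits per page, is left out of the sketch):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BLOCKS        131072   /* blocks in a 1GB chunk (8KB pages) */
#define MAX_TUPLES_PER_PAGE 291      /* MaxHeapTuplesPerPage for 8KB pages */

typedef enum { CHUNK_BITMAP, CHUNK_TIDS } ChunkKind;

typedef struct ChunkHeader
{
    ChunkKind kind;        /* how to interpret the payload */
    uint32_t  nitems;      /* number of dead tuples in the chunk */
    /* payload (bitmap bytes or packed 4-byte TIDs) follows the header */
} ChunkHeader;

typedef struct DeadTupleIndex
{
    uint32_t      nchunks;    /* table size / 1GB, rounded up */
    ChunkHeader **chunks;     /* one pointer per chunk; ~8KB for 1TB */
} DeadTupleIndex;

/*
 * Step 5: after scanning one chunk into a scratch bitmap, keep whichever
 * representation is smaller.  Returns NULL when the chunk had no dead
 * tuples, so the per-chunk pointer is simply left NULL.
 */
static ChunkHeader *
finish_chunk(const uint8_t *scratch, uint32_t ndead)
{
    const size_t nbits = (size_t) CHUNK_BLOCKS * MAX_TUPLES_PER_PAGE;
    const size_t bitmap_bytes = nbits / 8;     /* ~4.6MB */
    ChunkHeader *hdr;

    if (ndead == 0)
        return NULL;

    if ((size_t) ndead * sizeof(uint32_t) < bitmap_bytes)
    {
        /* 5b: sparse chunk, a packed TID array is more compact */
        uint32_t *out;
        uint32_t  n = 0;

        hdr = malloc(sizeof(ChunkHeader) + ndead * sizeof(uint32_t));
        hdr->kind = CHUNK_TIDS;
        hdr->nitems = ndead;
        out = (uint32_t *) (hdr + 1);
        for (size_t bit = 0; bit < nbits; bit++)
        {
            if (scratch[bit / 8] & (1u << (bit % 8)))
            {
                uint32_t block  = (uint32_t) (bit / MAX_TUPLES_PER_PAGE);
                uint32_t offset = (uint32_t) (bit % MAX_TUPLES_PER_PAGE) + 1;

                /* 17-bit in-chunk block number, 13-bit offset: 4 bytes */
                out[n++] = (block << 13) | offset;
            }
        }
    }
    else
    {
        /* 5c: dense chunk, keep the bitmap representation */
        hdr = malloc(sizeof(ChunkHeader) + bitmap_bytes);
        hdr->kind = CHUNK_BITMAP;
        hdr->nitems = ndead;
        memcpy(hdr + 1, scratch, bitmap_bytes);
    }
    return hdr;
}

The pointer array stays tiny regardless of table size, and each chunk pays
only for whichever representation actually won.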
I also realised that we can compact the TID array in step 5(b) above,
because we only need to store 17 bits for the block number (we already know
which 1GB chunk it belongs to). Given that usable offsets also fit in just
13 bits, the TID array needs only 4 bytes per TID instead of 6.
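
For illustration, that packed encoding could look like the following sketch
(hypothetical names; it assumes 17 bits for the in-chunk block number and
13 bits for the offset, i.e. 30 bits in a 4-byte value):

#include <assert.h>
#include <stdint.h>

/*
 * 17 bits cover the 131072 blocks of a 1GB chunk (8KB pages); 13 bits
 * are assumed to be enough for any usable line-pointer offset.
 */
#define CHUNK_BLOCK_BITS 17
#define OFFSET_BITS      13
#define OFFSET_MASK      ((1u << OFFSET_BITS) - 1)

static inline uint32_t
pack_tid(uint32_t block_in_chunk, uint16_t offset)
{
    assert(block_in_chunk < (1u << CHUNK_BLOCK_BITS));
    assert(offset != 0 && offset <= OFFSET_MASK);
    return (block_in_chunk << OFFSET_BITS) | offset;
}

static inline void
unpack_tid(uint32_t packed, uint32_t *block_in_chunk, uint16_t *offset)
{
    *block_in_chunk = packed >> OFFSET_BITS;
    *offset = (uint16_t) (packed & OFFSET_MASK);
}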
Many people are working on implementing different ideas; I can volunteer to
write a patch along these lines unless someone else wants to do that.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services