Re: Vacuum: allow usage of more than 1GB of work mem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2017-04-27 07:25:11
Message-ID: CAD21AoBazzozsJ_HxMSSwzPa3m6-uRFOoReT50431ZSdvRn23w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 21, 2017 at 6:24 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Wed, Apr 12, 2017 at 4:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Apr 11, 2017 at 4:38 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> In essence, the patch as it is proposed, doesn't *need* a binary
>>> search, because the segment list can only grow up to 15 segments at
>>> its biggest, and that's a size small enough that linear search will
>>> outperform (or at least perform as well as) binary search. Reducing
>>> the initial segment size wouldn't change that. If the 12GB limit is
>>> lifted, or the maximum segment size reduced (from 1GB to 128MB for
>>> example), however, that would change.
>>>
>>> I'd be more in favor of lifting the 12GB limit than of reducing the
>>> maximum segment size, for the reasons above. Raising the 12GB limit
>>> has concrete and readily apparent benefits, whereas using bigger (or
>>> smaller) segments is far more debatable. Yes, that will need a binary
>>> search. But, I was hoping that could be a second (or third) patch, to
>>> keep things simple, and benefits measurable.
>>
>> To me, it seems a bit short-sighted to say, OK, let's use a linear
>> search because there's this 12GB limit so we can limit ourselves to 15
>> segments. Because somebody will want to remove that 12GB limit, and
>> then we'll have to revisit the whole thing anyway. I think, anyway.
>
> Ok, attached is an updated patch that implements the binary search.
>
>> What's not clear to me is how sensitive the performance of vacuum is
>> to the number of cycles used here. For a large index, the number of
>> searches will presumably be quite large, so it does seem worth
>> worrying about performance. But if we just always used a binary
>> search, would that lose enough performance with small numbers of
>> segments that anyone would care? If so, maybe we need to use linear
>> search for small numbers of segments and switch to binary search with
>> larger numbers of segments.
>
> I just went and tested.
>
> I implemented the hybrid binary search attached, and ran a few tests
> with and without the sequential code enabled, at small scales.
>
> The difference is statistically significant, but small (less than 3%).
> With proper optimization of the binary search, however, the difference
> flips:
>
> claudiofreire(at)klaumpp:~/src/postgresql.vacuum> fgrep shufp80 fullbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.20 s, system: 1.42 s, elapsed: 18.34 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.44 s, system: 1.40 s, elapsed: 19.75 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.28 s, system: 1.41 s, elapsed: 18.48 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.39 s, system: 1.51 s, elapsed: 20.60 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.26 s, system: 1.42 s, elapsed: 19.16 s.
>
> claudiofreire(at)klaumpp:~/src/postgresql.vacuum> fgrep shufp80 hybridbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.49 s, system: 1.39 s, elapsed: 19.15 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.36 s, system: 1.33 s, elapsed: 18.40 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.36 s, system: 1.31 s, elapsed: 18.87 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.59 s, system: 1.35 s, elapsed: 26.43 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.54 s, system: 1.28 s, elapsed: 20.02 s.
>
> That's after inlining the compare in both the binary and sequential
> code, and it seems that lets the compiler optimize the binary search
> to the point where it outperforms the sequential search.
>
> That's not the case when the compare isn't inlined.
>
> That seems in line with [1], which shows the impact of various
> optimizations on both algorithms. It's clearly a close enough race
> that optimizations play a huge role.
>
> Since we're not likely to go and implement SSE2-optimized versions, I
> believe I'll keep only the binary search. That's the attached patch
> set.
>
> I'm running the full test suite, but that takes a very long while.
> I'll post the results when they're done.
>
> [1] https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/
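
For readers following along, the hybrid approach benchmarked above
amounts to something like the sketch below. The cutoff value and all
names here are illustrative only, not taken from the patch:

#include <stddef.h>

#define LINEAR_SEARCH_CUTOFF 8		/* arbitrary, for illustration */

/*
 * Hybrid lower-bound search: plain scan below a small cutoff, binary
 * search otherwise.  Returns the index of the first element >= key.
 * With the comparison inlined (a plain "<" here), the compiler can
 * specialize the binary-search loop, matching the effect described
 * above.
 */
static size_t
hybrid_lower_bound(unsigned long key, const unsigned long *arr, size_t n)
{
	size_t		lo = 0,
				hi = n;

	if (n <= LINEAR_SEARCH_CUTOFF)
	{
		/* tiny arrays: a sequential scan is branch-predictor friendly */
		while (lo < n && arr[lo] < key)
			lo++;
		return lo;
	}

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (arr[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}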

Thank you for updating the patch.

I've read this patch again and here are some review comments.

+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity (2 log N to be
+ * precise).

IIUC we now do a binary search even over the list of segments, so this
comment seems to be outdated.
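
My understanding of the two-level lookup, as a generic sketch (the
types and names here are hypothetical; the patch itself works with
DeadTuplesSegment and ItemPointerData):

#include <stdbool.h>

typedef struct
{
	int			last_key;	/* largest key stored in this segment */
	int		   *keys;		/* sorted keys */
	int			nkeys;
} SegmentSketch;

static bool
two_level_lookup(int key, SegmentSketch *segs, int nsegs)
{
	int			lo = 0,
				hi = nsegs;

	/* level 1: binary search for the first segment with last_key >= key */
	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (segs[mid].last_key < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == nsegs)
		return false;			/* key is beyond everything stored */

	/* level 2: binary search within the chosen segment */
	{
		SegmentSketch *seg = &segs[lo];
		int			l = 0,
					h = seg->nkeys;

		while (l < h)
		{
			int			mid = l + (h - l) / 2;

			if (seg->keys[mid] < key)
				l = mid + 1;
			else
				h = mid;
		}
		return l < seg->nkeys && seg->keys[l] == key;
	}
}

Since segment sizes grow exponentially, the two binary searches
together still cost O(log N) comparisons overall.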

-----

We often fetch a particular dead tuple segment. How about providing a
macro for that, to make the code easier to read? For example:

#define GetDeadTuplesSegment(lvrelstats, seg) \
(&(lvrelstats)->dead_tuples.dt_segments[(seg)])
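
Call sites could then read, for example:

    seg = GetDeadTuplesSegment(vacrelstats, iseg);

instead of spelling out
&(vacrelstats)->dead_tuples.dt_segments[iseg] each time.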

-----

+ if (vacrelstats->dead_tuples.num_segs == 0)
+ return;
+

+ /* If uninitialized, we have no tuples to delete from the indexes */
+ if (vacrelstats->dead_tuples.num_segs == 0)
+ {
+ return;
+ }

+ if (vacrelstats->dead_tuples.num_segs == 0)
+ return false;
+

As listed above, there are several places in the actual vacuum code
that check whether the dead tuples array has been initialized.
I think it should never happen that we attempt to vacuum a table or
index page without having any dead tuples, so would it be better to
use an Assert or ereport instead?
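
For example, something like this sketch:

+ /* The dead tuples array must be initialized by this point */
+ Assert(vacrelstats->dead_tuples.num_segs > 0);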

-----

@@ -1915,2 +2002,2 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
- BlockNumber prefetchStart;
- BlockNumber pblkno;
+ BlockNumber prefetchStart;
+ BlockNumber pblkno;

I think this is an unnecessary (whitespace-only) change.

-----

+ /* Search for the segment likely to contain the item pointer */
+ iseg = vac_itemptr_binsrch(
+ (void *) itemptr,
+ (void *) &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
+ vacrelstats->dead_tuples.last_seg + 1,
+ sizeof(DeadTuplesSegment));
+

I think we can change the above to:

+ /* Search for the segment likely to contain the item pointer */
+ iseg = vac_itemptr_binsrch(
+ (void *) itemptr,
+ (void *) &(seg->last_dead_tuple),
+ vacrelstats->dead_tuples.last_seg + 1,
+ sizeof(DeadTuplesSegment));

We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
