Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: klaussfreire(at)gmail(dot)com
Cc: thomas(dot)munro(at)enterprisedb(dot)com, sfrost(at)snowman(dot)net, michael(dot)paquier(at)gmail(dot)com, daniel(at)yesql(dot)se, andres(at)anarazel(dot)de, sawada(dot)mshk(at)gmail(dot)com, a(dot)lubennikova(at)postgrespro(dot)ru, lubennikovaav(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Date: 2018-02-06 07:35:42
Message-ID: 20180206.163542.117274019.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Fri, 2 Feb 2018 19:52:02 -0300, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote in <CAGTBQpaiNQSNJC8y4w82UBTaPsvSqRRg++yEi5wre1MFE2iD8Q(at)mail(dot)gmail(dot)com>
> On Thu, Jan 25, 2018 at 6:21 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> >> I had the tests running in a loop all day long, and I cannot reproduce
> >> that variance.
> >>
> >> Can you share your steps to reproduce it, including configure flags?
> >
> > Here are two build logs where it failed:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511
> >
> > Here's one where it succeeded:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855
> >
> > The full build script used is:
> >
> > ./configure --enable-debug --enable-cassert --enable-coverage
> > --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
> > --with-icu && make -j4 all contrib docs && make -Otarget -j3
> > check-world
> >
> > This is a virtualised 4 core system. I wonder if "make -Otarget -j3
> > check-world" creates enough load on it to produce some weird timing
> > effect that you don't see on your development system.
>
> I can't reproduce it, not even with the same build script.

I had the same error by "make -j3 check-world" but only twice
from many trials.

> It's starting to look like a timing effect indeed.

It seems to be truncation skip, maybe caused by concurrent
autovacuum. See lazy_truncate_heap() for details. Updates of
pg_stat_*_tables can be delayed so looking it also can fail. Even
though I haven't looked the patch closer, the "SELECT
pg_relation_size()" doesn't seem to give something meaningful
anyway.

> I get a similar effect if there's an active snapshot in another
> session while vacuum runs. I don't know how the test suite ends up in
> that situation, but it seems to be the case.
>
> How do you suggest we go about fixing this? The test in question is
> important, I've caught actual bugs in the implementation with it,
> because it checks that vacuum effectively frees up space.
>
> I'm thinking this vacuum test could be put on its own parallel group
> perhaps? Since I can't reproduce it, I can't know whether that will
> fix it, but it seems sensible.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Cousin 2018-02-06 07:48:51 best way to check identical constraint between databases
Previous Message Michael Paquier 2018-02-06 07:15:49 Re: Add more information_schema columns