From: | Giorgio Saviane <gsaviane(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Francisco Olarte <folarte(at)peoplecall(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain |
Date: | 2020-09-12 16:06:10 |
Message-ID: | CAHs6c0fTSor3WHd_CSsGThtOSAgO2PAcCT+M+uF7hcgQsQvYtQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Ok, let me add one more step in that sequence.
Before shutting down Postgres and going down to single mode we stopped the
application server and other services around it. I'm pretty sure that there
was a reasonable window of time before we went to single mode. Moreover, if
the database already reached the stop limit for preventing the wraparound,
was it still possible for autovacuum to kick in in multiuser mode?
Wasn't the graceful database shutdown enough to let Postgres drop the
temporary tables?
Kind regards
Giorgio Saviane
Il giorno sab 12 set 2020 alle ore 17:49 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> ha
scritto:
> Giorgio Saviane <gsaviane(at)gmail(dot)com> writes:
> > Let me clarify.
> > The sequence of events that led our databases to the wraparound issue is
> > the following:
>
> > 1. The application creates temporary tables that should be by default
> > cleaned at the end of the session. As a matter of fact the pooler
> doesn't
> > actually close the session, neither - apparently - issues a DISCARD
> ALL,
> > creating leaks among sessions.
> > 2. Since sessions are kept alive with stale temporary tables the
> > autovacuum process cannot do maintenance on transaction ids.
> > 3. Since we have no evidence on what is going on, the database reaches
> > the 1 million stop limit to avoid transaction wraparound.
> > 4. The application gets then out of order. We stop the Postgres server
> > gracefully and we enter single user mode to run the vacuum
> > 5. Any attempt to vacuum the database ends up with the usual "WARNING:
> > database "mydb" must be vacuumed within XXX transactions" message,
> > revealing that the vacuum could not fix up the transaction ids.
>
> So, actually, that was the wrong recovery method. All you really
> needed to do was terminate that long-running session(s) and wait for
> a minute or two. There is code to delete old orphaned temp tables,
> even as far back as 9.5. But *it's part of autovacuum*. Dropping
> down to single-user mode prevents it from running.
>
> I thought briefly about whether VACUUM in single-user mode should
> also run that code; but I'm not really convinced it'd be a good thing
> to be auto-dropping tables in single-user mode. And anyway we shouldn't
> be encouraging use of single-user mode, and it'd be much better to not
> be in this situation to start with.
>
> What seems like the most helpful thing we could do is teach autovacuum to
> issue log warnings when it sees old temp tables holding back datfrozenxid.
> The trick here is to not spam the log too much. After a few moment's
> thought, maybe warn if:
>
> * a table is temp but not orphaned, and
> * its relfrozenxid or relminmxid is equal to datfrozenxid or datminmxid,
> proving that some previous autovac run saw it as holding back those
> horizons, and
> * those xids are older than the relevant freeze limit.
>
> For extra credit maybe the message could mention the PID of the
> session owning the table?
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-09-12 18:18:04 | Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain |
Previous Message | Tom Lane | 2020-09-12 15:49:34 | Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain |