Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain

From: Giorgio Saviane <gsaviane(at)gmail(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain
Date: 2020-09-12 11:57:23
Message-ID: CAHs6c0cs5PtQffn9Nt_sW99ncGBjvKJ01wy+1anUn_4J2cysVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Let me clarify.
The sequence of events that led our databases to the wraparound issue is
the following:

1. The application creates temporary tables that should be by default
cleaned at the end of the session. As a matter of fact the pooler doesn't
actually close the session, neither - apparently - issues a DISCARD ALL,
creating leaks among sessions.
2. Since sessions are kept alive with stale temporary tables the
autovacuum process cannot do maintenance on transaction ids.
3. Since we have no evidence on what is going on, the database reaches
the 1 million stop limit to avoid transaction wraparound.
4. The application gets then out of order. We stop the Postgres server
gracefully and we enter single user mode to run the vacuum
5. Any attempt to vacuum the database ends up with the usual "WARNING:
database "mydb" must be vacuumed within XXX transactions" message,
revealing that the vacuum could not fix up the transaction ids.
6. By running this SQL (still in single user mode)
SELECT nspname || '.' || relname AS objname
FROM pg_class JOIN pg_namespace
ON pg_namespace.oid = pg_class.relnamespace
WHERE relpersistence = 't' AND relkind = 'r'
We notice the presence of orphaned tables. Sessions are gone at this
point, so they are for sure orphaned.
7. If we manually drop the orphaned tables and run vacuum again the
warning message disappears and the database is restored to normal
functioning.

No matter this situation is caused by an application misbehavior or a
missed temporary table cleanup, I'm asking if there is a reasonable way to
prevent this situation improving the alerting with:

- When vacuum cannot repeatedly perform a transaction fix up in regular
multi user mode.
- When vacuum cannot perform a transaction fix up in single user mode
because there are orphaned tables in the way.

If we had these alerts we would not have lost a database.

Kind regards

Girogio Saviane

Il giorno sab 12 set 2020 alle ore 12:59 Francisco Olarte <
folarte(at)peoplecall(dot)com> ha scritto:

> On Sat, Sep 12, 2020 at 10:59 AM Giorgio Saviane <gsaviane(at)gmail(dot)com>
> wrote:
> ....
> > The question is: can you implement some warning thrown by
> vacuum/autovacuum in case orphaned temporary tables are in the way?
> > The connection pooler we are using is the one provided by Tomcat 7. I'm
> gonna check if it supports DISCARD ALL on close.
>
> From what I've read the thing is, if the pooler does not work
> correctly ( by issuing discard all or otherwise taking care of
> temporary tables ), there are few things as the tables are not
> orphaned on the server side.
>
> What the server sees is someone connects, creates the temporary
> tables, issues a lot of other commands but does not disconnect, so
> the tables must be kept alive in case the the session wants to use
> them again. DISCARD ALL will make them orphans.
>
> Francisco Olarte.
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Francisco Olarte 2020-09-12 15:13:44 Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain
Previous Message Francisco Olarte 2020-09-12 10:59:05 Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain