Re: POC: Cleaning up orphaned files using undo logs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-16 16:14:30
Message-ID: CA+TgmoY4J1vDdkS-fGuD7h=wQC=0Dk0ePfTORZ3HkbkhTiaiTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 16, 2019 at 12:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> The idea is that the queues can get full, but not the rollback hash
> table. In the case where the error queue gets full, we mark the entry
> as Invalid in the hash table, and later, when the discard worker again
> encounters this request, it adds it to the queue if there is space
> available and marks the entry in the hash table as Valid. This allows
> us to keep the information about all xacts having pending undo in
> shared memory.

I don't understand. How is it OK to have entries in the hash table
but not the queues? And why would that ever happen, anyway? If you
make the queues as big as the hash table is, then they should never
fill up (or, if using binary heaps with lazy removal rather than
rbtrees, they might fill up, but if they do, you can always make space
by cleaning out the stale entries).
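
Just to make the lazy-removal idea concrete, here is a minimal
standalone sketch (all names are invented for illustration, not taken
from the patch): when the queue is full, sweep out entries that the
rollback hash table no longer considers pending, and only report the
queue as full if nothing stale is left.

#include <stdbool.h>
#include <stdint.h>

typedef struct UndoRequest
{
    uint32_t    xid;            /* transaction whose undo is pending */
    uint64_t    undo_size;      /* heap priority: biggest first */
} UndoRequest;

#define QUEUE_CAPACITY 1024

static UndoRequest queue[QUEUE_CAPACITY];
static int      queue_len = 0;

/* Placeholder for a lookup in the rollback hash table. */
static bool
rollback_hash_is_pending(uint32_t xid)
{
    (void) xid;
    return true;                /* toy stand-in: everything pending */
}

static void
sift_up(int i)
{
    while (i > 0)
    {
        int         parent = (i - 1) / 2;
        UndoRequest tmp;

        if (queue[parent].undo_size >= queue[i].undo_size)
            break;
        tmp = queue[parent];
        queue[parent] = queue[i];
        queue[i] = tmp;
        i = parent;
    }
}

/* Drop entries that no longer have pending undo, then rebuild the heap
 * by reinserting the survivors.  Returns true if space was reclaimed. */
static bool
sweep_stale_entries(void)
{
    int         keep = 0;

    for (int i = 0; i < queue_len; i++)
    {
        if (rollback_hash_is_pending(queue[i].xid))
            queue[keep++] = queue[i];
    }
    if (keep == queue_len)
        return false;
    queue_len = keep;
    for (int i = 1; i < queue_len; i++)
        sift_up(i);
    return true;
}

/* Fails only if the queue is full of still-valid entries, which cannot
 * happen if the queue is as large as the hash table. */
bool
queue_insert(UndoRequest req)
{
    if (queue_len == QUEUE_CAPACITY && !sweep_stale_entries())
        return false;
    queue[queue_len] = req;
    sift_up(queue_len);
    queue_len++;
    return true;
}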

> I think this can regress performance when there are many concurrent
> sessions unless there is a way to add/remove requests without a lock.
> As of now, we don't enter any request or reserve any space in shared
> memory related to pending undo until there is an error or the user
> explicitly rolls back the transaction. We could surely do it some
> other way as well, but this way we don't add any overhead to the
> commit path of successful transactions.

Well, we're already incurring some overhead to attach to an undo log,
and that probably involves some locking. I don't see why this would
be any worse, and maybe it could piggyback on the existing work.
Anyway, if you don't like this solution, propose something else. It's
impossible to correctly implement a hard limit unless the number of
aborted-but-not-yet-undone transactions is bounded by (HARD_LIMIT -
ENTRIES_THAT_WOULD_BE_ADDED_AFTER_RECOVERY_IF_THE_SYSTEM_CRASHED_NOW).
If there are 100 transactions each bound to 2 undo logs, and you
crash, you will need to (as you have it designed now) add another 200
entries to the hash table upon recovery, and that will make you
exceed the hard limit unless you were at least 200 entries below
the limit before the crash. Have you handled that somehow? If so,
how? It seems to me that you MUST - at a minimum - keep a count of
undo logs attached to in-progress transactions, if not the actual hash
table entries.
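
To make that concrete, here is a rough standalone sketch of the kind of
accounting I mean (the names and the single global counter are invented
for illustration, not taken from the patch): a transaction may attach
to another undo log only if the hash table could still absorb every
attached-but-not-yet-undone log after a crash.

#include <stdbool.h>

#define UNDO_HASH_HARD_LIMIT 10000

/* Entries currently present in the rollback hash table. */
static int pending_undo_entries = 0;

/* Undo logs currently attached to in-progress transactions; each of
 * these would become a hash-table entry if we crashed right now and
 * recovery re-registered them. */
static int attached_undo_logs = 0;

/* Called (under an appropriate lock) when a transaction wants to
 * attach to one more undo log.  Refuse if a crash at this instant
 * could push the hash table past the hard limit. */
bool
reserve_undo_attach(void)
{
    if (pending_undo_entries + attached_undo_logs + 1 > UNDO_HASH_HARD_LIMIT)
        return false;           /* would overflow after crash recovery */
    attached_undo_logs++;
    return true;
}

/* Called at commit, or once an abort's request has actually been
 * entered into the hash table, to release the reservation. */
void
release_undo_attach(void)
{
    attached_undo_logs--;
}

In the example above, the 100 in-progress transactions would be holding
200 such reservations, so the system could never have been closer than
200 entries to the hard limit at the moment of the crash.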

> Again coming to the question of whether we need a single entry or
> multiple entries (one request per persistence level), the reason we
> have discussed so far is that the discard worker can register the
> requests for them while scanning undo logs at different times.

Yeah, but why do we need that in the first place? I wrote something
about that in a previous email, but you haven't responded to it here.

> However, there are a few more things to consider, such as what happens
> if, while applying the actions, the actions for the logged part succeed
> and those for the unlogged part fail; keeping them separate allows
> better processing. If one fails, we register its request in the error
> queue and try to process the request for the other persistence level.
> I think the requests for the different persistence levels are kept in
> separate undo logs, which makes it easier to process them separately.

I don't find this convincing. It's not really an argument, just a
vague list of issues. If you want to convince me, you'll need to be
much more precise.

It seems to me that it is generally undesirable to undo the unlogged
part of a transaction separately from the logged part of the
transaction. But even if we want to support that, having one entry
per XID rather than one entry per <XID, persistence level> doesn't
preclude that. Even if you discover the entries at different times,
you can still handle that by updating the existing entry rather than
making a new one.
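
As a sketch of what I mean (field and function names are invented for
illustration only), a single per-XID entry could simply carry both
persistence levels and be filled in whichever order the discard worker
happens to find them:

#include <stdbool.h>
#include <stdint.h>

typedef struct UndoRequestEntry
{
    uint32_t    xid;
    bool        has_logged;         /* permanent-relation undo present? */
    bool        has_unlogged;       /* unlogged-relation undo present? */
    uint64_t    logged_start_urp;   /* undo record pointers, 0 if unset */
    uint64_t    logged_end_urp;
    uint64_t    unlogged_start_urp;
    uint64_t    unlogged_end_urp;
} UndoRequestEntry;

#define MAX_ENTRIES 128

static UndoRequestEntry entries[MAX_ENTRIES];
static int      nentries = 0;

/* Toy stand-in for find-or-create on the rollback hash table; no
 * overflow handling, for illustration only. */
static UndoRequestEntry *
undo_hash_enter(uint32_t xid)
{
    for (int i = 0; i < nentries; i++)
    {
        if (entries[i].xid == xid)
            return &entries[i];
    }
    entries[nentries].xid = xid;
    return &entries[nentries++];
}

/* Called whenever pending undo for an XID is found in some undo log;
 * a second call for the same XID just updates the existing entry. */
void
register_undo_request(uint32_t xid, bool logged,
                      uint64_t start_urp, uint64_t end_urp)
{
    UndoRequestEntry *entry = undo_hash_enter(xid);

    if (logged)
    {
        entry->has_logged = true;
        entry->logged_start_urp = start_urp;
        entry->logged_end_urp = end_urp;
    }
    else
    {
        entry->has_unlogged = true;
        entry->unlogged_start_urp = start_urp;
        entry->unlogged_end_urp = end_urp;
    }
}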

There might be a good reason to do it the way you are describing, but
I don't see that you've made the argument for it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
