Re: parallel mode and parallel contexts

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-03-18 21:36:52
Message-ID: 20150318213652.GA9495@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Reading the README first, the rest later; that way you can comment on my
comments while I actually look at the code. Parallelism, yay!

On 2015-03-18 12:02:07 -0400, Robert Haas wrote:
> +Instead, we take a more pragmatic approach: we try to make as many of the
> +operations that are safe outside of parallel mode work correctly in parallel
> +mode as well, and we try to prohibit the rest via suitable error
> checks.

I'd say that we try to prohibit a bunch of common cases, among them the
ones that are triggerable from SQL. We don't really try to prohibit the
many other kinds of traps, the way the text suggests.

> + - The authenticated user ID and current database. Each parallel worker
> + will connect to the same database as the initiating backend, using the
> + same user ID.

This sentence immediately makes me wonder why only the authenticated
user, and not the currently active role, is mentioned.

> + - The values of all GUCs. Accordingly, permanent changes to the value of
> + any GUC are forbidden while in parallel mode; but temporary changes,
> + such as entering a function with non-NULL proconfig, are potentially OK.

"potentially OK" sounds odd to me. Which danger are you seing that isn't
relevan tfor norm

> + - The combo CID mappings. This is needed to ensure consistent answers to
> + tuple visibility checks. The need to synchronize this data structure is
> + a major reason why we can't support writes in parallel mode: such writes
> + might create new combo CIDs, and we have no way to let other workers
> + (or the initiating backend) know about them.

Absolutely *not* in the initial version, but we really need to find a
better solution for this.
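
For anybody following along, here's a simplified model of combocid.c's
state (the real thing also keeps a hash table for the reverse lookup);
the point is that all of it lives in backend-local memory:

#include "postgres.h"

/*
 * Simplified from combocid.c: a combo command id is just an index into
 * this backend-private array of (cmin, cmax) pairs.  None of this is in
 * shared memory, so a worker can only know about entries copied to it
 * at startup; anything created later by the leader or another worker is
 * invisible to it.
 */
typedef struct ComboCidKeyData
{
	CommandId	cmin;		/* inserting command */
	CommandId	cmax;		/* deleting command */
} ComboCidKeyData;

static ComboCidKeyData *comboCids = NULL;	/* backend-private! */
static int	usedComboCids = 0;		/* number of entries in use */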

> + - The currently active user ID and security context. Note that this is
> + the fourth user ID we restore: the initial step of binding to the correct
> + database also involves restoring the authenticated user ID. When GUC
> + values are restored, this incidentally sets SessionUserId and OuterUserId
> + to the correct values. This final step restores CurrentUserId.

Ah, that answers my question above. Could you just move this next to the
other user ID bit?

> +We could copy the entire transaction state stack,
> +but most of it would be useless: for example, you can't roll back to a
> +savepoint from within a parallel worker, and there are no resources
> +associated with the memory contexts or resource owners of intermediate
> +subtransactions.

I do wonder if we're not going to have to change this in the not too
distant future. But then, this isn't externally visible at all, so
whatever.

> +At the end of a parallel operation, which can happen either because it
> +completed successfully or because it was interrupted by an error, parallel
> +workers associated with that operation exit. In the error case, transaction
> +abort processing in the parallel leader kills off any remaining workers, and
> +the parallel leader then waits for them to die.

Very slightly awkward: first you talk about successful completion *or*
error, but then only about abort processing.

> + - Cleanup of pg_temp namespaces is not done. The initiating backend is
> + responsible for this, too.

How could a worker have its own pg_temp namespace?

> +Lock Management
> +===============
> +
> +Certain heavyweight locks that the initiating backend holds at the beginning
> +of parallelism are copied to each worker, which unconditionally acquires them.
> +The parallel backends acquire - without waiting - each such lock that the
> +leader holds, even if that lock is self-exclusive. This creates the unusual
> +situation that a lock which could normally only be held by a single backend
> +can be shared among several backends in a parallel group.
> +
> +Obviously, this presents significant hazards that would not be present in
> +normal execution. If, for example, a backend were to initiate parallelism
> +while ReindexIsProcessingIndex() were true for some index, the parallel
> +backends launched at that time would neither share this state nor be excluded
> +from accessing the index via the heavyweight lock mechanism. It is therefore
> +imperative that backends only initiate parallelism from places where it will
> +be safe for parallel workers to access the relations on which they hold locks.
> +It is also important that they not afterwards do anything which causes access
> +to those relations to become unsafe, or at least not until after parallelism
> +has concluded. The fact that parallelism is strictly read-only means that the
> +opportunities for such mishaps are few and far between; furthermore, most
> +operations which present these hazards are DDL operations, which will be
> +rejected by CheckTableNotInUse() if parallel mode is active.
> +
> +Only relation locks, locks on database or shared objects, and perhaps the lock
> +on our transaction ID are copied to workers.

"perhaps"?

> Advisory locks are not copied,
> +but the leader may hold them at the start of parallelism; they cannot
> +subsequently be manipulated while parallel mode is active. It is not safe to
> +attempt parallelism while holding a lock of any other type, such as a page,
> +tuple, or relation extension lock, and such attempts will fail. Although
> +there is currently no reason to do so, such locks could be taken and released
> +during parallel mode; they merely cannot be held at the start of parallel
> +mode, since we would then fail to provide necessary mutual exclusion.

Is it really true that no such locks are acquired? What about e.g. hash
indexes? They seem to be acquiring page locks while searching.

> +Copying locks to workers is important for the avoidance of undetected
> +deadlock between the initiating process and its parallel workers. If the
> +initiating process holds a lock on an object at the start of parallelism,
> +and the worker subsequently attempts to acquire a lock on that same object
> +and blocks, this will result in an undetected deadlock, because the
> +initiating process cannot finish the transaction (thus releasing the lock)
> +until the worker terminates, and the worker cannot acquire the lock while
> +the initiating process holds it. Locks which the processes involved acquire
> +and then release during parallelism do not present this hazard; they simply
> +force the processes involved to take turns accessing the protected resource.

I don't think this is a strong guarantee. There can very well be a lack
of forward progress if they're waiting on each other in some way. Say the
master backend holds the lock and waits on output from a worker. The
worker will then endlessly wait for the lock to become free. A deadlock.
Or, as another scenario, consider cooperating backends that both try to
send tuples to each other while the queue is full. A deadlock.

To me it seems the deadlock detector has to be enhanced to be able to see
'waiting for' edges, independently of how we resolve our difference of
opinion on the copying of locks.

It seems to me that this isn't all that hard: whenever we block waiting
for another backend, we enter ourselves on the wait queue of that
backend's virtual transaction; once we're done waiting, we take ourselves
off again. That, afaics, should do it. Alternatively we could just
publish which backend we're waiting for in PGPROC and make deadlock.c
look at that as well; while slightly cleaner, that looks more invasive.
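
Roughly like this, as a completely untested sketch (the function name is
made up, and the actual wakeup would of course still have to come from
the queue code; this only makes the edge visible to deadlock.c):

#include "postgres.h"

#include "storage/lock.h"
#include "storage/proc.h"

/*
 * Untested sketch, name made up: register a waits-for edge on the
 * backend we're blocked on by queueing for a ShareLock on its virtual
 * transaction.  If the other backend is still running its transaction,
 * we end up in the lock manager's wait queue, where deadlock.c can see
 * us; once granted, we immediately release the lock again.
 */
static void
WaitForBackendVXID(PGPROC *blockedOn)
{
	VirtualTransactionId vxid;
	LOCKTAG		tag;

	GET_VXID_FROM_PGPROC(vxid, *blockedOn);
	if (!VirtualTransactionIdIsValid(vxid))
		return;					/* no transaction, nothing to queue on */

	SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid);

	(void) LockAcquire(&tag, ShareLock, false, false);
	LockRelease(&tag, ShareLock, false);
}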

> +Copying locks to workers is also important for the avoidance of undetected
> +deadlock involving both the parallel processe and other processes.

"processe"

> For
> +example, suppose processes A1 and A2 are cooperating parallel processes and
> +B is an unrelated process. Suppose A1 holds a lock L1 and waits for A2 to
> +finish; B holds a lock L2 and blocks waiting for L1; and A2 blocks waiting
> +for L2. Without lock copying, the waits-for graph is A2 -> B -> A1; there
> +is no cycle. With lock copying, A2 will also hold the lock on L1 and the
> +deadlock detector can find the cycle A2 -> B -> A2. As in the case of
> +deadlocks within the parallel group, undetected deadlock can occur if either A1
> +or A2 acquired a lock after the start of parallelism and attempted to
> +retain it beyond the end of parallelism. The prohibitions discussed above
> +protect us against this case.

I think we'd need to add more restrictions to actually make this
guarantee anything. At the very least it would not only have to be
prohibited to end parallelism with such a lock still held, but also to
wait for a worker (or the leader) while holding a lock that wasn't
granted at the start of parallelism.

Am I missing something, or does the copying currently break deadlock.c?
Afaics it'll compute lock conflicts in FindLockCycleRecurse() without
being aware that the conflicting lock has been granted to two backends.
Won't this at least trigger spurious deadlocks? It might happen to be
without consequence for some reason, but this would, at the very least,
need some very careful review.
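
To illustrate what I mean, a standalone toy (this is emphatically not
deadlock.c, just the shape of the problem): a conflict check that only
consults held-lock masks will treat two group members that were both
granted a self-exclusive lock as blocking each other:

#include <stdbool.h>
#include <stdio.h>

/* Toy, not deadlock.c: a single self-conflicting lock mode. */
#define LOCKBIT(mode)		(1 << (mode))
#define ACCESS_EXCLUSIVE	8

typedef struct
{
	const char *name;
	int			holdMask;		/* modes this process has granted */
} ToyProc;

/* group-unaware conflict test, analogous to checking holdMask */
static bool
blocks(const ToyProc *holder, int waitedMode)
{
	/* ACCESS_EXCLUSIVE conflicts with itself */
	return waitedMode == ACCESS_EXCLUSIVE &&
		(holder->holdMask & LOCKBIT(ACCESS_EXCLUSIVE)) != 0;
}

int
main(void)
{
	ToyProc		leader = {"leader", LOCKBIT(ACCESS_EXCLUSIVE)};
	ToyProc		worker = {"worker", LOCKBIT(ACCESS_EXCLUSIVE)};	/* copied */

	/*
	 * If either group member ever shows up waiting on this lock, a
	 * group-unaware detector derives a waits-for edge to the other
	 * member, i.e. a bogus cycle and a spurious deadlock report.
	 */
	printf("leader 'blocks' worker: %s\n",
		   blocks(&leader, ACCESS_EXCLUSIVE) ? "yes" : "no");
	printf("worker 'blocks' leader: %s\n",
		   blocks(&worker, ACCESS_EXCLUSIVE) ? "yes" : "no");
	return 0;
}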

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
