Re: Support Parallel Query Execution in Executor

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Myron Scott" <lister(at)sacadia(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support Parallel Query Execution in Executor
Date: 2006-04-09 21:54:56
Message-ID: 14259.1144619696@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> On 4/9/06 9:27 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 2. There are some low-level assumptions that no one reads in pages of
>> a relation without having some kind of lock on the relation (consider
>> eg the case where the relation is being dropped). A bgwriter-like
>> process wouldn't be able to hold lmgr locks, and we wouldn't really want
>> it to be thrashing the lmgr shared data structures for each read anyway.
>> So you'd have to design some interlock to guarantee that no backend
>> abandons a query (and releases its own lmgr locks) while an async read
>> request it made is still pending. Ugh.

> Does this lead us right back to planning for the appropriate amount of
> readahead at plan time? We could consider a "page range" lock at that point
> instead of locking each individual page.

No, you're missing my point entirely. What's bothering me is the
prospect of a "bgreader" process taking actions that are only safe
because of a lock that is held by a different process --- changing the
granularity of that lock doesn't get you out of trouble.

Here's a detailed scenario:

1. Backend X reads page N of a table T, queues a request for N+1.
2. While processing page N, backend X gets an error and aborts
   its transaction, thereby dropping all its lmgr locks.
3. Backend Y executes a DROP or TRUNCATE on T, which it can
   now do because there's no lock held anywhere on T. There
   are actually two interesting sub-phases of this:
     3a. Kill any shared buffers holding pages to be deleted.
     3b. Physically drop or truncate the OS file.
4. Bgreader tries to execute the pending read request. Ooops.

If step 4 happens after 3b, the bgreader gets an error. Maybe we could
kluge that to not cause any serious problems, but the nastier case is
where 4 happens between 3a and 3b --- the bgreader sees nothing wrong,
but it's now loaded a shared buffer that must *not* be there.
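
The two orderings can be replayed in a small toy model. This is only an illustrative sketch, not PostgreSQL code: ToyFile, bgreader_step and run_scenario are hypothetical names, shared buffers are a plain dict, and the lmgr lock is modelled as a set of holders.

```python
class ToyFile:
    """Stand-in for the OS file behind table T (hypothetical, not PostgreSQL code)."""

    def __init__(self, npages):
        self.pages = {n: f"page-{n}" for n in range(npages)}

    def read(self, n):
        if n not in self.pages:
            raise OSError(f"page {n} no longer exists")
        return self.pages[n]

    def drop(self):
        self.pages.clear()


def bgreader_step(pending, shared_buffers, os_file):
    """Step 4: execute one queued read. The bgreader holds no lock on T."""
    page = pending.pop(0)
    try:
        shared_buffers[page] = os_file.read(page)
        return "loaded"
    except OSError:
        return "error"


def run_scenario(bgreader_runs_after):
    """Replay steps 1-4; bgreader_runs_after is '3a' or '3b'."""
    os_file = ToyFile(npages=4)
    shared_buffers = {}   # page number -> contents
    pending = []          # the separate read-request queue
    lock_holders = set()  # who holds a lock on T

    # Step 1: backend X reads page 0 and queues a request for page 1.
    lock_holders.add("X")
    shared_buffers[0] = os_file.read(0)
    pending.append(1)

    # Step 2: X aborts and drops its locks; the queued request survives.
    lock_holders.discard("X")

    # Step 3: backend Y may now drop T, since nobody holds a lock on it.
    assert not lock_holders
    shared_buffers.clear()            # 3a: kill shared buffers for T
    outcome = None
    if bgreader_runs_after == "3a":
        outcome = bgreader_step(pending, shared_buffers, os_file)
    os_file.drop()                    # 3b: physically drop the file
    if bgreader_runs_after == "3b":
        outcome = bgreader_step(pending, shared_buffers, os_file)
    return outcome, shared_buffers
```

In this model, run_scenario("3b") reports an error and leaves no buffers behind, while run_scenario("3a") reports success and leaves page 1 in shared_buffers for a table whose file is gone.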

Having thought more about this, there may be a solution possible using
tighter integration with the bufmgr. The bufmgr already has a notion of
a buffer page being "read busy". Maybe, rather than just pushing a read
request into a separate queue somewhere, the requestor has to assign a
shared buffer for the page it wants and put the buffer into read-busy
state, but then pass the request to perform the physical read off to
someone else. The advantage of this is that there's state that step 3a
can see telling it that a conflicting read is pending, and it just needs
to wait for the read to finish before killing the buffer.
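
A rough sketch of that idea, again as a toy model rather than anything resembling the real bufmgr: the requestor registers a buffer in "read busy" state before handing the physical read to someone else, and step 3a waits for every busy buffer before killing it. ToyBufferPool, ToyBuffer and their methods are hypothetical names, and the sketch assumes no new read requests arrive while the drop is in progress (in the real system the dropping backend's own lock on T would have to guarantee that).

```python
import threading


class ToyBuffer:
    """A shared buffer; read_done unset means the buffer is 'read busy'."""

    def __init__(self, page):
        self.page = page
        self.contents = None
        self.read_done = threading.Event()


class ToyBufferPool:
    def __init__(self, os_file):
        self.os_file = os_file        # dict: page number -> contents
        self.lock = threading.Lock()  # protects self.buffers
        self.buffers = {}             # page number -> ToyBuffer
        self.log = []                 # order of events, for inspection

    def request_read(self, page):
        """Requestor side: assign a buffer and mark it read-busy."""
        with self.lock:
            buf = ToyBuffer(page)
            self.buffers[page] = buf
        return buf                    # handed off to the reader process

    def perform_read(self, buf):
        """Reader side: do the physical read, then clear read-busy."""
        buf.contents = self.os_file[buf.page]
        self.log.append(f"read page {buf.page}")
        buf.read_done.set()

    def drop_table(self):
        """Steps 3a and 3b, with 3a waiting out any pending reads."""
        with self.lock:
            victims = list(self.buffers.values())
        for buf in victims:
            buf.read_done.wait()      # a conflicting read is visible here
        with self.lock:
            self.buffers.clear()      # 3a: kill the buffers
        self.log.append("kill buffers")
        self.os_file.clear()          # 3b: drop the file
        self.log.append("drop file")
```

The point of the design is that the pending read is now state inside the buffer pool, so the dropper can see it; with a separate request queue it had no way to know a read was outstanding.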

Bottom line seems to be: just as the bgwriter is pretty intimately tied
to bufmgr, bgreaders would have to be as well.

regards, tom lane
