Re: patch: SQL/MED(FDW) DDL

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, SAKAMOTO Masahiko <sakamoto(dot)masahiko(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: SQL/MED(FDW) DDL
Date: 2010-10-07 11:20:56
Message-ID: 20101007202055.A04D.6989961C@metrosystems.co.jp
Lists: pgsql-hackers

On Tue, 05 Oct 2010 14:59:27 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I wonder if we
> > should focus on first efforts on really simple cases like CSV files
> > (as you mentioned) and perhaps something like memcached, which has
> > different properties than a CSV file, but extremely simple ones. I
> > think it's inevitable that the API is going to get more complicated
> > from release to release and probably not in backward-compatible ways;
> > I think it's too early to be worried about that.
>
> +1 ... we can *not* assume that we know enough to get the API right
> the first time. It'll have to be stated up front that it's subject to
> change and you probably shouldn't be trying to develop FDWs separately
> from the main project for awhile yet.
>
> I think that looking only at CSV might be too simple, though. It'd be
> good to be at least trying to build a remote-PG adapter, even with the
> knowledge that it's all throwaway code.
Thanks for the comments, and sorry for the delayed reply.

I propose a revised version of the FDW API which supports both CSV and PG.
It is intended as a minimal API to support SQL/MED, so features such as
query optimization or use of remote indexes are not considered much yet.

======================
This FDW API is designed to support scanning a foreign table. As the
SQL/MED standard says, updating external data is not supported. An
instance of a foreign table may be a CSV file, a table on an external
PostgreSQL server, or anything else which can generate PG-style tuples.

Scanning a foreign table is done in the steps below.
Note: FdwRoutine->XXX() denotes an FDW API function.

1. The parser parses a query string and generates a query tree. In this
step, a foreign table is not distinguished from a local table.

2. The optimizer generates a Path pathnode (pathtype = T_ForeignPath) for
each foreign table in the query. Because foreign tables have neither
TIDs nor indexes (at least for now), we don't add a new Path type.

In this step, the optimizer calls FdwRoutine->GetStatistics() to get the
estimated costs of the foreign scan. The FDW can calculate the costs by
itself from some statistics, or get the costs from the remote side, or
leave the costs as they are.
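
To illustrate, a GetStatistics() implementation for a hypothetical CSV
wrapper might look roughly like this. csvEstimateSize() is an assumed
helper (not part of the proposal) that stats the file and guesses page
and row counts, the cost formula is just a placeholder, and the usual
PostgreSQL headers are assumed:

/* minimal sketch only; csvEstimateSize() is a hypothetical helper */
static void
csvGetStatistics(Path *path, PlannerInfo *root, RelOptInfo *baserel)
{
    BlockNumber pages;
    double      rows;

    /* guess file size and row count, e.g. by stat()ing the file */
    csvEstimateSize(baserel, &pages, &rows);

    /* a local file can start returning tuples immediately */
    path->startup_cost = 0;
    path->total_cost = seq_page_cost * pages + cpu_tuple_cost * rows;
}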

3. The planner generates a ForeignScan plannode for each foreign Path.
ForeignScan could be a typedef of Scan because it has no other member,
but it has been defined as an independent structure so that other
members can be added in the future.

4. The executor executes each ForeignScan node in the steps below.
Throughout step 4, the FDW can keep its own information about the
foreign scan in ForeignScanState->reply (an FdwReply pointer).

4-a. To initialize ForeignScan plannodes, ExecInitForeignScan() is
called for each ForeignScan node. In ExecInitForeignScan(), a
ForeignScanState is created from:

* ScanState, which holds the same information as for SeqScan.
* the FdwRoutine pointer (cached).
* some catalog information about the foreign table (cached).
* a connection established with FdwRoutine->ConnectServer().
The FDW can use GetFSConnectionByName(name) in
FdwRoutine->ConnectServer() to get a pooled connection by name.

Then, ExecInitForeignScan() calls FdwRoutine->Open() to tell the FDW
that it's time to start the query. The FDW can do some initialization
if necessary.
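
As a rough sketch of what an Open() implementation and its private
state might look like for a hypothetical CSV wrapper (csvGetFilename()
is an assumed helper that reads the file name from the foreign table's
generic options; the usual PostgreSQL headers are assumed):

/* FDW-private state for the hypothetical csv_fdw, kept in reply */
typedef struct CsvFdwReply
{
    FILE   *file;           /* CSV file being scanned */
    char   *filename;       /* taken from the table's generic options */
} CsvFdwReply;

static void
csvOpen(ForeignScanState *scanstate)
{
    CsvFdwReply *reply = (CsvFdwReply *) palloc0(sizeof(CsvFdwReply));

    /* csvGetFilename() is a hypothetical helper */
    reply->filename = csvGetFilename(scanstate->table);
    reply->file = AllocateFile(reply->filename, "r");
    if (reply->file == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", reply->filename)));

    scanstate->reply = (FdwReply *) reply;
}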

4-b. To retrieve a tuple from the foreign table, ExecForeignScan()
calls FdwRoutine->Iterate(). The FDW should store the next tuple into
the ss_ScanTupleSlot of ScanState, or empty the slot to indicate EOF.
Projection and qual evaluation are done later in ExecScan(), so the
FDW should return all columns and tuples of the external data store.
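
Continuing the hypothetical CSV wrapper, an Iterate() implementation
might look roughly like this (csvReadLine() is an assumed helper that
splits the next line into per-column C strings and returns false at
end of file; real code would cache the AttInMetadata instead of
rebuilding it on every call):

static void
csvIterate(ForeignScanState *scanstate)
{
    CsvFdwReply    *reply = (CsvFdwReply *) scanstate->reply;
    TupleTableSlot *slot = scanstate->ss.ss_ScanTupleSlot;
    char           *values[64];     /* arbitrary column limit for the sketch */
    HeapTuple       tuple;

    /* csvReadLine() is a hypothetical helper */
    if (!csvReadLine(reply->file, values))
    {
        ExecClearTuple(slot);       /* empty slot signals EOF */
        return;
    }

    tuple = BuildTupleFromCStrings(
                TupleDescGetAttInMetadata(slot->tts_tupleDescriptor),
                values);
    ExecStoreTuple(tuple, slot, InvalidBuffer, true);   /* slot frees tuple */
}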

4-c. To reset the scan and rewind the cursor to the head of the foreign
table, ExecForeignReScan() calls FdwRoutine->ReOpen(). This occurs
when a ForeignScan node is the inner node of a nested loop join.
The FDW is required to return the first tuple again at the next
FdwRoutine->Iterate() call.

4-d. At the end of execution of a ForeignScan, ExecEndForeignScan() is
called. ExecEndForeignScan() calls FdwRoutine->Close() to tell the FDW
that no more Iterate() calls will be made. The FDW can do some
finalization if necessary.
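
For completeness, the matching ReOpen() and Close() for the CSV sketch
could be as simple as rewinding and closing the file:

static void
csvReOpen(ForeignScanState *scanstate)
{
    CsvFdwReply *reply = (CsvFdwReply *) scanstate->reply;

    /* rewind so the next Iterate() returns the first tuple again */
    if (fseek(reply->file, 0L, SEEK_SET) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rewind file \"%s\": %m",
                        reply->filename)));
}

static void
csvClose(ForeignScanState *scanstate)
{
    CsvFdwReply *reply = (CsvFdwReply *) scanstate->reply;

    if (reply->file != NULL)
        FreeFile(reply->file);      /* pairs with AllocateFile() in Open() */
    reply->file = NULL;
}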

5. Connections which have been established via FdwRoutine->ConnectServer()
are pooled in the backend for future queries which access the same
foreign server. The pooling mechanism is implemented in the core module,
not in each FDW.

When a user executes the DISCARD ALL command, or the backend dies,
FdwRoutine->FreeFSConnection() is called for each foreign connection
to discard it.
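
For a wrapper that does need connections, e.g. a PG wrapper built on
libpq, the ConnectServer()/FreeFSConnection() pair might look roughly
like this. pgBuildConninfo() is an assumed helper that assembles a
conninfo string from the server's and user mapping's generic options,
and libpq-fe.h is assumed to be included:

static FSConnection *
pgConnectServer(ForeignServer *server, UserMapping *user)
{
    char   *conninfo = pgBuildConninfo(server, user);   /* assumed helper */
    PGconn *conn = PQconnectdb(conninfo);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        char   *msg = pstrdup(PQerrorMessage(conn));

        PQfinish(conn);
        ereport(ERROR,
                (errmsg("could not connect to foreign server \"%s\": %s",
                        server->servername, msg)));
    }
    return (FSConnection *) conn;
}

static void
pgFreeFSConnection(FSConnection *conn)
{
    PQfinish((PGconn *) conn);
}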

To achieve the features above, I propose the following FDW API:

/*
 * Almost the same as SeqScan, but defined as a structure to allow adding
 * members in the future.
 */
typedef struct ForeignScan {
    Scan    scan;
} ForeignScan;

/*
 * Handle for access to FDW-dependent data.
 * Each FDW can use ForeignScanState->reply, casting between FdwReply *
 * and a pointer to the actual type.
 */
typedef struct FdwReply FdwReply;

/*
 * ForeignScanState node is used to store scan status.
 */
typedef struct ForeignScanState {
    ScanState           ss;         /* its first field is NodeTag */

    FdwRoutine         *routine;    /* set of FDW routines */
    ForeignDataWrapper *wrapper;    /* foreign data wrapper */
    ForeignServer      *server;     /* foreign server */
    FSConnection       *conn;       /* connection to the foreign server */
    UserMapping        *user;       /* user mapping */
    ForeignTable       *table;      /* foreign table */
    FdwReply           *reply;      /* private data for each data wrapper */
} ForeignScanState;

typedef struct FdwRoutine
{
    /*
     * Connect to the foreign server identified by server and user.
     */
    FSConnection *(*ConnectServer)(ForeignServer *server, UserMapping *user);

    /*
     * Disconnect from the foreign server and free the FSConnection object.
     */
    void (*FreeFSConnection)(FSConnection *conn);

    /*
     * Estimate the costs of a foreign path.  The FDW should update
     * startup_cost and total_cost of the Path.
     * To estimate the costs, a PG FDW might generate an "EXPLAIN"
     * statement and execute it on the remote side, or collect statistics
     * somewhere and calculate the costs from them.
     * These parameters are the same as those of cost_foreignscan() in
     * optimizer/path/costsize.c (the function is just a proxy).
     */
    void (*GetStatistics)(Path *path, PlannerInfo *root, RelOptInfo *baserel);

    /*
     * Prepare to return tuples.
     */
    void (*Open)(ForeignScanState *scanstate);

    /*
     * Fetch the next tuple and fill the tuple slot with it, or clear the
     * slot to indicate EOF.
     */
    void (*Iterate)(ForeignScanState *scanstate);

    /*
     * End the foreign scan and do some cleanup if necessary.
     */
    void (*Close)(ForeignScanState *scanstate);

    /*
     * Re-initialize the foreign scan, used when the INNER executor node is
     * executed again.
     */
    void (*ReOpen)(ForeignScanState *scanstate);
} FdwRoutine;
======================
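
As a purely illustrative example, the hypothetical CSV wrapper's
routines sketched above could be collected into an FdwRoutine like
this. How the core obtains this structure for a given wrapper
(presumably through some handler function named in the catalog) is not
covered by the sketch, and whether NULL is acceptable for the
connection callbacks is one of the open questions below:

static FdwRoutine csv_fdw_routine = {
    NULL,               /* ConnectServer: a plain file needs no connection */
    NULL,               /* FreeFSConnection */
    csvGetStatistics,   /* GetStatistics */
    csvOpen,            /* Open */
    csvIterate,         /* Iterate */
    csvClose,           /* Close */
    csvReOpen           /* ReOpen */
};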

Issues from ideas above:

== Connection Pooling ==
Connecting on each foreign scan is obviously wasteful, but it's open to
argument who should manage the pooled connections, because a CSV wrapper
needs no "connection" at all, while a PG wrapper might connect once per
database per user.

Maybe DBAs want to see how many and what kind of connections are
currently established, so we should show the list of active connections
via a view or function. To treat connections transparently, I propose
putting the connection pooling mechanism in the core module, not in each
FDW module. Or should we add an API to return the list of active
connections in a common format?

If we decide to leave connection management to each FDW,
ConnectServer() and FreeFSConnection() can be removed from the API.

== ANALYZE support ==
Even if an FDW wants to keep statistics in a local area such as
pg_class.reltuples, the current API doesn't provide an appropriate timing.

Should we add an API which is called from ANALYZE to allow FDWs to
handle statistics when the user wants?

Any comments are welcome.

Regards,
--
Shigeru Hanada
