From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Remote tables infrastructure.
Date: 2006-11-10 00:55:25
Message-ID: 758d5e7f0611091655t6b591e43iab91823baae4b6e3@mail.gmail.com
Lists: pgsql-general
Hello.
One nice feature of PostgreSQL's CREATE FUNCTION/RULE/VIEW system
is the ability to create objects which "may appear closer than they really are".
Hence, all the dbi-link-like modules.
The usual method is to create a view on a set-returning function (if reasonable),
create RULEs for managing INSERT/UPDATE/DELETE, and voila.
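As an illustration, a minimal sketch of that method using the contrib/dblink
module (the connection string, remote table layout and names below are made up):

    CREATE VIEW bar_remote AS
        SELECT *
        FROM dblink('host=farhost dbname=bardb user=baruser',
                    'SELECT a, b FROM public.bar')
             AS t(a integer, b text);

    -- push local DELETEs back to the remote side
    CREATE RULE bar_remote_del AS ON DELETE TO bar_remote
        DO INSTEAD
        SELECT dblink_exec('host=farhost dbname=bardb user=baruser',
                           'DELETE FROM public.bar WHERE a = ' || OLD.a::text);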
This approach has some limitations. First of all, if you add WHERE restrictions
to such a view, PostgreSQL will first slurp all the data from the SRF, and only
then do the hard work of limiting/ordering. To circumvent that you may want to
pass the actual query string at the application level.
Methinks it would be a good idea to discuss a more general approach, i.e.
not to "create" links out of existing infrastructure but rather provide something
conceptually similar to the CREATE AGGREGATE statement. In other words,
make a remote link a first-class object inside PostgreSQL.
The first thing, a connection:
CREATE DBLINK name (
    connection = 'some connection string',
    start_transaction = foo_start_transaction_func,
    rollback = foo_rollback_func,
    before_commit = foo_prepare_transaction_func,
    after_commit = foo_commit_prepared_func,
    ...
);
A start_transaction_func should receive the ISOLATION LEVEL, of course.
A 2PC infrastructure could be used to ensure that both transactions
either commit or roll back. I think the remote transaction should
start (initialize the connection) on first access to the remote table (so as not
to cause connection storms upon backend process startup). If it
fails to initialize, our local transaction should roll back.
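For a PostgreSQL remote end with prepared transactions available, those hooks
could boil down to plain SQL issued over the link, roughly like this (the
transaction identifier is invented for the example):

    start_transaction  ->  BEGIN ISOLATION LEVEL SERIALIZABLE;
    before_commit      ->  PREPARE TRANSACTION 'foo_link_1';
    after_commit       ->  COMMIT PREPARED 'foo_link_1';
    rollback           ->  ROLLBACK;
                           -- or ROLLBACK PREPARED 'foo_link_1' if the remote
                           -- transaction was already prepared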
I think it may be reasonable to provide a timeout on an IDLE (not IDLE in
transaction) remote link. Of course "nontransactional" remote links
should be possible without inconveniences. And we would need
to take (and keep) the connection string, which will allow us to reuse
the library functions with different connections.
Normally one would prototype using the PL/PerlU or PL/PythonU languages,
and if one wants a more robust interface, one would write a .so library
providing it.
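For example, a bare-bones PL/PerlU prototype of such a remote-table SRF could
look roughly like this (connection details and remote table layout are made up):

    CREATE TYPE remote_bar_row AS (a integer, b text);

    CREATE FUNCTION remote_bar() RETURNS SETOF remote_bar_row AS $$
        use DBI;
        # connect to the remote database and fetch the whole table
        my $dbh = DBI->connect('dbi:Pg:dbname=bardb;host=farhost',
                               'baruser', 'secret', { RaiseError => 1 });
        my $rows = $dbh->selectall_arrayref('SELECT a, b FROM public.bar',
                                            { Slice => {} });
        # hash keys map onto the composite type's columns
        return_next($_) for @$rows;
        return undef;
    $$ LANGUAGE plperlu;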
Then we should have the ability to access remote tables. I think PostgreSQL
should not assume that it knows what's on the other side of the link, and
should just blindly follow the user's request for a remote table (and throw an
error if the table does not exist). I'll elaborate further on access functions later on.
For instance, accessing table bar over the foo DBLINK would be possible through:
SELECT * FROM foo.public.bar WHERE baz = 123;
foo -- being the connection name,
public -- being the schema name,
bar -- being the table name.
You see now where I am cruising to: this way we're using the
<DB>.<SCHEMA>.<TABLE> naming scheme, and if we provide an
implicit (or explicit) dblink library for connecting to other databases
in the current (or a remote) cluster -- we've got a nice new feature
"for free".
Once CREATEd, a DBLINK should remain in the system catalogs until
somebody DROPs it. It might be a good idea to also provide
CREATE TEMPORARY DBLINK, similar in concept to temporary
tables.
At a minimum we should provide (in CREATE DBLINK) extra functions
for accessing and modifying data. Something like:
CREATE DBLINK foo (
    table_select = foo_tab_select_func
);
It should be possible to provide "where" parameters, "order by" and "limit"
somehow -- if the remote end supports them. And we should do it so that
the remote end could easily transform LIMIT into ROWNUM, for instance
(see the sketch below). I think PostgreSQL should re-evaluate the results
if possible -- after all, it's PostgreSQL, and we trust PostgreSQL. Then
again, there is a big question of how to pass these conditions down, and
whether simpler cases of GROUP BY (like count(*)) should also be possible.
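To make the LIMIT-to-ROWNUM point concrete, a driver talking to an Oracle-style
remote end (which has no LIMIT clause) might rewrite something like:

    -- what the backend asks the driver for
    SELECT a FROM bar WHERE b = 'z' LIMIT 100;
    -- what the driver could send to the remote end
    SELECT a FROM bar WHERE b = 'z' AND ROWNUM <= 100;

(OFFSET or ORDER BY would need a nested subquery there, which is exactly why
the backend must know whether the remote end actually honoured them.)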
At its simplest it might be something like:

table_select(table_name text, ret_cols text[], where_cond text,
             where_vals text[], order_cols text[], limit int, offset int);

so that

SELECT a, t0, t1-t0
FROM foo.public.bar
WHERE (a > 4 AND b IN ('a','z')) OR t = now()
ORDER BY a
LIMIT 100 OFFSET 5;

would become

foo_tab_select_func('bar',
    '{a,t0,t1}',
    '("a" > ? AND "b" IN (?,?)) OR "t" = ?', '{4,a,z,2006-11-10T01:13}',
    'a',
    100, 5);
I know it looks a bit awkward, and I do hope someone comes up with
a better idea altogether. My idea was to "transform" the original query
conditions into a Perl DBI prepare-like string -- with column names
properly quoted, and the arguments supplied as a separate argument.
I.e. provide strict rules and a limited number of operators for the query
(brackets, AND, OR, NOT, >, <, =, >=, <=, IN (...), perhaps
IS [NOT] NULL). The remote end would try to implement as much
as possible, but could as well put it back on the main backend's back.
There is a special case with LIMIT/OFFSET -- we must know whether the remote
end will do the work for us or not. Also, it would be nice to be able to
push more advanced parts like GROUP BY queries, or even JOINs
(if applicable), to the remote end. Perhaps as a special mode for
PostgreSQL-compatible backends -- i.e. if some dblink driver
tries to understand PostgreSQL requests, it is free to do so;
if it quacks like a duck, it is a duck. I haven't got an idea what format
of such 'request passing' would be most efficient for PostgreSQL,
but I feel it should be standardized enough to allow older versions
of PostgreSQL to be connected to the latest backends.
It would also be nice to be able to ask for statistics for a remote
table, if such a function is provided. The stats might be stored
inside the catalogs, or in backend RAM (it might be configurable via
CREATE DBLINK / ALTER DBLINK). They could be updated
via an ANALYZE foo.public.bar command.
The syntax for foo_tab_delete is, I think, a pretty straightforward
consequence of foo_tab_select. It would be nice if foo_tab_insert
were also able to support multi-row inserts (with a configurable
max. number of rows per call; this is to support INSERT INTO foo.public.bar
SELECT * FROM sometable).
UPDATEs would need upd_cols[] and upd_col_vals[] arguments
plus a select part.
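In the same spirit as table_select above, the companion signatures might look
roughly like this (argument names are only illustrative):

    foo_tab_delete(table_name text, where_cond text, where_vals text[]);
    foo_tab_insert(table_name text, ins_cols text[], ins_rows text[][]);
                   -- ins_rows carries up to the configured max. rows per call
    foo_tab_update(table_name text, upd_cols text[], upd_col_vals text[],
                   where_cond text, where_vals text[]);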
Other objects that should be visible would be SEQUENCEs,
especially as this allows using one central "sequence dispatching"
backend and a farm of databases which use such a link for fetching
sequence values. And sequences should be cached -- i.e. fetch
a batch of N values, hand them out locally until the batch
is exhausted, then fetch the next batch. Holes in the numbering
will occur, but that's OK (it would be nice to have a global,
shared-memory managed pool for such sequences).
Access functions are simplest, I think -- just a sequence
fetcher and setter. The fetcher could have an optional argument
for fetching a batch of values (for INSERT INTO sometable
SELECT nextval('foo.public.global_seq'), bar FROM blah;).
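For instance (the names and the batch argument are invented for the example):

    foo_seq_nextval(seq_name text, batch integer);  -- fetch one value, or a
                                                    -- whole batch to hand out locally
    foo_seq_setval(seq_name text, value bigint);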
Remote FUNCTIONs would be nice also. The call would require
packing a list of arguments along with a list of argument values.
DDL should not be possible, though the driver API might provide
a user-callable foo_remote_command('query to do...') FUNCTION
for such uses.
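Usage would then be simply something like (the command text is, of course,
just an example):

    SELECT foo_remote_command('CREATE INDEX bar_baz_idx ON public.bar (baz)');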
Of course sometimes we would like to know "what's on the
other side". I guess foo.information_schema.* should be
the standard way of exposing such information. The remote end
should provide it, but I am not sure (performance-wise)
whether the backend should rely on it. It would certainly be
easier for dblink driver developers to be able to have
a working dblink without information_schema.
There is a question whether it should be possible to "alias" a remote
table/sequence/whatever into a local schema ("CREATE DBLINK TABLE"),
and personally I would say: if one wants to, she's free to use
the usual VIEW+RULE machinery.
I know this means a huge amount of work, and I know I am not
qualified to even propose such an idea, but I think it would be a really
neat feature. It could be used to provide some form of synchronous
replication, it would make it possible to create "real" clusters which do
some work locally and some remotely, it would enable robust
interfaces to other engines like Berkeley DB, and so on.
What do you think -- is it possible, or am I daydreaming?
Regards,
Dawid