Re: A Replication Idea

From: "Command Prompt, Inc(dot)" <pgsql-general(at)commandprompt(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: A Replication Idea
Date: 2002-02-22 01:03:45
Message-ID: Pine.LNX.4.30.0202211642550.31949-100000@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>How would it handle functions, which could potentially modify data, even
>from a select statement?

It seems that you'd have two options, if you wanted the proxy to be truly
transparent to the client:

1. Send ALL SQL statements down the wire to each node, including SELECT
statements, since selected functions may modify data.

2. Write a small, fast, reliable parser that checks for criteria which
would make the statement potentially data-modifying (e.g., the
existence of a function), and send only data-modifying SELECTs along
with your standard UPDATEs, DELETEs, etc.

However, it probably just occurred to you all as it just occurred to me
that this is pretty moot, because functions aren't the only concern: you
could have a trigger on a table that would wipe out idea #2. ;)

Really, there are too many transparent ways data can be modified by
seemingly innocuous statements, so parsing a statement for distribution
is right out; it seems as though each node is going to have to require a
copy of EACH statement that the proxy runs into in order to maintain 100%
integrity.

However, that doesn't mean your proxy needs to get answer back from all of
the nodes in terms of result sets. Something as simple as a systemic
packet indicating that the downstream-execution was successful would be
enough data for the proxy to know what's going on, provided it knows it
should get its answer soon from another node (e.g., the node with the
lowest load).

Result sets could still be cached based on a statement, within some
specified degree of accuracy (e.g., how much time elapses before a cached
resultset expires); you'd just need to make sure that even though you're
returning a cached result set, you still send the request to each back-end
to get processed in its own time.

Seems like some *really* careful threading might be called for; one thread
to listen to incoming traffic, from which downstream events are queued up,
another thread sending off those events to the back-end in the order they
were received, and another thread listening for answers from nodes, and
queueing up responses to be sent back to the appropriate client's socket.

Regards,
Jw.
--
jlx(at)commandprompt(dot)com, by way of pgsql-general(at)commandprompt(dot)com
http://www.postgresql.info/
http://www.commandprompt.com/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thalis A. Kalfigopoulos 2002-02-22 01:12:18 Re: environment variables, output format & setting current
Previous Message Brent Wood 2002-02-22 01:01:44 environment variables, output format & setting current date