Re: dynamic result sets support in extended query protocol

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic result sets support in extended query protocol
Date: 2020-10-09 17:32:48
Message-ID: f31f0f23-c5a9-0bdf-e11c-3d8f35ee4fbc@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> I want to progress work on stored procedures returning multiple result
> sets.  Examples of how this could work on the SQL side have previously
> been shown [0].  We also have ongoing work to make psql show multiple
> result sets [1].  This appears to work fine in the simple query
> protocol.  But the extended query protocol doesn't support multiple
> result sets at the moment [2].  This would be desirable to be able to
> use parameter binding, and also since one of the higher-level goals
> would be to support the use case of stored procedures returning
> multiple result sets via JDBC.
>
> [0]:
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> [1]: https://commitfest.postgresql.org/29/2096/
> [2]:
> https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
>
> (Terminology: I'm calling this project "dynamic result sets", which
> includes several concepts: 1) multiple result sets, 2) those result
> sets can have different structures, 3) the structure of the result
> sets is decided at run time, not declared in the schema/procedure
> definition/etc.)
>
> One possibility I rejected was to invent a third query protocol beside
> the simple and extended one.  This wouldn't really match with the
> requirements of JDBC and similar APIs because the APIs for sending
> queries don't indicate whether dynamic result sets are expected or
> required, you only indicate that later by how you process the result
> sets.  So we really need to use the existing ways of sending off the
> queries.  Also, avoiding a third query protocol is probably desirable
> in general to avoid extra code and APIs.
>
> So here is my sketch on how this functionality could be woven into the
> extended query protocol.  I'll go through how the existing protocol
> exchange works and then point out the additions that I have in mind.
>
> These additions could be enabled by a _pq_ startup parameter sent by
> the client.  Alternatively, it might also work without that because
> the client would just reject protocol messages it doesn't understand,
> but that's probably less desirable behavior.
>
> So here is how it goes:
>
> C: Parse
> S: ParseComplete
>
> At this point, the server would know whether the statement it has
> parsed can produce dynamic result sets.  For a stored procedure, this
> would be declared with the procedure definition, so when the CALL
> statement is parsed, this can be noticed.  I don't actually plan any
> other cases, but for the sake of discussion, perhaps some variant of
> EXPLAIN could also return multiple result sets, and that could also be
> detected from parsing the EXPLAIN invocation.
>
> At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription
>
> New would be that the server would now also respond with a new
> message, say,
>
> S: DynamicResultInfo
>
> that indicates that dynamic result sets will follow later.  The
> message would otherwise be empty.  (We could perhaps include the
> number of result sets, but this might not actually be useful, and
> perhaps it's better not to spent effort on counting things that don't
> need to be counted.)
>
> (If we don't guard this by a _pq_ startup parameter from the client,
> an old client would now error out because of an unexpected protocol
> message.)
>
> Now the normal bind and execute sequence follows:
>
> C: Bind
> S: BindComplete
> (C: Describe (portal))
> (S: RowDescription)
> C: Execute
> S: ... (DataRows)
> S: CommandComplete
>
> In the case of a CALL with output parameters, this "primary" result
> set contains one row with the output parameters (existing behavior).
>
> Now, if the client has seen DynamicResultInfo earlier, it should now
> go into a new subsequence to get the remaining result sets, like this
> (naming obviously to be refined):
>
> C: NextResult
> S: NextResultReady
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ....
> S: CommandComplete
> C: NextResult
> ...
> C: NextResult
> S: NoNextResult
> C: Sync
> S: ReadyForQuery
>
> I think this would all have to use the unnamed portal, but perhaps
> there could be other uses with named portals.  Some details to be
> worked out.
>
> One could perhaps also do without the DynamicResultInfo message and
> just put extra information into the CommandComplete message indicating
> "there are more result sets after this one".
>
> (Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level command.
> ReadyForQuery means the whole command is complete.  This is perhaps
> debatable, and interesting questions could also arise when considering
> what should happen in the simple query protocol when a query string
> consists of multiple commands each returning multiple result sets. 
> But it doesn't really seem sensible to cater to that.)
>
> One thing that's missing in this sequence is a way to specify the
> desired output format (text/binary) for each result set.  This could
> be added to the NextResult message, but at that point the client
> doesn't yet know the number of columns in the result set, so we could
> only do it globally.  Then again, since the result sets are dynamic,
> it's less likely that a client would be coded to set per-column output
> codes. Then again, I would hate to bake such a restriction into the
> protocol, because some is going to try.  (I suspect what would be more
> useful in practice is to designate output formats per data type.)  So
> if we wanted to have this fully featured, it might have to look
> something like this:
>
> C: NextResult
> S: NextResultReady
> C: Describe (dynamic) (new message subkind)
> S: RowDescription
> C: Bind (zero parameters, optionally format codes)
> S: BindComplete
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ...
>
> While this looks more complicated, client libraries could reuse
> existing code that starts processing with a Bind message and continues
> to CommandComplete, and then just loops back around.
>
> The mapping of this to libpq in a simple case could look like this:
>
> PQsendQueryParams(conn, "CALL ...", ...);
> PQgetResult(...);  // gets output parameters
> PQnextResult(...);  // new: sends NextResult+Bind
> PQgetResult(...);  // and repeat
>
> Again, it's not clear here how to declare the result column output
> formats.  Since libpq doesn't appear to expose the Bind message
> separately, I'm not sure what to do here.
>
> In JDBC, the NextResult message would correspond to the
> Statement.getMoreResults() method.  It will need a bit of conceptual
> adjustment because the first result set sent on the protocol is
> actually the output parameters, which the JDBC API returns separately
> from a ResultSet, so the initial CallableStatement.execute() call will
> need to process the primary result set and then send NextResult and
> obtain the first dynamic result as the first ResultSet for its API,
> but that can be handled internally.
>
> Thoughts so far?
>

Exciting stuff. But I'm a bit concerned about the sequence of
resultsets. The JDBC docco for CallableStatement says:

A CallableStatement can return one ResultSet object or multiple
ResultSet objects. Multiple ResultSet objects are handled using
operations inherited from Statement.

For maximum portability, a call's ResultSet objects and update
counts should be processed prior to getting the values of output
parameters.

And this is more or less in line with the pattern that I've seen when
converting SPs from other systems - the OUT params are usually set at
the end with things like status flags and error messages.

If the OUT parameter resultset has to come first (which is how I read
your proposal - please correct me if I'm wrong) we'll have to stack up
all the resultsets until the SP returns, then send the OUT params, then
send the remaining resultsets. That seems ... suboptimal.  The
alternative would be to send the OUT params last. That might result in
the driver needing to do some lookahead and caching, but I don't think
it's unmanageable. Of course, your protocol would also need changing.

cheers

andrew

--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2020-10-09 18:08:42 Re: Batching page logging during B-tree build
Previous Message Tomas Vondra 2020-10-09 17:05:02 Re: Possible NULL dereferencing null pointer (src/backend/executor/nodeIncrementalSort.c)