Re: dynamic result sets support in extended query protocol

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: dynamic result sets support in extended query protocol
Date: 2020-10-08 08:23:54
Message-ID: 20201008.172354.2074080435553573128.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Are you proposing to bump up the protocol version (either major or
minor)? I am asking because it seems you are going to introduce some
new message types.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> I want to progress work on stored procedures returning multiple result
> sets. Examples of how this could work on the SQL side have previously
> been shown [0]. We also have ongoing work to make psql show multiple
> result sets [1]. This appears to work fine in the simple query
> protocol. But the extended query protocol doesn't support multiple
> result sets at the moment [2]. This would be desirable to be able to
> use parameter binding, and also since one of the higher-level goals
> would be to support the use case of stored procedures returning
> multiple result sets via JDBC.
>
> [0]:
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> [1]: https://commitfest.postgresql.org/29/2096/
> [2]:
> https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
>
> (Terminology: I'm calling this project "dynamic result sets", which
> includes several concepts: 1) multiple result sets, 2) those result
> sets can have different structures, 3) the structure of the result
> sets is decided at run time, not declared in the schema/procedure
> definition/etc.)
>
> One possibility I rejected was to invent a third query protocol beside
> the simple and extended one. This wouldn't really match with the
> requirements of JDBC and similar APIs because the APIs for sending
> queries don't indicate whether dynamic result sets are expected or
> required, you only indicate that later by how you process the result
> sets. So we really need to use the existing ways of sending off the
> queries. Also, avoiding a third query protocol is probably desirable
> in general to avoid extra code and APIs.
>
> So here is my sketch on how this functionality could be woven into the
> extended query protocol. I'll go through how the existing protocol
> exchange works and then point out the additions that I have in mind.
>
> These additions could be enabled by a _pq_ startup parameter sent by
> the client. Alternatively, it might also work without that because
> the client would just reject protocol messages it doesn't understand,
> but that's probably less desirable behavior.
>
> So here is how it goes:
>
> C: Parse
> S: ParseComplete
>
> At this point, the server would know whether the statement it has
> parsed can produce dynamic result sets. For a stored procedure, this
> would be declared with the procedure definition, so when the CALL
> statement is parsed, this can be noticed. I don't actually plan any
> other cases, but for the sake of discussion, perhaps some variant of
> EXPLAIN could also return multiple result sets, and that could also be
> detected from parsing the EXPLAIN invocation.
>
> At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription
>
> New would be that the server would now also respond with a new
> message, say,
>
> S: DynamicResultInfo
>
> that indicates that dynamic result sets will follow later. The
> message would otherwise be empty. (We could perhaps include the
> number of result sets, but this might not actually be useful, and
> perhaps it's better not to spent effort on counting things that don't
> need to be counted.)
>
> (If we don't guard this by a _pq_ startup parameter from the client,
> an old client would now error out because of an unexpected protocol
> message.)
>
> Now the normal bind and execute sequence follows:
>
> C: Bind
> S: BindComplete
> (C: Describe (portal))
> (S: RowDescription)
> C: Execute
> S: ... (DataRows)
> S: CommandComplete
>
> In the case of a CALL with output parameters, this "primary" result
> set contains one row with the output parameters (existing behavior).
>
> Now, if the client has seen DynamicResultInfo earlier, it should now
> go into a new subsequence to get the remaining result sets, like this
> (naming obviously to be refined):
>
> C: NextResult
> S: NextResultReady
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ....
> S: CommandComplete
> C: NextResult
> ...
> C: NextResult
> S: NoNextResult
> C: Sync
> S: ReadyForQuery
>
> I think this would all have to use the unnamed portal, but perhaps
> there could be other uses with named portals. Some details to be
> worked out.
>
> One could perhaps also do without the DynamicResultInfo message and
> just put extra information into the CommandComplete message indicating
> "there are more result sets after this one".
>
> (Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level
> command. ReadyForQuery means the whole command is complete. This is
> perhaps debatable, and interesting questions could also arise when
> considering what should happen in the simple query protocol when a
> query string consists of multiple commands each returning multiple
> result sets. But it doesn't really seem sensible to cater to that.)
>
> One thing that's missing in this sequence is a way to specify the
> desired output format (text/binary) for each result set. This could
> be added to the NextResult message, but at that point the client
> doesn't yet know the number of columns in the result set, so we could
> only do it globally. Then again, since the result sets are dynamic,
> it's less likely that a client would be coded to set per-column output
> codes. Then again, I would hate to bake such a restriction into the
> protocol, because some is going to try. (I suspect what would be more
> useful in practice is to designate output formats per data type.) So
> if we wanted to have this fully featured, it might have to look
> something like this:
>
> C: NextResult
> S: NextResultReady
> C: Describe (dynamic) (new message subkind)
> S: RowDescription
> C: Bind (zero parameters, optionally format codes)
> S: BindComplete
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ...
>
> While this looks more complicated, client libraries could reuse
> existing code that starts processing with a Bind message and continues
> to CommandComplete, and then just loops back around.
>
> The mapping of this to libpq in a simple case could look like this:
>
> PQsendQueryParams(conn, "CALL ...", ...);
> PQgetResult(...); // gets output parameters
> PQnextResult(...); // new: sends NextResult+Bind
> PQgetResult(...); // and repeat
>
> Again, it's not clear here how to declare the result column output
> formats. Since libpq doesn't appear to expose the Bind message
> separately, I'm not sure what to do here.
>
> In JDBC, the NextResult message would correspond to the
> Statement.getMoreResults() method. It will need a bit of conceptual
> adjustment because the first result set sent on the protocol is
> actually the output parameters, which the JDBC API returns separately
> from a ResultSet, so the initial CallableStatement.execute() call will
> need to process the primary result set and then send NextResult and
> obtain the first dynamic result as the first ResultSet for its API,
> but that can be handled internally.
>
> Thoughts so far?
>
> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-10-08 08:24:52 Re: Resetting spilled txn statistics in pg_stat_replication
Previous Message Bharath Rupireddy 2020-10-08 08:19:32 Re: Parallel INSERT (INTO ... SELECT ...)