C on Client versus C on Server

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: C on Client versus C on Server
Date: 2011-04-02 18:52:28
Message-ID: BANLkTi=2EpfSK94LK=2hWSVxBCjV3hfacQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a PL/PGSQL stored procedure on my server which is about 3,500 lines
long. To further improve performance after more than year of optimizing the
PL/PGSQL, and also to hide the logic so it is not as easy to copy, I am
starting to re-write the stored procedure in C. The initial design is such
that a C++ client would query the PL/PGSQL stored procedure using one
transaction and the result of the transaction would be several open cursors
it could use to pull down the data from the server. This data would be
re-written to XML and passed back to the original requestor for use. I am
now also going to replace the XML conversion process and re-write the C++
logic as a module for the system that initiates the XML request. So, the
layout is going from:

User Client ----> XML ----> C++ Server / PG Client -----> SQL ----->
PL/PGSQL Stored Proc on PGSQL Server

To:

User Client / C Module / PG Client -----> SQL ----> C Stored Proc on PGSQL
Server

This will improve overall efficiency considerably. However, I am now wonder
about the follow....

The PL/PGSQL procedure has many large SELECT statements which pull various
data from various tables. Then, between those SELECT statements, it runs
logic to determine what other SELECT statements it should execute. Once it
figures this all out and finishes running all the select statements, it
joins some of those results together and returns 3 cursors which represent
all the data the User Client needs in order to proceed with processing its
workload. Every SELECT statement that the procedure ends up executing has
its data stored in those three cursors, and all of it is required by the
User Client to process its workload. Given that this data must be
transferred across the link from the Postgres server to the client, would it
be more efficient to keep all of the contents of the PL/PGSQL stored
procedure, including the logic about which SELECT statements to run, on the
Postgres server, or would it be just as fast to have that logic contained in
the C module for the User Client and have that C module make multiple
requests to the Postgres server?

The stored procedure also does write some records to three tables on the
server which represent a summary of everything that it computed and returned
to the User Client for billing purposes. My initial intuition on this is
that keeping it all in the stored procedure on the server is going to be
faster for two main reasons:

1) Each select statement from the User Client C Module would be a separate
transaction which would drastically increase transaction overhead for the
whole set of requests.
2) Writing the billing data at the end would mean that I not only have to
pull all the data down to the User Client, I must also push the data back up
to the server for writing the billing records.

So, am I looking at this the right way, or am I missing something?

Thanks in advance for any responses.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Heikki Linnakangas 2011-04-03 07:01:57 Re: C on Client versus C on Server
Previous Message Josh Berkus 2011-04-01 17:32:24 Re: table contraints checks only happen in planner phase