From: Chris Bitmead <chrisb(at)nimrod(dot)itg(dot)telstra(dot)com(dot)au>
To: Postgres Hackers List <hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] libpq
Date: 2000-02-17 06:28:13
Message-ID: 38AB94FD.EFF0B473@nimrod.itg.telecom.com.au
Lists: pgsql-hackers
I posted this about a week ago, and it passed without comment.
Does this mean I'm so far off track that no-one cares to comment,
or I got it so right that no comment was needed?
Quick summary: I want to work on libpq, partly to implement
my OO plans in libpq, and partly to implement the streaming
interface. But I'm concerned that a lower-level interface
will give better control and better efficiency.
Also, this is a fair amount of hacking. I have heard talk of
"when we go to using CORBA" and such. I could look at doing
this at the same time, but I remain to be convinced of the benefit.
What would be the method? Something like sequence<Attribute>?
I would have thought this would add a big protocol overhead. I
also would have thought that the wire protocol for a database
would be sufficiently simple and static that CORBA would be
overkill. Am I wrong?
Chris Bitmead wrote:
>
> 100 bytes, or even 50 bytes seems like a huge price to pay. If I'm
> retrieving 10 byte tuples that's a 500% or 1000% overhead.
>
> There are other issues too. Like if I want to be able to populate
> a C++ object without the overhead of copying, I need to know
> in advance the type of tuple I'm getting back. So I need something
> like a nextClass() API.
>
> Here is what I'm imagining (in very rough terms with details glossed
> over).
> How would you do this with the PGresult idea?...
>
> class Base {
>     int a;
> };
> class Sub1 : Base {
>     int b;
> };
> class Sub2 : Base {
>     int c;
> };
> #define OFFSET(class, field) ((size_t)&(((class *)NULL)->field))
> struct FieldPositions f1[] = { { "a", OFFSET(Sub1, a) },
>                                { "b", OFFSET(Sub1, b) } };
> struct FieldPositions f2[] = { { "a", OFFSET(Sub2, a) },
>                                { "c", OFFSET(Sub2, c) } };
>
> PGresult *q = PQexecStream("SELECT ** from Base");
> List<Base> results;
> for (;;) {
>     PGClass *cls = PQnextClass(q);
>     if (PQresultStatus(q) == ERROR)
>         processError(q);
>     else if (PQresultStatus(q) == NO_MORE)
>         break;
>     if (strcmp(cls->name, "Sub1") == 0) {
>         results.add(PQnextObject(q, new Sub1, FieldPositions(f1)));
>     } else if (strcmp(cls->name, "Sub2") == 0) {
>         results.add(PQnextObject(q, new Sub2, FieldPositions(f2)));
>     }
> }
>
> Of course in a full ODBMS front end, some of the above code would
> be generated or something.
>
> In this case PQnextObject is populating memory supplied by the
> programmer. There is no overhead whatsoever, nor can there be,
> because we are supplying memory only for the fields we care about.
>
> In this case we don't even need to store tuple descriptors because
> the C++ object has its vtbl, which is enough. If we cared about
> tuple descriptors though we could hang onto the PGClass and do
> something like PQgetValue(class, object, "fieldname"), which
> would be useful for some language interfaces no doubt.
>
> A basic C example would look like this...
>
> PGresult *q = PQexecStream("SELECT ** from Base");
> for (;;) {
>     PGClass *class = PQnextClass(q);
>     if (PQresultStatus(q) == ERROR)
>         processError(q);
>     else if (PQresultStatus(q) == NO_MORE)
>         break;
>     PGobject *obj = PQnextObject(q, NULL, NULL);
>     for (int c = 0; c < PQnColumns(class); c++)
>         printf("%s: %s, ", PQcolumnName(class, c),
>                PQcolumnValue(class, c, obj));
>     printf("\n");
> }
>
> The points to note here are:
> (1) Yes, the error message stuff comes from PGresult as it does now.
> (2) You don't have a wasteful new PGresult for every time you get
> the next result.
> (3) You are certainly not required to store a whole lot of PGresults
> just because you want to cache tuples.
> (4) Because the tuple descriptor is explicit (PGClass*) you can
> keep it or not as you please. If you are doing pure relational
> with fixed number of columns, there is ZERO overhead per tuple
> because you only need keep one pointer to the PGClass. This is
> even though you retrieve results one at a time.
> (5) Because of (4) I can't see the need for any API to support
> getting multiple tuples at a time since it is trivially implemented
> in terms of nextObject with no overhead.
>
> While a PGresult interface like you described could be built, I can't
> see that it fulfills all the requirements that I would have. It could
> be trivially built on top of the above building blocks, but it doesn't
> sound fine-grained enough for me. If you disagree, tell me how you'd
> do it.
>
> Tom Lane wrote:
> >
> > Chris <chris(at)bitmead(dot)com> writes:
> > > All I mean to say is that it is often desirable to have control over
> > > when each individual object is destroyed, rather than having to destroy
> > > each batch at once.
> >
> > Right, so if you really want to destroy retrieved tuples one at a time,
> > you request only one per retrieved PGresult. I claim that the other
> > case where you want them in small batches (but not necessarily only one
> > at a time) is at least as interesting; therefore the mechanism should
> > not be limited to the exactly-one-at-a-time case. Once you allow for
> > the other requirements, you have something that looks enough like a
> > PGresult that it might as well just *be* a PGresult.
> >
> > > The result status and query status is only temporarily interesting. Once
> > > I know the tuple arrived safely I don't care much about the state of
> > > affairs at that moment, and don't care to waste memory on a structure
> > > that has space for all these error fields.
> >
> > Let's see (examines PGresult declaration). Four bytes for the
> > resultStatus, four for the errMsg pointer, 40 for cmdStatus,
> > out of a struct that is going to occupy close to 100 bytes on
> > typical hardware --- and that's not counting the tuple descriptor
> > data and the tuple(s) proper. You could easily reduce the cmdStatus
> > overhead by making it a pointer to an allocated string instead of
> > an in-line array, if the 40 bytes were really bothering you. So the
> > above seems a pretty weak argument for introducing a whole new datatype
> > and a whole new set of access functions for it. Besides which, you
> > haven't explained how it is that you are going to avoid the need to
> > be able to represent error status in a PGObject. The function that
> > fetches the next tuple(s) in a query has to be able to return an
> > error status, and that has to be distinguishable from "successful
> > end of query" and from "no more data available yet".
> >
> > > The other thing about PGobject idea is that when I do a real OO database
> > > idea, is that getNextObject will optionally populate user-supplied data
> > > instead.
> >
> > And that can't be done from a PGresult because?
> >
> > So far, the *only* valid reason you've given for inventing a new
> > datatype, rather than just using PGresult for the purpose, is to save a
> > few bytes by eliminating unnecessary fields. That seems a pretty weak
> > argument (even assuming that the fields are unnecessary, which I doubt).
> > Having to support and document a whole set of essentially-identical
> > access functions for both PGresult and PGObject is the overhead that
> > we ought to be worried about, ISTM. Don't forget that it's not just
> > libpq we are talking about, either; this additional API will also have
> > to propagate into libpq++, libpgtcl, the perl5 and python modules,
> > etc etc etc.
> >
> > regards, tom lane