Re: [HACKERS] libpq

From:	Chris Bitmead <chrisb(at)nimrod(dot)itg(dot)telstra(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Chris <chris(at)bitmead(dot)com>, Postgres Hackers List <hackers(at)postgreSQL(dot)org>
Subject:	Re: [HACKERS] libpq
Date:	2000-02-14 00:24:35
Message-ID:	38A74B43.2F753D32@nimrod.itg.telecom.com.au
Lists:	pgsql-hackers

100 bytes, or even 50 bytes, seems like a huge price to pay. If I'm
retrieving 10-byte tuples, that's a 1000% or 500% overhead respectively.
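
To make the arithmetic explicit, here is a minimal sketch. The function name overhead_percent is made up for illustration, and the byte counts are just the ones quoted in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Per-tuple bookkeeping overhead as a percentage of the payload size.
// Both sizes are in bytes; payload_bytes must be non-zero.
double overhead_percent(std::size_t overhead_bytes, std::size_t payload_bytes) {
    assert(payload_bytes != 0);
    return 100.0 * static_cast<double>(overhead_bytes) /
           static_cast<double>(payload_bytes);
}
```

With a 10-byte tuple, overhead_percent(50, 10) gives 500 and overhead_percent(100, 10) gives 1000.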

There are other issues too. Like if I want to be able to populate
a C++ object without the overhead of copying, I need to know
in advance the type of tuple I'm getting back. So I need something
like a nextClass() API.

Here is what I'm imagining (in very rough terms with details glossed
over). How would you do this with the PGresult idea?...

class Base {
public:
    int a;
};
class Sub1 : public Base {
public:
    int b;
};
class Sub2 : public Base {
public:
    int c;
};
#define OFFSET(cls, field) (&((cls *)NULL)->field)
struct FieldPositions f1[] = { { "a", OFFSET(Sub1, a) },
                               { "b", OFFSET(Sub1, b) } };
struct FieldPositions f2[] = { { "a", OFFSET(Sub2, a) },
                               { "c", OFFSET(Sub2, c) } };

PGresult *q = PQexecStream("SELECT ** from Base");
List<Base> results;
for (;;) {
    /* "class" is a C++ keyword, so the variable is named cls here */
    PGClass *cls = PQnextClass(q);
    if (PQresultStatus(q) == ERROR)
        processError(q);
    else if (PQresultStatus(q) == NO_MORE)
        break;
    if (strcmp(cls->name, "Sub1") == 0) {
        results.add(PQnextObject(q, new Sub1, FieldPositions(f1)));
    } else if (strcmp(cls->name, "Sub2") == 0) {
        results.add(PQnextObject(q, new Sub2, FieldPositions(f2)));
    }
}

Of course in a full ODBMS front end, some of the above code would
be generated or something.

In this case PQnextObject is populating memory supplied by the
programmer. There is no overhead whatsoever, nor can there be, because
we are supplying memory for the fields we care about.
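
To show what "populating memory supplied by the programmer" could look like underneath, here is a rough sketch. Everything in it is hypothetical and not part of libpq: FieldPosition, Column, and populate are illustrative names, values are assumed to arrive as text, and every mapped member is assumed to be an int. It uses a plain struct because offsetof is only guaranteed for standard-layout types; the inheritance case above would need more care.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdlib>   // std::atoi
#include <cstring>   // std::strcmp, std::memcpy

// Hypothetical stand-ins; none of these exist in libpq.
struct FieldPosition {
    const char* name;    // column name
    std::size_t offset;  // byte offset of the matching int member
};

struct Column {
    const char* name;
    const char* text;    // value in text form
};

struct Sub1 {
    int a;
    int b;
};

const FieldPosition kSub1Fields[] = {
    {"a", offsetof(Sub1, a)},
    {"b", offsetof(Sub1, b)},
};

// Write each known column straight into caller-owned memory.
// Returns the number of fields filled; unknown columns are ignored.
int populate(void* object, const FieldPosition* fields, std::size_t nfields,
             const Column* cols, std::size_t ncols) {
    assert(object != nullptr);
    int filled = 0;
    for (std::size_t c = 0; c < ncols; ++c) {
        for (std::size_t f = 0; f < nfields; ++f) {
            if (std::strcmp(cols[c].name, fields[f].name) == 0) {
                const int value = std::atoi(cols[c].text);
                char* slot = static_cast<char*>(object) + fields[f].offset;
                std::memcpy(slot, &value, sizeof value);
                ++filled;
                break;
            }
        }
    }
    return filled;
}
```

The library never allocates anything per tuple here: the caller owns the Sub1, and the only thing written is the fields the caller listed.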

In this case we don't even need to store tuple descriptors because
the C++ object has its vtbl which is enough. If we cared about
tuple descriptors though we could hang onto the PGClass and do
something like PQgetValue(class, object, "fieldname"), which
would be useful for some language interfaces no doubt.
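
A by-name lookup of that kind might be sketched as follows. ClassDesc, Object, and get_value are hypothetical stand-ins for PGClass, PGobject, and PQgetValue, and values are assumed to be stored as text. The point is that the column names live once in the descriptor, not in every object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical descriptor: one per class, shared by every object of it.
struct ClassDesc {
    const char* name;
    std::size_t ncolumns;
    const char* const* column_names;
};

// Hypothetical object: just the values, with no per-object copy of the names.
struct Object {
    const char* const* values;
};

// Look up a value by field name.
// Returns nullptr if the class has no such field.
const char* get_value(const ClassDesc& cls, const Object& obj,
                      const char* field) {
    assert(field != nullptr);
    for (std::size_t i = 0; i < cls.ncolumns; ++i) {
        if (std::strcmp(cls.column_names[i], field) == 0) {
            return obj.values[i];
        }
    }
    return nullptr;
}
```

A caller that never needs by-name access can simply drop the descriptor pointer and keep only the objects.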

A basic C example would look like this...

PGresult *q = PQexecStream("SELECT ** from Base");
for (;;) {
    PGClass *class = PQnextClass(q);
    if (PQresultStatus(q) == ERROR)
        processError(q);
    else if (PQresultStatus(q) == NO_MORE)
        break;
    PGobject *obj = PQnextObject(q, NULL, NULL);
    /* print every column of this object on one line */
    for (int c = 0; c < PQnColumns(class); c++)
        printf("%s: %s, ", PQcolumnName(class, c),
               PQcolumnValue(class, c, obj));
    printf("\n");
}

The points to note here are:
(1) Yes, the error message stuff comes from PGresult as it does now.
(2) You don't have a wasteful new PGresult for every time you get
the next result.
(3) You are certainly not required to store a whole lot of PGresults
just because you want to cache tuples.
(4) Because the tuple descriptor is explicit (PGClass*) you can
keep it or not as you please. If you are doing pure relational
with a fixed number of columns, there is ZERO overhead per tuple
because you only need keep one pointer to the PGClass. This is
even though you retrieve results one at a time.
(5) Because of (4) I can't see the need for any API to support
getting multiple tuples at a time since it is trivially implemented
in terms of nextObject with no overhead.
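
Point (5) can be sketched like this. Stream and next_object are hypothetical stand-ins for the query handle and the proposed PQnextObject, and the "tuples" are plain ints to keep it short. Error reporting, which would still come from the PGresult, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-at-a-time source standing in for the proposed PQnextObject.
// It hands out the integers 0..count-1 and then reports exhaustion.
struct Stream {
    int next;
    int count;
};

// Fetch one item into *out; returns false when the stream is exhausted.
bool next_object(Stream& s, int* out) {
    assert(out != nullptr);
    if (s.next >= s.count) return false;
    *out = s.next++;
    return true;
}

// Batch fetch expressed purely in terms of next_object: up to max_items
// items are collected into a fresh vector. A short (or empty) batch means
// the stream ran dry.
std::vector<int> next_batch(Stream& s, std::size_t max_items) {
    std::vector<int> batch;
    batch.reserve(max_items);
    int item;
    while (batch.size() < max_items && next_object(s, &item)) {
        batch.push_back(item);
    }
    return batch;
}
```

The batch size is entirely the caller's choice, and nothing in the one-at-a-time primitive had to change to support it.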

While a PGresult interface like you described could be built, I can't
see that it fulfills all the requirements that I would have. It could be
trivially built on top of the above building blocks, but it doesn't sound
fine enough grained for me. If you disagree, tell me how you'd do it.

Tom Lane wrote:
>
> Chris <chris(at)bitmead(dot)com> writes:
> > All I mean to say is that it is often desirable to have control over
> > when each individual object is destroyed, rather than having to destroy
> > each batch at once.
>
> Right, so if you really want to destroy retrieved tuples one at a time,
> you request only one per retrieved PGresult. I claim that the other
> case where you want them in small batches (but not necessarily only one
> at a time) is at least as interesting; therefore the mechanism should
> not be limited to the exactly-one-at-a-time case. Once you allow for
> the other requirements, you have something that looks enough like a
> PGresult that it might as well just *be* a PGresult.
>
> > The result status and query status is only temporarily interesting. Once
> > I know the tuple arrived safely I don't care much about the state of
> > affairs at that moment, and don't care to waste memory on a structure
> > that has space for all these error fields.
>
> Let's see (examines PGresult declaration). Four bytes for the
> resultStatus, four for the errMsg pointer, 40 for cmdStatus,
> out of a struct that is going to occupy close to 100 bytes on
> typical hardware --- and that's not counting the tuple descriptor
> data and the tuple(s) proper. You could easily reduce the cmdStatus
> overhead by making it a pointer to an allocated string instead of
> an in-line array, if the 40 bytes were really bothering you. So the
> above seems a pretty weak argument for introducing a whole new datatype
> and a whole new set of access functions for it. Besides which, you
> haven't explained how it is that you are going to avoid the need to
> be able to represent error status in a PGObject. The function that
> fetches the next tuple(s) in a query has to be able to return an
> error status, and that has to be distinguishable from "successful
> end of query" and from "no more data available yet".
>
> > The other thing about PGobject idea is that when I do a real OO database
> > idea, is that getNextObject will optionally populate user-supplied data
> > instead.
>
> And that can't be done from a PGresult because?
>
> So far, the *only* valid reason you've given for inventing a new
> datatype, rather than just using PGresult for the purpose, is to save a
> few bytes by eliminating unnecessary fields. That seems a pretty weak
> argument (even assuming that the fields are unnecessary, which I doubt).
> Having to support and document a whole set of essentially-identical
> access functions for both PGresult and PGObject is the overhead that
> we ought to be worried about, ISTM. Don't forget that it's not just
> libpq we are talking about, either; this additional API will also have
> to propagate into libpq++, libpgtcl, the perl5 and python modules,
> etc etc etc.
>
> regards, tom lane

In response to

Re: [HACKERS] libpq at 2000-02-11 15:10:13 from Tom Lane
