From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Barry Lind <barry(at)xythos(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: OCTET_LENGTH is wrong |
Date: | 2001-11-19 20:40:35 |
Message-ID: | 3BF96E43.20907@tm.ee |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
>Barry Lind <barry(at)xythos(dot)com> writes:
>
>>While the text datatypes have additional issues with encodings, that is
>>not true for the bytea type. I think it does make sense that a client
>>be able to get the size in bytes that the bytea type value will return
>>to the client.
>>
>
>bytea does that already. It's only text that has (or had, till a few
>minutes ago) the funny behavior.
>
>I'm not set on the notion that octet_length should return on-disk size;
>that's clearly not what's contemplated by SQL92, so I'm happy to agree
>that if we want that we should add a new function to get it.
>("storage_length", maybe.) What's bothering me right now is the
>difference between client and server encodings. It seems that the only
>plausible use for octet_length is to do memory allocation on the client
>side,
>
Allocating memory seems for me to be drivers (libpq, JDBC, ODBC,...)
problem and
not something to be done by client code beforehand - at least for libpq
(AFAIK) we
don't have any means of giving it a pre-allocated storage area for one
field.
There is enough information in wire protocol for allocating right-sized
chunks at the
time query result is read. An additional call of "SELECT
OCTET_LENGTH(someCol)"
seems orders of magnitude slower than doing it at the right time in the
driver .
>and for that purpose the length ought to be measured in the client
>encoding. People seem to be happy with letting octet_length take the
>easy way out (measure in the server encoding), and I'm trying to get
>someone to explain to me why that's the right behavior. I don't see it.
>
perhaps we need another function "OCTET_LENGTH(someCol, encoding)" for
getting what we want and also client_encoding() and server_encoding()
for supplying
it some universal defaults ?
OTOH, from reading on Unicode I've came to a conlusion that there are
often several
ways for expressing the same string in Unicode, so for server encoding
not unicode and
client requesting unicode (say UTF-8) there can be several different
ways to express
the same string. Thus there is no absolute OCTET_LENGTH for
client_encoding for
all cases. Thus giving the actual uncompressed length seems most reasonable.
For unicode both in backend and frontend we could also make OCTET_LENGTH
return not int but an integer-interval of shortest and longest possible
encoding ;)
------------------
Hannu
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2001-11-19 21:04:10 | Re: postgresql.conf |
Previous Message | Bruce Momjian | 2001-11-19 20:07:22 | Re: Further open item (Was: Status of 7.2) |