Re: byte-size of column values

From: Dominique Devienne <ddevienne(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: byte-size of column values
Date: 2022-10-19 08:21:02
Message-ID: CAFCRh-8vRSkJ9T1oTM6m9f9nPd9Af+r3j2a4VNT+kDBnryX5CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Oct 18, 2022 at 6:04 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dominique Devienne <ddevienne(at)gmail(dot)com> writes:
> > I'm surprised by the result for bit(3) and char
> > The doc does mention 5-8 bytes overhead, but I expected
> > those for varying bit, not fixed-sized bit typed values.
>
> Your expectation is incorrect. Postgres always treats these types
> as variable-length, whether or not the column has a length constraint.

OK. Still, wasn't such a stretch to assume that, no?
Now I know better, thanks to you and David.
I'm not sure the doc on types talks about that either.
Didn't see it for sure (but could still be there and I missed it).

> Thus, there's always a header to store the actual length. That can
> be either 1 or 4 bytes (I think the doc you are looking at might be
> a little out of date on that point).

Even the doc on v15 (or devel) still says 5-to-8.
https://www.postgresql.org/docs/15/datatype-bit.html

And on my v12, that's born out from my experimentation.
Being used to SQLite using varints,
I'd have expected fewer overhead bytes for the size, like your 1-to-4.

> Because of the popularity of variable-width character encodings,
> a column declared as N characters wide isn't necessarily a fixed
> number of bytes wide, making it a lot less useful than you might
> think to have optimizations for fixed-width storage. Between that
> and the fact that most Postgres developers regard CHAR(N) as an
> obsolete hangover from the days of punched cards, no such
> optimizations have been attempted.

Thanks for the background. I definitely appreciate PostgreSQL's large 1GB
limit on text and bytea columns, coming from Oracle's tiny 4K one, which
created us all kind of headaches.

For kicks, I looked at bpchar, blank-padded-char, and its extra byte, which
I assume is again some kind of length, there at least the overhead is small
compared to bit(n). 1 bytes versus 5 bytes is no small difference.

ddevienne=> create table bar (bpc bpchar(16));
CREATE TABLE
ddevienne=> insert into bar values ('foo'), ('bar baz');
INSERT 0 2
ddevienne=> select length(bpc), pg_column_size(bpc), '<'||bpc||'>' from bar;
length | pg_column_size | ?column?
--------+----------------+-----------
3 | 17 | <foo>
7 | 17 | <bar baz>
(2 rows)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dominique Devienne 2022-10-19 09:47:59 How to store "blobs" efficiently for small and large sizes, with random access
Previous Message Dominique Devienne 2022-10-19 08:03:23 Re: byte-size of column values