Re: Column lookup in a row performance

From: Павлухин Иван <vololo100(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Column lookup in a row performance
Date: 2019-03-26 15:16:21
Message-ID: CAOykqKefnGPySQDKdwvk8g_bM3WnQOXMrf33Nv-oW0hYn3T3eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi David,

Thank you for your response. I understand that postgresql behaves good
for fixed-length NOT NULL columns stored in the beginning of a tuple.
But let's imagine a use case when we just have many NOT NULL
variable-length columns. With current storage format accessing a first
column is faster than accessing a last one.

I did a rough experiment to check if a difference is visible. I used
following table (200 pairs of columns):
create table layout(
i0 int,
s0 varchar(255),
...
i199 int,
s199 varchar(255)
);

And populated it with 1 million rows. And run following queries
SELECT AVG(i0) FROM layout;
SELECT AVG(i199) FROM layout;

On my machine calculating an average over column i0 took about 1
second and about 2.5 seconds for column i199. And similar observations
were described in threads mentioned before. Quite significant
difference!

I made a similar experiment for mysql as well (innodb). And results
are the same for first and last columns.

Some details how it is stored in innodb. They store varlen column
lengths only in a tuple header (there is no length prefix before
column data itself). Having a tuple descriptor and lengths in a tuple
header it is always possible to calculate each column position without
jumping through an entire record. And seems that space requirements
are same as in postgresql.

It seems that an innodb layout is better at least for reading. So, it
is still unclear for me why postgresql does not employ similar layout
if it can give significant benefits.

пт, 22 мар. 2019 г. в 13:15, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:
>
> On Fri, 22 Mar 2019 at 19:13, Павлухин Иван <vololo100(at)gmail(dot)com> wrote:
> > I am learning deeply how tuples are organized and column values are
> > accessed in different databases. As far as undertood postgres does not
> > store all column positions in a tuple (e.g. in header or footer). In
> > contrast MySQL InnoDB stores column lengths in a record header [1].
> > From the first glance it seems that a postgres format can have a
> > significant performance penalty when accessing a single column which
> > is located after multiple variable-length columns because searching a
> > column value position in a row requires multiple jumps. And in InnoDB
> > a position of a particular column can be found right after reading a
> > header.
>
> When the tuple contains no NULLs, PostgreSQL does cache the offsets to
> the attribute position in the tuple up until the first variable length
> field. This allows code to directly access the value without having to
> deform all fields that come before the required field, and since the
> offset value is the same for all tuples of this type, then it can be
> stored just once, in what we call the tuple descriptor.
>
> I'm not aware of what innodb does, but if it stores actual offsets
> fields that come after a variable length field, then that sounds like
> something that needs to be stored per-tuple, so there's a trade-off;
> speed vs storage space. In PostgreSQL, one way you can obtain faster
> access is to not have NULLs and put fixed width fields first. That's,
> of course, not always possible, but useful to keep in mind when
> deciding the order to have your columns in the table.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Best regards,
Ivan Pavlukhin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rob Sargent 2019-03-26 15:50:37 Re: stale WAL files?
Previous Message Frank 2019-03-26 14:37:38 Re: When to store data that could be derived