Quick Links

Column lookup in a row performance

From:	Павлухин Иван <vololo100(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Column lookup in a row performance
Date:	2019-03-30 06:35:39
Message-ID:	CAOykqKfko-n5YiBJtk-ocVdp+j92Apu5MJBwbGGh4awRY5NCuQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi PostgresSQL developers,

I asked my question already on pgsql-general list and did not find an
explanation. Below is the question mainly copied from [0].
----

I am learning deeply how tuples are organized and column values are
accessed in different databases. As far as undertood postgres does not
store all column positions in a tuple (e.g. in header or footer). In
contrast MySQL InnoDB stores column lengths in a record header [1].
From the first glance it seems that a postgres format can have a
significant performance penalty when accessing a single column which
is located after multiple variable-length columns because searching a
column value position in a row requires multiple jumps. And in InnoDB
a position of a particular column can be found right after reading a
header.

I found several related threads in pgsql-hackers archives [2,3]
describing significant performance wins in a prototype.

Does anyone know why the format is still the same? Perhaps InnoDB and
similar formats are not so good, are they?

Please respond if you have the clue!
----

I did a rough experiment to check if a difference is visible. I used
following table (200 pairs of columns):
create table layout(
i0 int,
s0 varchar(255),
...
i199 int,
s199 varchar(255)
);

And populated it with 1 million rows. And run following queries
SELECT AVG(i0) FROM layout;
SELECT AVG(i199) FROM layout;

On my machine calculating an average over column i0 took about 1
second and about 2.5 seconds for column i199. And similar observations
were described in threads mentioned before. Quite significant
difference!

I made a similar experiment for mysql as well (innodb). And results
are the same for first and last columns.

Some details how it is stored in innodb. They store varlen column
lengths only in a tuple header (there is no length prefix before
column data itself). Having a tuple descriptor and lengths in a tuple
header it is always possible to calculate each column position without
jumping through an entire record. And seems that space requirements
are same as in postgresql.

It seems that an innodb layout is better at least for reading. So, it
is still unclear for me why postgresql does not employ similar layout
if it can give significant benefits.
----

[0] https://www.postgresql.org/message-id/flat/CAOykqKc8Uoi3NKVfd5DpTmUzD4rJBWG9Gjo3pr7eaUGLtrstvw%40mail.gmail.com
[1] https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html#innodb-row-format-compact
[2] https://www.postgresql.org/message-id/flat/c58979e50702201307w64b12892uf8dfc3d8bf117ec0%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/87irj16umm.fsf%40enterprisedb.com

--
Best regards,
Ivan Pavlukhin

Responses

Re: Column lookup in a row performance at 2019-03-30 14:26:06 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peifeng Qiu	2019-03-30 06:42:39	Speed up build on Windows by generating symbol definition in batch
Previous Message	高增琦	2019-03-30 06:22:59	Indexscan failed assert caused by using index without lock