Unexpected table size usage for small composite arrays

From: Erik Sjoblom <sjoblom65(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Unexpected table size usage for small composite arrays
Date: 2024-10-22 21:58:29
Message-ID: CAAW=00XvzL9q6m2jgsd2YJJqzjjmBYyM3A1uaXuMMr19SpJ57A@mail.gmail.com
Lists: pgsql-hackers

Hello PostgreSQL developers,

I'm observing storage behavior with arrays in a table that differs from
my expectations, and I'd appreciate your insights. I want to store key-value
pairs in a very dense data model. I don't have a search requirement, which is
why I was thinking an array of a composite type would work well. I can see
that padding might be involved with the int4 and int8 combination, but there
is more overhead than that. Does anyone know where the following is coming
from?

Context

I have defined a composite type and a table as follows:

CREATE TYPE property_entry_data_type AS (
    property_key_id   int4,
    property_value_id int8
);

CREATE TABLE property_set_data
(
    property_set_data_id int8 PRIMARY KEY,
    parent_set_id        int8 NULL,
    owner_id             uuid NOT NULL,
    property_entry_data  property_entry_data_type[] NULL,
    created_at           timestamptz DEFAULT now(),
    modified_at          timestamptz DEFAULT now()
);

Observations

I inserted 10,000 rows with varying numbers of elements in the
property_entry_data array. I noticed the following pattern:

- For a small number of elements (up to around 40), each array element
consumes roughly *40–50 bytes*.
- After reaching the *2 KB threshold*, PostgreSQL appears to start
compressing the array data within the main table.
- When the number of elements exceeds *220*, the data starts getting
stored in the TOAST table.
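
For reference, the test rows were generated along roughly these lines (a
sketch, not my exact script; gen_random_uuid() assumes PostgreSQL 13+,
otherwise the pgcrypto extension):

-- Sketch only: 10,000 rows, each with a 40-element array (vary 40 for other sizes)
INSERT INTO property_set_data (property_set_data_id, owner_id, property_entry_data)
SELECT g,
       gen_random_uuid(),
       ARRAY(SELECT ROW(e, e * 100)::property_entry_data_type
             FROM generate_series(1, 40) AS e)
FROM generate_series(1, 10000) AS g;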

Expected Behavior

Based on my understanding, each element in the array should take *12-16
bytes*: 4 bytes for property_key_id (possibly padded to 8 for alignment) plus
8 bytes for property_value_id. Including some additional overhead, I expected
the storage requirement to be roughly 24 + 12 * N bytes per row for N elements
in the array. However, the actual usage is significantly higher (~40–50
bytes per element), which I didn't anticipate.
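
One way to see the per-element footprint directly, without going through
table sizes, is pg_column_size() on constructed values (illustrative; the
difference between consecutive results is the marginal per-element cost):

-- Marginal per-element size: compare consecutive results
SELECT pg_column_size(ARRAY[]::property_entry_data_type[]) AS empty_array,
       pg_column_size(ARRAY[ROW(1, 1)::property_entry_data_type]) AS one_element,
       pg_column_size(ARRAY[ROW(1, 1)::property_entry_data_type,
                            ROW(2, 2)::property_entry_data_type]) AS two_elements;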
My Question

Could you please help me understand the following:

1. Why is PostgreSQL using *more than the expected 12 bytes per element*?
What factors contribute to the additional storage overhead?
2. How does PostgreSQL handle compression and alignment for arrays
stored in composite types, and could these factors explain the discrepancy?
   3. Is there a way to *minimize the per-element overhead* for such arrays
in a table (one alternative layout is sketched after this list), or is this
behavior expected given PostgreSQL's internal storage mechanisms?
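
For concreteness on question 3: one denser layout I'm wondering about is two
parallel plain arrays instead of an array of composites (a hypothetical
sketch; it gives up the per-pair grouping):

-- Hypothetical alternative: parallel arrays avoid per-element composite headers,
-- pairing property_key_ids[i] with property_value_ids[i] by position.
CREATE TABLE property_set_data_alt
(
    property_set_data_id int8 PRIMARY KEY,
    owner_id             uuid NOT NULL,
    property_key_ids     int4[] NULL,
    property_value_ids   int8[] NULL
);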

This is the query I use to see the table size:

WITH table_info AS (
    SELECT
        n.nspname AS schema_name,         -- schema name
        c.oid AS main_oid,
        c.relname AS table_name,
        c.relkind,
        c.reltoastrelid AS toast_oid,
        c.relispartition
    FROM
        pg_class c
    JOIN
        pg_namespace n ON n.oid = c.relnamespace  -- join to get schema information
    WHERE
        c.relname = 'property_set_data'           -- replace with your table name
)
SELECT
    ti.schema_name,                               -- add schema to the output
    ti.table_name,
    CASE
        WHEN ti.relispartition THEN 'Partitioned Table'
        ELSE 'Regular Table'
    END AS table_type,
    pg_relation_size(ti.main_oid) AS main_table_size,
    -- pg_size_pretty(pg_relation_size(ti.main_oid)) AS main_table_size,
    pg_size_pretty(pg_indexes_size(ti.main_oid)) AS indexes_size,
    CASE
        WHEN ti.toast_oid = 0 THEN 'No TOAST table'
        ELSE pg_total_relation_size(ti.toast_oid)::text
    END AS toast_size,
    -- ELSE pg_size_pretty(pg_total_relation_size(ti.toast_oid)) END AS toast_size,
    pg_size_pretty(pg_total_relation_size(ti.main_oid)) AS total_size
FROM
    table_info ti;
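
The per-element figures above then come from a rough average over the
inserted rows, along these lines (illustrative; assumes the 10,000-row test
population):

-- Average bytes per row in the main relation (illustrative)
SELECT pg_relation_size('property_set_data') / 10000.0 AS avg_bytes_per_row;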

Thank you for your assistance, and I appreciate any insights you can
provide!
