Index on (fixed size) bytea value

From: Les <nagylzs(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Index on (fixed size) bytea value
Date: 2023-06-19 20:05:33
Message-ID: CAKXe9UAhXAZbs7jys68cC9rsKPM+YBK7pipps7Ww-f+dJW5bhg@mail.gmail.com
Lists: pgsql-performance

Dear fellow list members,

I'm in the process of implementing a file storage system based on
PostgreSQL and streaming replication. Many of the stored files will likely
be similar, so I would like to implement block-level deduplication: each
file consists of a series of blocks, and each unique block is stored only
once (i.e. one block may be shared by multiple files). This will be part of
a bigger application, and the same database will be used for other purposes
too.
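
To make the deduplication idea concrete, here is a minimal client-side
sketch in Python. The block size and the helper name are just placeholders;
the real values are exactly what my questions below are about:

```python
import hashlib

def file_blocks(path, block_size=4096):
    """Split a file into fixed-size blocks, yielding (sha256, block) pairs.

    Blocks with an identical digest would be stored only once in the
    "block" table; a separate per-file table would keep the ordered
    list of block ids that make up each file.
    """
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield hashlib.sha256(block).digest(), block
```

On insert, the client (or a trigger, as below) computes the digest, and a
unique index on the digest column detects that a block already exists.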

Here is the basic idea for storing individual blocks:

create table block(
    id uuid not null primary key,
    block bytea not null,
    hs256 bytea not null
);
create unique index uidx_block_hs256 on block(hs256);

-- digest() comes from the pgcrypto extension:
create extension if not exists pgcrypto;

create or replace function trg_biu_block() returns trigger language plpgsql
as
$function$
begin
    new.hs256 = digest(new.block, 'sha256');
    return new;  -- a BEFORE row trigger must return the (modified) row
end;
$function$;

create trigger trg_biu_block before insert or update on block for each row
execute procedure trg_biu_block();

This is just for storing the blocks. I'm going to put this "block" table
into a separate tablespace. File operations will be at least 95% reads and
at most 5% writes. (Streaming replication will hopefully allow near-linear
horizontal scaling for read operations.) Most of the files will be several
megabytes in size (images), and some of them will be 100MB or more
(videos). Total capacity is in the 10TB range. Storage will be SSD
(possibly fibre-attached, or local RAID; we are not sure yet).

I do not want to use PostgreSQL large objects, because they do not provide
block-level deduplication.

Here are some things that I need help with:

1. What should be the maximum size of a block? I was trying to find the
optimal value. The default BLCKSZ is 8192 bytes, and AFAIK PostgreSQL does
not allow a (non-TOASTed) row to span multiple pages. I don't know enough
to calculate the optimal "block size" (the maximum number of bytes stored
in a single row in the block.block field), but I suspect it should be 4K or
thereabouts. I think it should be as large as possible without triggering
TOAST. My thinking is this: most files will be at least 1MB in size, so
most "block" rows will reach the maximum tuple size, and it would be
practical for one row in the "block" table to occupy almost one full
PostgreSQL page.
2. I'm not sure whether it would be beneficial to increase BLCKSZ. I will
be able to test the speed of my (not yet finished) implementation with
different BLCKSZ values, but that alone won't help me decide, because
BLCKSZ is a compile-time constant fixed for the whole PostgreSQL instance,
so changing it affects every other table in the database. It would be hard
to tell how it would affect the system as a whole.
3. In the above example I used SHA-256 (pgcrypto), because it is apparently
well optimized for 64-bit machines and has a practically zero chance of
collision; SHA-512 seems like overkill. But I'm not sure this is the best
choice. Maybe somebody with more experience can suggest a better hash
function.
4. The hs256 value will always be a non-null, fixed-size 32-byte binary
value, but the query planner probably will not know anything about that. I
was also thinking about bit(256), but I don't see an easy way to convert
the bytea digest into bit(256); a simple type cast won't work here. Maybe
using bytea here is perfectly fine, and an index on the hs256 bytea field
is as efficient as it can be.
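
To put rough numbers on questions 1 and 3, here is a back-of-envelope
calculation. The TOAST threshold is roughly BLCKSZ/4, and the per-tuple
overhead figure below is only an estimate (the exact values depend on
PostgreSQL version and alignment), so treat these as ballpark numbers:

```python
# Question 1: largest block that stays inline (untoasted) on an 8 kB page.
# PostgreSQL considers TOASTing a tuple once it exceeds roughly BLCKSZ / 4.
BLCKSZ = 8192
toast_threshold = BLCKSZ // 4              # ~2048 bytes
overhead = 23 + 16 + 32 + 8                # tuple header + uuid + sha-256 digest
                                           # + varlena headers (rough estimate)
max_inline_block = toast_threshold - overhead   # ~1.9 kB, well below 4 kB

# Question 3: birthday bound on a SHA-256 collision for a 10 TB store
# made of 4 kB blocks.
n = (10 * 1024**4) // 4096                 # ~2.7e9 distinct blocks at most
p = n * n / 2**257                         # upper bound on collision probability

print(max_inline_block)                    # a bit under 2 kB
print(f"{p:.1e}")                          # astronomically small
```

If these numbers are in the right ballpark, a 4 kB block would always be
TOASTed at the default page size, and the collision risk of SHA-256 is
negligible at this scale.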

I'm not looking for a definitive answer, just trying to get some hints from
more experienced users before I fill up the drives with terabytes of data.

Thank you,

Laszlo
