Re: More data files / forks

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Chris Cleveland <ccleve+github(at)dieselpoint(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More data files / forks
Date: 2022-01-12 00:28:37
Message-ID: 88723fed-9f4a-cf2b-0786-382e70cea7f1@enterprisedb.com
Lists: pgsql-hackers


On 1/11/22 19:39, Chris Cleveland wrote:
> I'm working on a table access method that stores indexes in a structure
> that looks like an LSM tree. Changes get written to small segment files,
> which then get merged into larger segment files.
>
> It's really tough to manage these files using existing fork/buffer/page
> files, because when you delete a large segment it leaves a lot of empty
> space. It's a lot easier to write the segments into separate files on
> disk and then delete them as needed.
>

And is that empty space actually a problem? You can reuse that for new
data, no? It's a bit like empty space in regular data files - we could
try keeping it much lower, but it'd be harmful in practice.
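
To make the reuse point concrete, here is a minimal C sketch. It assumes the access method tracks free space itself as a list of block extents within one fork. All names (`SegmentSpace`, `Extent`, `segspace_alloc`, `segspace_release`) are hypothetical and are not PostgreSQL APIs. Blocks of a deleted segment go back on a free list, and a new segment takes the first free extent that is large enough before the fork is extended.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* same width as PostgreSQL's BlockNumber */

#define MAX_FREE_EXTENTS 64     /* arbitrary limit for this sketch */

/* Hypothetical: a run of contiguous blocks [start, start + nblocks). */
typedef struct Extent
{
    BlockNumber start;
    uint32_t    nblocks;
} Extent;

/* Hypothetical: fork size plus the extents left behind by deleted segments. */
typedef struct SegmentSpace
{
    BlockNumber nblocks_total;  /* current size of the fork, in blocks */
    Extent      freelist[MAX_FREE_EXTENTS];
    int         nfree;
} SegmentSpace;

/* Return a deleted segment's blocks to the free list for later reuse. */
static void
segspace_release(SegmentSpace *s, Extent e)
{
    assert(s->nfree < MAX_FREE_EXTENTS);
    s->freelist[s->nfree++] = e;
}

/*
 * Allocate nblocks contiguous blocks: first fit from the free list,
 * otherwise extend the fork at its end.
 */
static Extent
segspace_alloc(SegmentSpace *s, uint32_t nblocks)
{
    Extent      result;

    assert(nblocks > 0);
    for (int i = 0; i < s->nfree; i++)
    {
        Extent     *f = &s->freelist[i];

        if (f->nblocks >= nblocks)
        {
            result.start = f->start;
            result.nblocks = nblocks;
            /* Shrink the free extent; drop it once fully consumed. */
            f->start += nblocks;
            f->nblocks -= nblocks;
            if (f->nblocks == 0)
                s->freelist[i] = s->freelist[--s->nfree];
            return result;
        }
    }
    result.start = s->nblocks_total;
    result.nblocks = nblocks;
    s->nblocks_total += nblocks;
    return result;
}
```

First fit is the simplest policy. Merging adjacent free extents, or preferring the best fit, would reduce fragmentation further.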

> I could do that, but then I lose the advantages of having data in native
> Postgres files, including support for buffering and locking.
>
> It's important to have the segments stored contiguously on disk. I've
> benchmarked it; it makes a huge performance difference.
>

Yeah, I'm sure it's beneficial for sequential scans, readahead, etc. But
you can get most of that benefit with a smart allocation strategy: instead
of working with individual pages, allocate larger chunks of pages. So
instead of grabbing pages one by one, "reserve" them in e.g. 1MB chunks,
or something.
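
The chunked reservation idea can be sketched in C as follows. The sketch assumes 8kB pages (PostgreSQL's default BLCKSZ) and 1MB chunks, which gives 128 pages per chunk. `RelSpace`, `SegmentWriter` and `segment_next_page` are hypothetical names. A real access method would extend the fork through the buffer manager at the marked spot. Each segment being written owns its current chunk, so two segments written in an interleaved fashion still end up in contiguous 128-page runs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* same width as PostgreSQL's BlockNumber */

#define BLCKSZ 8192                          /* PostgreSQL's default page size */
#define CHUNK_BYTES (1024 * 1024)            /* reserve space 1MB at a time */
#define CHUNK_BLOCKS (CHUNK_BYTES / BLCKSZ)  /* = 128 pages per chunk */

/* Hypothetical per-relation state: how many blocks the fork has so far. */
typedef struct RelSpace
{
    BlockNumber nblocks;
} RelSpace;

/* Hypothetical per-segment state: the chunk this segment is filling. */
typedef struct SegmentWriter
{
    BlockNumber next;       /* next unused block in the current chunk */
    BlockNumber chunk_end;  /* one past the current chunk's last block */
} SegmentWriter;

/* Return the next page for segment w, reserving a new chunk when needed. */
static BlockNumber
segment_next_page(RelSpace *rel, SegmentWriter *w)
{
    if (w->next == w->chunk_end)
    {
        /* A real AM would extend the fork by CHUNK_BLOCKS pages here. */
        w->next = rel->nblocks;
        rel->nblocks += CHUNK_BLOCKS;
        w->chunk_end = rel->nblocks;
    }
    assert(w->next < w->chunk_end);
    return w->next++;
}
```

The trade-off is some internal fragmentation. Each segment can leave at most one partially used chunk behind, and in exchange it gets much longer sequential runs.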

Not sure how exactly you do the book-keeping, ofc. I wonder if BRIN
might serve as an inspiration, as it keeps the revmap and the actual index
tuples in the same fork. Not the same thing, but perhaps similar?
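
One way to do that book-keeping is a directory page at a fixed block of the same fork that records where each segment lives. This is loosely in the spirit of BRIN's revmap, which maps heap page ranges to the locations of their summary tuples within the index itself. What follows is a hypothetical sketch. The struct and function names, the 64-byte header reservation and the one-page limit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* same width as PostgreSQL's BlockNumber */

#define BLCKSZ 8192             /* PostgreSQL's default page size */
#define DIR_HEADER_BYTES 64     /* assumed room for a page header */

/* Hypothetical directory entry: where one segment lives within the fork. */
typedef struct SegmentEntry
{
    uint32_t    segment_id;     /* 0 marks an unused slot */
    uint32_t    level;          /* LSM level the segment belongs to */
    BlockNumber start;          /* first block of the segment's extent */
    uint32_t    nblocks;        /* extent length, in blocks */
} SegmentEntry;

#define DIR_ENTRIES_PER_PAGE ((BLCKSZ - DIR_HEADER_BYTES) / sizeof(SegmentEntry))

/* Hypothetical directory page stored at a fixed block (say block 0). */
typedef struct DirectoryPage
{
    char        header[DIR_HEADER_BYTES];
    SegmentEntry entries[DIR_ENTRIES_PER_PAGE];
} DirectoryPage;

static_assert(sizeof(DirectoryPage) <= BLCKSZ, "directory must fit in one page");

/* Record a new segment in the first free slot; false if the page is full. */
static bool
dir_add(DirectoryPage *dir, SegmentEntry e)
{
    assert(e.segment_id != 0);
    for (size_t i = 0; i < DIR_ENTRIES_PER_PAGE; i++)
    {
        if (dir->entries[i].segment_id == 0)
        {
            dir->entries[i] = e;
            return true;
        }
    }
    return false;
}

/* Find where a segment lives; NULL if it is not in the directory. */
static const SegmentEntry *
dir_lookup(const DirectoryPage *dir, uint32_t segment_id)
{
    assert(segment_id != 0);
    for (size_t i = 0; i < DIR_ENTRIES_PER_PAGE; i++)
        if (dir->entries[i].segment_id == segment_id)
            return &dir->entries[i];
    return NULL;
}

/* Free a segment's slot after a merge; reclaiming its blocks is separate. */
static void
dir_remove(DirectoryPage *dir, uint32_t segment_id)
{
    assert(segment_id != 0);
    for (size_t i = 0; i < DIR_ENTRIES_PER_PAGE; i++)
        if (dir->entries[i].segment_id == segment_id)
            dir->entries[i].segment_id = 0;
}
```

With 16-byte entries, one 8kB page holds 508 segments. A larger tree would need chained directory pages, much as BRIN's revmap spans several pages as the table grows.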

The other thing that comes to mind is logtape.c, which works with
multiple "logical tapes" stored in a single file - a bit like the
segments you're talking about. But maybe the assumptions about segments
being written/read exactly once are too limiting for your use case.
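
The logical-tape idea can be illustrated with a toy model. Several tapes share one file, each tape is a chain of blocks linked by a next pointer, and a block goes back on a free list as soon as it has been read. This is a simplified sketch of the concept, not logtape.c's actual data structures. `TapeFile`, `Tape`, `tape_write` and `tape_read` are hypothetical names, and an in-memory array stands in for the file. The sketch also shows why the write-once/read-once assumption matters: once a block has been read, another tape may overwrite it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS_MAX 64          /* arbitrary file size limit for this sketch */
#define BLOCK_PAYLOAD 16        /* tiny blocks, for illustration only */
#define NO_BLOCK (-1L)

typedef struct TapeBlock
{
    long        next;           /* next block of the same tape, or NO_BLOCK */
    char        data[BLOCK_PAYLOAD];
} TapeBlock;

/* Hypothetical shared file: blocks of all tapes are interleaved here. */
typedef struct TapeFile
{
    TapeBlock   blocks[NBLOCKS_MAX];
    long        nblocks;                /* blocks ever used (file size) */
    long        freelist[NBLOCKS_MAX];  /* blocks released after being read */
    int         nfree;
} TapeFile;

/* Hypothetical tape: a linked list of blocks inside the shared file. */
typedef struct Tape
{
    long        head;           /* first unread block */
    long        tail;           /* last written block */
} Tape;

static void
tape_init(Tape *t)
{
    t->head = t->tail = NO_BLOCK;
}

/* Recycle an already-read block if possible, otherwise grow the file. */
static long
tapefile_get_block(TapeFile *f)
{
    if (f->nfree > 0)
        return f->freelist[--f->nfree];
    assert(f->nblocks < NBLOCKS_MAX);
    return f->nblocks++;
}

/* Append one block of data (truncated to fit) to tape t. */
static void
tape_write(TapeFile *f, Tape *t, const char *data)
{
    long        b = tapefile_get_block(f);

    strncpy(f->blocks[b].data, data, BLOCK_PAYLOAD - 1);
    f->blocks[b].data[BLOCK_PAYLOAD - 1] = '\0';
    f->blocks[b].next = NO_BLOCK;
    if (t->tail == NO_BLOCK)
        t->head = b;
    else
        f->blocks[t->tail].next = b;
    t->tail = b;
}

/* Read the next block of tape t; the block is freed, so it is read once. */
static bool
tape_read(TapeFile *f, Tape *t, char *out)
{
    long        b = t->head;

    if (b == NO_BLOCK)
        return false;
    memcpy(out, f->blocks[b].data, BLOCK_PAYLOAD);
    t->head = f->blocks[b].next;
    if (t->head == NO_BLOCK)
        t->tail = NO_BLOCK;
    f->freelist[f->nfree++] = b;
    return true;
}
```

An LSM segment that must be read many times, or kept around until a later merge, would need blocks that are not recycled on read. That is the limitation mentioned above.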

> Questions:
>
> 1. Are there any other disadvantages to storing data in my own files on
> disk, instead of in files managed by Postgres?
>

Well, you simply don't get many of the built-in benefits you mentioned
(buffering, locking), various tools may not expect such files, and so on.

> 2. Is it possible to increase the number of forks? I could store each
> level of the LSM tree in its own fork very efficiently. Forks could get
> truncated as needed. A dozen forks would handle it nicely.
>

You're right that the number of forks is fixed, and it's one of the places
that's not extensible. I don't recall any proposals to change that,
though, and even if we decided to do that, I doubt we'd allow the number
of forks to be entirely dynamic.
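
For reference, the fork set is a plain compile-time enum in core. The block below mirrors it from `src/include/common/relpath.h` and from the `forkNames[]` array in `src/common/relpath.c`, as of the PostgreSQL 14 sources. Each fork of a relation is a separate file whose name is the relfilenode plus a `_fsm`, `_vm` or `_init` suffix. The `fork_file_name` helper is hypothetical and ignores the `.1`, `.2` segment suffixes. Its only purpose is to show why adding a fork means changing core code rather than writing an extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Mirror of PostgreSQL's ForkNumber enum (common/relpath.h). */
typedef enum ForkNumber
{
    InvalidForkNumber = -1,
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

#define MAX_FORKNUM INIT_FORKNUM

/* Mirror of forkNames[] (common/relpath.c): one name per fork. */
static const char *const forkNames[] = {"main", "fsm", "vm", "init"};

static_assert(sizeof(forkNames) / sizeof(forkNames[0]) == (size_t) (MAX_FORKNUM + 1),
              "one name per fork");

/*
 * Hypothetical helper: build a fork's file name the way PostgreSQL names
 * them on disk ("16384", "16384_fsm", ...), ignoring segment suffixes.
 */
static void
fork_file_name(char *buf, size_t len, unsigned relfilenode, ForkNumber fork)
{
    assert(fork >= MAIN_FORKNUM && fork <= MAX_FORKNUM);
    if (fork == MAIN_FORKNUM)
        snprintf(buf, len, "%u", relfilenode);
    else
        snprintf(buf, len, "%u_%s", relfilenode, forkNames[fork]);
}
```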

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
