Re: Extensible storage manager API - SMGR hook Redux

From: Xun Gong <gongxun0928(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Tristan Partin <tristan(at)neon(dot)tech>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Extensible storage manager API - SMGR hook Redux
Date: 2024-12-10 06:49:53
Message-ID: CAPP=Hha_wV1MV9yR70QZ5pk5dtNP+bOyBiFxPmrMKqnQeKMAwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your detailed review and insights. I share your view that a
registration-based approach for custom storage managers (smgr) is more
versatile. This method allows for the implementation of custom table access
methods, facilitating the use of various storage services (such as file
services or object storage), different file organization formats (files in
one directory or many sub directories), and flexible deletion logic
(direct deletion or mark-and-sweep).

While I acknowledge that the registration-based approach requires more
modifications, I believe the benefits in terms of extensibility and
functionality are significant. I have seen similar explorations in the
implementation of AO tables in Greenplum, where a dedicated smgr was
created due to its distinct file organization compared to heap tables.

I look forward to further discussions on this topic

Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com> 于2024年12月10日周二 11:25写道:

> Hi,
>
> I reviewed the discussion and took a look at the patch sets. It seems
> like many things are combined here. Based on the subject, I initially
> thought it aimed to provide the infrastructure to easily extend
> storage managers. This would allow anyone to create their own storage
> managers using this infrastructure. While it addresses this, it also
> includes additional features like fsync_checker, which I believe
> should be a separate feature. Even though it might use the same
> infrastructure, it appears to be a different functionality. I think we
> should focus solely on providing the infrastructure here.
>
> We need to decide on our approach—whether to use a hook-based method
> or a registration-based method—and I believe this requires further
> discussion.
>
> The hook-based approach is simple and works well for anyone writing
> their own storage manager. However, it has its limitations as we can
> either use the default storage manager or a custom-built one for all
> the work load, but we cannot choose between multiple storage managers.
> On the other hand, the registration-based approach allows choosing
> between multiple storage managers based on the workload, though it
> requires a lot of changes.
>
> Are we planning to support other storage managers in PostgreSQL in the
> near future? If not, it is better to go with the hook-based approach.
> Otherwise, the registration-based approach is preferable as it offers
> more flexibility to users and enhances PostgreSQL’s functionality.
>
> Could you please share your thoughts on this? Also, let me know if
> this topic has already been discussed and if any conclusions were
> reached.
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft
>
> On Sat, Jan 13, 2024 at 1:27 AM Tristan Partin <tristan(at)neon(dot)tech> wrote:
> >
> > Thought I would show off what is possible with this patchset.
> >
> > Heikki, a couple of months ago in our internal Slack, said:
> >
> > > [I would like] a debugging tool that checks that we're not missing any
> > > fsyncs. I bumped into a few missing fsync bugs with unlogged tables
> > > lately and a tool like that would've been very helpful.
> >
> > My task was to create such a tool, and off I went. I started with the
> > storage manager extension patch that Matthias sent to the list last
> > year[0].
> >
> > Andres, in another thread[1], said:
> >
> > > I've been thinking that we need a validation layer for fsyncs, it's
> too hard
> > > to get right without testing, and crash testing is not likel enough to
> catch
> > > problems quickly / resource intensive.
> > >
> > > My thought was that we could keep a shared hash table of all files
> created /
> > > dirtied at the fd layer, with the filename as key and the value the
> current
> > > LSN. We'd delete files from it when they're fsynced. At checkpoints we
> go
> > > through the list and see if there's any files from before the redo
> that aren't
> > > yet fsynced. All of this in assert builds only, of course.
> >
> > I took this idea and ran with it. I call it the fsync_checker™️. It is an
> > extension that prints relations that haven't been fsynced prior to
> > a CHECKPOINT. Note that this idea doesn't work in practice because
> > relations might not be fsynced, but they might be WAL-logged, like in
> > the case of createdb. See log_smgrcreate(). I can't think of an easy way
> > to solve this problem looking at the codebase as it stands.
> >
> > Here is a description of the patches:
> >
> > 0001:
> >
> > This is essentially just the patch that Matthias posted earlier, but
> > rebased and adjusted a little bit so storage managers can "inherit" from
> > other storage managers.
> >
> > 0002:
> >
> > This is an extension of 0001, which allows for extensions to set
> > a global storage manager. This is pretty hacky, and if it was going to
> > be pulled into mainline, it would need some better protection. For
> > instance, only one extension should be able to set the global storage
> > manager. We wouldn't want extensions stepping over each other, etc.
> >
> > 0003:
> >
> > Adds a hook for extensions to inspect a checkpoint before it actually
> > occurs. The purpose for the fsync_checker is so that I can iterate over
> > all the relations the extension tracks to find files that haven't been
> > synced prior to the completion of the checkpoint.
> >
> > 0004:
> >
> > This is the actual fsync_checker extension itself. It must be preloaded.
> >
> > Hopefully this is a good illustration of how the initial patch could be
> > used, even though it isn't perfect.
> >
> > [0]:
> https://www.postgresql.org/message-id/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com
> > [1]:
> https://www.postgresql.org/message-id/20220127182838.ba3434dp2pe5vcia%40alap3.anarazel.de
> >
> > --
> > Tristan Partin
> > Neon (https://neon.tech)
>
>
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-12-10 06:50:43 Re: Pgoutput not capturing the generated columns
Previous Message Andy Fan 2024-12-10 06:45:50 Re: testing framework for MVCC & vacuum (freeze) & heap_page_prune etc.