From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Shawn Debnath <sdn(at)amazon(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-02-22 22:48:53
Message-ID: 20190222224853.ahvm3bf52zqr7k7h@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2019-02-23 11:42:49 +1300, Thomas Munro wrote:
> On Sat, Feb 23, 2019 at 11:15 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote:
> > > I think using callbacks is the better path forward than having md or
> > > other components issue an invalidate request for each and every segment
> > > which can get quite heavy handed for large databases.
> >
> > I'm not sure I buy this. Unlinking files isn't cheap, involves many disk
> > writes, etc. The cost of an inval request in comparison isn't
> > particularly large. Especially for relation-level (rather than database
> > level) truncation, per-segment invals will likely commonly be faster
> > than the sequential scan.
>
> Well even if you do it with individual segment cancel messages for
> relations, you still need a way to deal with whole-database drops
> (generating the cancels for every segment in every relation in the
> database would be nuts), and that means either exposing some structure
> to the requests, right? So the requests would have { request type,
> callback ID, db, opaque tag }, where request type is SYNC, CANCEL,
> CANCEL_WHOLE_DB, callback ID is used to find the function that
> converts opaque tags to paths, and db is used for handling
> CANCEL_WHOLE_DB requests where you need to scan the whole hash table.
> Right?
I'm ok with using callbacks to allow pruning for things like dropping
databases. If we use callbacks, though, I don't see a need to explicitly
include the db in the request: the callback can look into the opaque
tag, no? Also, why do we need a separation between request type and
callback ID? That seems like it'd commonly be entirely redundant.
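To illustrate the point being argued here (a sketch only, with made-up names, not anything from the PostgreSQL tree): if the opaque tag embeds the database OID, a per-owner callback can recover it when pruning the queue for a whole-database drop, so the request itself needs no separate db field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not PostgreSQL source: the tag layout is known to
 * the owning component, opaque to the sync request queue itself. */
typedef struct SyncTag
{
    unsigned    db;             /* database OID, embedded in the tag */
    unsigned    rel;            /* relation identifier */
    unsigned    segno;          /* segment number */
} SyncTag;

/* md-style callback: decide whether a queued request belongs to "db".
 * The queue calls this while scanning its hash table on a whole-db
 * cancel, without ever interpreting the tag itself. */
static bool
md_tag_matches_db(const void *tag, unsigned db)
{
    const SyncTag *t = (const SyncTag *) tag;

    return t->db == db;
}
```

Since the callback alone decodes the tag, the queue can stay ignorant of every owner's tag layout, which is the crux of the "why carry db separately?" question.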
> > > At the time of smgrinit(), mdinit() would call into sync and register
> > > its callbacks with an ID. We can use the same value that we are using
> > > for smgr_which to map the callbacks. Each fsync request will then also
> > > accompany this ID which the sync mechanism will use to call handlers for
> > > resolving forget requests or obtaining paths for files.
> >
> > I'm not seeing a need to do this dynamically at runtime. Given that smgr
> > isn't extensible, why don't we just map callbacks (or even just some
> > switch based logic) based on some enum? Doing things at *init time has
> > more potential to go wrong, because say a preload_shared_library does
> > different things in postmaster than normal backends (in EXEC_BACKEND
> > cases).
>
> Yeah I suggested dynamic registration to avoid the problem that eg
> src/backend/storage/sync.c otherwise needs to forward declare
> md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe
> #include <storage/md.h> etc, which seemed like exactly the sort of
> thing up with which you would not put.
I'm not sure I understand. If we have a few known tag types, what's the
problem with having sync.c include the headers that declare how to
handle them?
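Concretely, the static alternative being suggested might look like the following sketch (hypothetical names throughout; the stub bodies stand in for the real md_tagtopath() / undofile_tagtopath(), which in a real tree would be declared in headers that sync.c includes):

```c
#include <stddef.h>

/* Hypothetical sketch: with a closed set of tag owners, sync.c can
 * dispatch through a switch on an enum, so nothing needs to be
 * registered at *init time (avoiding EXEC_BACKEND hazards where the
 * postmaster and backends might register differently). */
typedef enum SyncHandler
{
    SYNC_HANDLER_MD,
    SYNC_HANDLER_UNDO
} SyncHandler;

/* Stand-ins for the per-component tag-to-path functions. */
static const char *
md_tagtopath(const void *tag)
{
    (void) tag;
    return "md-segment-path";
}

static const char *
undofile_tagtopath(const void *tag)
{
    (void) tag;
    return "undo-segment-path";
}

static const char *
sync_tagtopath(SyncHandler handler, const void *tag)
{
    switch (handler)
    {
        case SYNC_HANDLER_MD:
            return md_tagtopath(tag);
        case SYNC_HANDLER_UNDO:
            return undofile_tagtopath(tag);
    }
    return NULL;                /* unreachable if the enum is complete */
}
```

The trade-off being debated: this requires sync.c to know about each component's header, whereas dynamic registration keeps sync.c ignorant of them at the cost of init-time ordering hazards.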
Greetings,
Andres Freund