From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Subject: | Re: Avoiding smgrimmedsync() during nbtree index builds |
Date: | 2022-07-23 09:34:54 |
Message-ID: | d410c275-7181-52ce-63ae-0af229b99a70@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 05/03/2022 00:03, Melanie Plageman wrote:
> On Wed, Mar 2, 2022 at 8:09 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>>
>> Rebased to appease cfbot.
>>
>> I ran these paches under a branch which shows code coverage in cirrus. It
>> looks good to my eyes.
>> https://api.cirrus-ci.com/v1/artifact/task/5212346552418304/coverage/coverage/00-index.html
>
> thanks for doing that and for the rebase! since I made updates, the
> attached version 6 is also rebased.
I'm surprised by all the changes in nbtsort.c to choose between using
the buffer manager or the new API. I would've expected the new API to
abstract that away. Otherwise we need to copy that logic to all the
index AMs.
I'd suggest an API along the lines of:
/*
* Start building a relation in bulk.
*
* If the relation is going to be small, we will use the buffer manager,
* but if it's going to be large, this will skip the buffer manager and
* write the pages directly to disk.
*/
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)
/*
* Extend relation by one page
*/
bulk_extend(SmgrRelation, BlockNumber, Page)
/*
* Finish building the relation
*
* This will fsync() the data to disk, if required.
*/
bulk_finish()
Behind this interface, you can encapsulate the logic to choose whether
to use the buffer manager or not. And I think bulk_extend() could do the
WAL-logging too.
Or you could make the interface look more like the buffer manager:
/* as above */
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)
bulk_finish()
/*
* Get a buffer for writing out a page.
*
* The contents of the buffer are uninitialized. The caller
* must fill it in before releasing it.
*/
BulkBuffer
bulk_getbuf(SmgrRelation smgr, BlockNumber blkno)
/*
* Release buffer. It will be WAL-logged and written out to disk.
* Not necessarily immediately, but at bulk_finish() at latest.
*
* NOTE: There is no way to read back the page after you release
* it, until you finish the build with bulk_finish().
*/
void
bulk_releasebuf(SmgrRelation smgr, BulkBuffer buf)
With this interface, you could also batch multiple pages and WAL-log
them all in one WAL record with log_newpage_range(), for example.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey Borodin | 2022-07-23 09:40:44 | Re: Amcheck verification of GiST and GIN |
Previous Message | Bharath Rupireddy | 2022-07-23 09:28:53 | A proposal for shared memory based backup infrastructure |