RE: [Patch] Optimize dropping of relation buffers using dlist

From:	"tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To:	'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com>
Cc:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: [Patch] Optimize dropping of relation buffers using dlist
Date:	2020-10-23 00:56:35
Message-ID:	TYAPR01MB2990C1DBBC8985E27CC28ADBFE1A0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > I'm probably being silly, but can't we avoid the problem by using fstat()
> > instead of lseek(SEEK_END)? Would they return the same value from the
> > i-node?
>
> Amazingly, st_size can disagree with SEEK_END when using the Linux NFS
> client, but its behaviour is worse. Here's a sequence from a Linux
> NFS client talking to a Linux NFS server with no free space. This
> time, I also replaced the fsync() with sleep(60), just to make it
> clear that SEEK_END offset can move at any time due to asynchronous
> activity in kernel threads:

Thank you for experimenting. That's surely amazing. So, it makes sense for commercial DBMSs and MySQL to preallocate data files... (But IIRC, MySQL has provided an option to allocate a file per table like Postgres relatively recently.)
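
For anyone who wants to repeat the comparison on their own mount, here is a minimal sketch that reads the size of an open file both ways. The names probe_file_sizes(), open_scratch_file() and FileSizes are made up for illustration and are not from the patch or from PostgreSQL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the two size readings. */
typedef struct
{
    off_t seek_end;   /* offset returned by lseek(fd, 0, SEEK_END) */
    off_t st_size;    /* st_size reported by fstat(fd) */
} FileSizes;

/* Read the file size both ways. Returns 0 on success, -1 on error. */
int
probe_file_sizes(int fd, FileSizes *out)
{
    struct stat st;

    assert(out != NULL);

    out->seek_end = lseek(fd, 0, SEEK_END);
    if (out->seek_end == (off_t) -1)
        return -1;

    if (fstat(fd, &st) != 0)
        return -1;
    out->st_size = st.st_size;

    return 0;
}

/* Hypothetical helper: create an already-unlinked scratch file in /tmp. */
int
open_scratch_file(void)
{
    char path[] = "/tmp/sizeprobeXXXXXX";
    int  fd = mkstemp(path);

    if (fd >= 0)
        unlink(path);
    return fd;
}
```

On a local file system I would expect the two values to agree. The interesting case is calling probe_file_sizes() repeatedly on a file under an NFS mount whose server is out of space, as in Thomas's experiment.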

FWIW, it seems safe to use the nodelalloc mount option with ext4 to disable delayed allocation, while xfs doesn't have such an option.
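
As a rough illustration of what preallocation means at the system call level, the sketch below reserves space with posix_fallocate() when the file is created, so that a lack of disk space is reported at that point rather than at some later write-back. This is only a sketch under my own assumptions: create_preallocated(), blocks_in_file() and SKETCH_BLCKSZ are hypothetical names, and I have not checked how this behaves on an NFS client, where the C library may emulate the call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* assumed block size, for illustration only */

/*
 * Hypothetical helper: create a new file and reserve nblocks blocks for it.
 * Returns an open fd, or -1 with errno set (for example ENOSPC).
 */
int
create_preallocated(const char *path, off_t nblocks)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    int rc;

    if (fd < 0)
        return -1;

    /* posix_fallocate() returns the error number instead of setting errno. */
    rc = posix_fallocate(fd, 0, nblocks * SKETCH_BLCKSZ);
    if (rc != 0)
    {
        close(fd);
        unlink(path);
        errno = rc;
        return -1;
    }
    return fd;
}

/* Number of whole blocks according to fstat(), or -1 on error. */
off_t
blocks_in_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size / SKETCH_BLCKSZ;
}
```

After a successful create_preallocated(path, 4), blocks_in_file() should report 4 blocks, because posix_fallocate() extends the file size to cover the allocated range.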

> > Or, can't we just try to do BufTableLookup() one block after what
> > smgrnblocks() returns?
>
> Unfortunately the problem isn't limited to one block.

You're right. The data file can be extended by multiple blocks between disk writes.

Regards
Takayuki Tsunakawa

In response to

Re: [Patch] Optimize dropping of relation buffers using dlist at 2020-10-22 21:45:05 from Thomas Munro

Responses

RE: [Patch] Optimize dropping of relation buffers using dlist at 2020-10-28 12:52:08 from k.jamison@fujitsu.com
