From: | "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com> |
---|---|
To: | 'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: [Patch] Optimize dropping of relation buffers using dlist |
Date: | 2020-10-23 00:56:35 |
Message-ID: | TYAPR01MB2990C1DBBC8985E27CC28ADBFE1A0@TYAPR01MB2990.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > I'm probably being silly, but can't we avoid the problem by using fstat()
> > instead of lseek(SEEK_END)? Would they return the same value from the
> > i-node?
>
> Amazingly, st_size can disagree with SEEK_END when using the Linux NFS
> client, but its behaviour is worse. Here's a sequence from a Linux
> NFS client talking to a Linux NFS server with no free space. This
> time, I also replaced the fsync() with sleep(60), just to make it
> clear that SEEK_END offset can move at any time due to asynchronous
> activity in kernel threads:
Thank you for experimenting. That's surely amazing. So, it makes sense for commercial DBMSs and MySQL to preallocate data files... (But IIRC, MySQL has provided an option to allocate a file per table like Postgres relatively recently.)
FWIW, it seems safe to use the nodelalloc mount option with ext4 to disable delayed allocation, while xfs doesn't have such an option.
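For reference, the fstat() versus lseek(SEEK_END) comparison discussed above can be sketched roughly as follows. This is only an illustration, not the test program Thomas ran and not PostgreSQL code; the helper name compare_file_sizes() is made up for this example. On a local file system I would expect the two values to agree; the disagreement reported above was observed with the Linux NFS client.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * report the size of an open file as seen by lseek(SEEK_END) and by fstat().
 * Returns 0 on success, -1 if either call fails.
 *
 * Note that lseek() moves the file offset to the end of the file as a
 * side effect, so the caller must reposition before reading or writing.
 */
static int
compare_file_sizes(int fd, off_t *seek_size, off_t *stat_size)
{
    struct stat st;
    off_t       end;

    /* Size according to the current end-of-file offset. */
    end = lseek(fd, 0, SEEK_END);
    if (end == (off_t) -1)
        return -1;

    /* Size according to the i-node metadata. */
    if (fstat(fd, &st) != 0)
        return -1;

    *seek_size = end;
    *stat_size = st.st_size;
    return 0;
}
```

A caller would open the relation segment file, call compare_file_sizes(fd, &a, &b), and log both values whenever a != b. Neither value is guaranteed to stay stable if the kernel is still completing asynchronous writes, which is the point made in the quoted text.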
> > Or, can't we just try to do BufTableLookup() one block after what
> > smgrnblocks() returns?
>
> Unfortunately the problem isn't limited to one block.
You're right. The data file can be extended by multiple blocks between disk writes.
Regards
Takayuki Tsunakawa