From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Paul Guo <paulguo(at)gmail(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fdatasync performance problem with large number of DB files |
Date: | 2021-03-18 11:05:11 |
Message-ID: | CA+hUKGJpKUMRqurMCkf+zy1WrH9WMZTWiMPu-JOmpsbsT9UhFQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Mar 17, 2021 at 11:42 PM Paul Guo <paulguo(at)gmail(dot)com> wrote:
> I just quickly reviewed the patch (the code part). It looks good. Only
> one concern or question is do_syncfs() for symlink opened fd for
> syncfs() - I'm not 100% sure.
Alright, let me try to prove that it works the way we want with an experiment.
I'll make a directory with a file in it, and create a symlink to it in
another filesystem:
tmunro(at)x1:~/junk$ mkdir my_wal_dir
tmunro(at)x1:~/junk$ touch my_wal_dir/foo
tmunro(at)x1:~/junk$ ln -s /home/tmunro/junk/my_wal_dir /dev/shm/my_wal_dir_symlink
tmunro(at)x1:~/junk$ ls /dev/shm/my_wal_dir_symlink/
foo
Now I'll write a program that repeatedly dirties the first block of
foo, and calls syncfs() on the containing directory that it opened
using the symlink:
tmunro(at)x1:~/junk$ cat test.c
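#define _GNU_SOURCE /* syncfs() is a GNU extension */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int symlink_fd, file_fd;

    /* Open the directory through the symlink that lives on /dev/shm. */
    symlink_fd = open("/dev/shm/my_wal_dir_symlink", O_RDONLY);
    if (symlink_fd < 0) {
        perror("open1");
        return EXIT_FAILURE;
    }
    /* Open foo by its real path on the other filesystem. */
    file_fd = open("/home/tmunro/junk/my_wal_dir/foo", O_RDWR);
    if (file_fd < 0) {
        perror("open2");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < 4; ++i) {
        /* Dirty the first block of foo (10 bytes at offset 0). */
        if (pwrite(file_fd, "hello world", 10, 0) != 10) {
            perror("pwrite");
            return EXIT_FAILURE;
        }
        /* Flush the filesystem that symlink_fd refers to. */
        if (syncfs(symlink_fd) < 0) {
            perror("syncfs");
            return EXIT_FAILURE;
        }
        sleep(1);
    }
    close(file_fd);
    close(symlink_fd);
    return EXIT_SUCCESS;
}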
tmunro(at)x1:~/junk$ cc test.c
tmunro(at)x1:~/junk$ ./a.out
While that's running, I'll find out where foo lives on disk, to prove
that it does what we want:
tmunro(at)x1:~/junk$ /sbin/xfs_bmap my_wal_dir/foo
my_wal_dir/foo:
0: [0..7]: 242968520..242968527
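xfs_bmap reports extents in 512-byte units, so the first 4 KiB of foo
(units 0..7) is stored at sectors 242968520..242968527 of the partition.
Now I'll trace the writes going to sector 242968520, and start the program again: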
tmunro(at)x1:~/junk$ sudo btrace /dev/nvme0n1p2 | grep 242968520
259,0 4 93 4.157000669 724924 A W 244019144 + 8 <- (259,2) 242968520
259,0 2 155 5.158446989 718635 A W 244019144 + 8 <- (259,2) 242968520
259,0 7 23 6.163765728 724924 A W 244019144 + 8 <- (259,2) 242968520
259,0 7 30 7.169112683 724924 A W 244019144 + 8 <- (259,2) 242968520