RE: pgcon unconference / impact of block size on performance

From: Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: RE: pgcon unconference / impact of block size on performance
Date: 2022-06-07 13:48:09
Message-ID: PR3PR07MB82439880210722B647023C3FF6A59@PR3PR07MB8243.eurprd07.prod.outlook.com
Lists: pgsql-hackers

Hi,

> The really
> puzzling thing is why is the filesystem so much slower for smaller pages. I mean,
> why would writing 1K be 1/3 of writing 4K?
> Why would a filesystem have such effect?

Ha! I don't care at this point, as 1 or 2kB seems too small to handle many real-world scenarios ;)

> > b) Another thing that you could also include in testing: I've spotted a
> > couple of times that single-threaded fio might be the limiting factor
> > (numjobs=1 by default), so I tried with numjobs=2, group_reporting=1 and got
> > the output below on ext4 defaults, even while dropping caches (echo 3) each
> > loop iteration -- something that I cannot explain (an ext4 direct I/O caching
> > effect? how is that even possible? reproduced several times, even with
> > numjobs=1). The point being: 206643 1kB IOPS @ ext4 direct-io > 131783 1kB
> > IOPS @ raw, which smells like some caching effect, because for randwrite it
> > does not happen. I've triple-checked with iostat -x... it cannot be any
> > internal device cache, as with direct I/O that doesn't happen:
> >
> > [root(at)x libaio-ext4]# grep -r -e 'write:' -e 'read :' *
> > nvme/randread/128/1k/1.txt: read : io=12108MB, bw=206644KB/s,
> > iops=206643, runt= 60001msec [b]
> > nvme/randread/128/2k/1.txt: read : io=18821MB, bw=321210KB/s,
> > iops=160604, runt= 60001msec [b]
> > nvme/randread/128/4k/1.txt: read : io=36985MB, bw=631208KB/s,
> > iops=157802, runt= 60001msec [b]
> > nvme/randread/128/8k/1.txt: read : io=57364MB, bw=976923KB/s,
> > iops=122115, runt= 60128msec
> > nvme/randwrite/128/1k/1.txt: write: io=1036.2MB, bw=17683KB/s,
> > iops=17683, runt= 60001msec [a, as before]
> > nvme/randwrite/128/2k/1.txt: write: io=2023.2MB, bw=34528KB/s,
> > iops=17263, runt= 60001msec [a, as before]
> > nvme/randwrite/128/4k/1.txt: write: io=16667MB, bw=282977KB/s,
> > iops=70744, runt= 60311msec [reproduced benefit, as per earlier email]
> > nvme/randwrite/128/8k/1.txt: write: io=22997MB, bw=391839KB/s,
> > iops=48979, runt= 60099msec
> >
>
> No idea what might be causing this. BTW so you're not using direct-io to access
> the raw device? Or am I just misreading this?

Both scenarios (raw and fs) had direct=1 set. I just cannot understand how having direct I/O enabled (which disables caching) achieves better read IOPS on ext4 than on the raw device... isn't that a contradiction?
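For reference, a fio job file roughly matching the setup described above (numjobs=2, group_reporting, libaio, queue depth 128, direct I/O) would look something like this. This is a hypothetical reconstruction, not the exact file used; the device path and runtime are placeholders:

```ini
; Hypothetical sketch of the benchmark configuration discussed above.
; /dev/nvme0n1 is a placeholder device path, not from the original runs.
[global]
ioengine=libaio
direct=1            ; bypass the page cache, in both raw-device and ext4 runs
iodepth=128
numjobs=2
group_reporting=1
runtime=60
time_based

[randread-1k]
rw=randread
bs=1k
filename=/dev/nvme0n1
```

For the ext4 case, filename would instead point at a preallocated file on the mounted filesystem, with everything else unchanged.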

-J.
