Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Pól Ua Laoínecháin <linehanp(at)tcd(dot)ie>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?
Date: 2021-05-02 17:00:07
Message-ID: CAM+6J95Lixr+zFA48wtxJvn_xhE9eU-KANkkZ+5r0+MzFGVYtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This wiki page.
It has PR references for mysql and mongo for the fsycnc issue.

Fsync Errors - PostgreSQL wiki
<https://wiki.postgresql.org/wiki/Fsync_Errors>

I'd leave the more intellectual brainstorming to the experts.

Also, ask for concrete references / reproducible scenarios for opinions if
you care.
else it leads to rumours :)

Thanks,
Vijay

On Sun, 2 May 2021 at 19:17, Pól Ua Laoínecháin <linehanp(at)tcd(dot)ie> wrote:

> Hi all,
>
>
> On 2019/10/09, I posted a question here concerning PostgreSQL I/O, the
> (primarily) Linux fsync problem and my lecturer's attitude to
> PostgreSQL (text of that email is at the bottom of this post).
>
>
> I asked why the fsync issue didn't affect Oracle and/or MySQL for example?
>
> As far as I can see, this was because Oracle uses Direct I/O whereas
> PostgreSQL uses Buffered I/O. I know that the issue has been resolved
> and no longer affects currently supported versions - but I'm still
> curious...
>
>
> From here:
> https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/
>
> > Whereas, writing the modified/dirty buffers to datafiles from shared
> buffers is always through Buffered IO.
>
>
> Now, I'm not quite sure that I completely comprehend matters: Is there
> a difference between Asynchronous I/O and Buffered I/O?
>
> If so, could some kind person point me to referernces that explain the
> difference?
>
>
> But, my foggy brain aside, I read (can't find URL - paraphrasing):
> PostgreSQL is happy to let the kernel take the load off the server and
> look after I/O - that's fine, but you'd better be able to trust your
> kernel.
>
>
> However, MySQL also uses Asynchronous I/O by default - or does it?
>
> From here:
> https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html#:~:text=InnoDB%20uses%20the%20asynchronous%20I,which%20is%20enabled%20by%20default
> ,
> we have:
>
> > InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to
> perform read-ahead and write requests for data file pages. This behavior is
> controlled by the innodb_use_native_aio configuration option, which is
> enabled by default.
>
>
> Now, I haven't (AFAIK) seen references to problems with this fsync
> issue on MySQL. Maybe they're so used to losing data, nobody noticed?
> :-)
>
>
> Seriously though, it does beg the question - why did this cause a
> major issue for PostgreSQL but not for MySQL?
>
>
> Is it because of (a) difference(s) between Asynchronous I/O and
> Buffered I/O asked about above?
>
>
> A couple of pointers (excuse the pun!) about this issue would be
> great. Some stuff which shows the difference between Direct I/O and
> the others would also be helpful. I seem to remember there was mention
> of this for PostgreSQL but that it would be a mulit-year project. Why
> so if PostgreSQL can already use Direct I/O for the WAL logs?
>
> Obviously, I can search and I have been - but I'd appreciate material
> from people here who can sort the wheat from the chaff and point me to
> solid references. If here is not a suitable forum, then kindly
> redirect me.
>
> TIA and rgs,
>
>
> Pól Ua...
>
>
>
> =============================
>
> > 2019/10/09 Is my lecturer wrong about PostgreSQL? I think he is!
>
>
> I recently started a Masters in Computer Science (and not at the
> institution in my email address).
>
> One of my courses is "Advanced Databases" - yummy I thought - it's not
> even compulsory for me but I just *_had_* to take this module. The
> lecturer is a bit of an Oracle fan-boy (ACE director no less...
> hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to
> do my dissertation with him. So, we're having a chat and I make plain
> my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me
> that there are problems with random block corruption with PostgreSQL.
> I said "really" and before that conversation could go any further,
> another student came over and asked a question.
>
> So, I toddled off and did some research - I had heard something about
> this before (vague fuzzy memories) of a problem with the Linux kernel
> so I searched for a bit and duly dug up a couple of pages
>
> https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and
>
> https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for
> Buffered IO and Its Preliminary Fix for PostgreSQL
>
> So, this week I go back to my lecturer and say, yep, there was some
> issue but it was a Linux kernel problem and not PostgreSQL's fault and
> has been resolved.
>
> He tells me that he knew about that but that there was another issue
> (he had "spoken to people" at meetings!). I said "well, why isn't it
> fixed?" and he replied "where's the impetus?" to which I responded
> (quite shocked at this stage) something like "well, I know that the
> core team values correctness very highly" to which he came back with
> "yes, but they have no commercial imperative to fix anything - they
> have to wait until somebody is capable enough and interested enough to
> do the work". He then muttered something about this mysterious flaw
> having been fixed in EnterpriseDB.
>
> At this point, I lost interest. Having lurked on lists and going by my
> general "gut feeling" - if there was a serious issue causing
> irrecoverable block corruption, I'm pretty sure that it would be "all
> hands on deck" until this problem had been solved and "nice-to-haves"
> (GENERATED AS... for example) would have been parked till then.
>
> Now, I have four questions:
>
> 1) Is my lecturer full of it or does he really have a point?
>
> 2) The actual concrete acknowledged problem with fsync that affected
> PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
> was so rare that it never became apparent - it wasn't that obvious
> with PostgreSQL either - one of those rare and intermittent problems?
>
> 3) Were there ever any problems with BSD?
>
> 4) What is the OS of choice for *_serious_* PostgreSQL installations?
>
> I hope that I have been clear, but should anyone require any
> clarification, please don't hesitate to ask me.
>
> Tia and rgs,
>
> Pól...
>
>
>

--
Thanks,
Vijay
Mumbai, India

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2021-05-02 18:31:01 Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?
Previous Message Pól Ua Laoínecháin 2021-05-02 13:49:44 Is this the future of I/O for the RDBMS?