From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | io_uring support |
Date: | 2019-08-19 18:20:46 |
Message-ID: | CA+q6zcU9oa96K8qL26qTGnygzLmBrX+ZXwBs_HP2TR5h_wnBDg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
For some time now I've been following the new Linux IO interface "io_uring",
which was introduced relatively recently [1]. Its short description says:
Shared application/kernel submission and completion ring pairs, for
supporting fast/efficient IO.
For us the important part is probably that it's asynchronous IO that can
work not only with O_DIRECT, but also with buffered access. Plus there are
claims that it's pretty efficient (efficiency was one of the design goals [2]).
The interface consists of submission/completion queues and data structures
shared between an application and the kernel. To facilitate application
development there is also a nice library for using io_uring from user space [3].
Since I haven't found many discussions about async IO in the hackers archives,
I decided, out of curiosity, to prepare an experimental patch to see what using
io_uring in PostgreSQL would look like. So far I've tested this patch only
inside a qemu VM on the latest io_uring branch from the linux-block tree.
The result is relatively simple and introduces a new interface (smgrqueueread,
smgrsubmitread and smgrwaitread) to queue any reads we want, submit the queue
to the kernel, and then wait for the results. The simplest place I found to use
this interface is buffer prefetching in pg_prewarm.
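To make the queue/submit/wait flow concrete, here is a minimal toy sketch of
the calling pattern. It is not the code from the attached patch: all toy_*
names are hypothetical, and a plain pread() at submit time stands in for the
kernel ring, so nothing here is actually asynchronous. It only shows how a
caller would queue several reads, submit them as one batch, and then collect
the results one by one.

```c
#define _XOPEN_SOURCE 700       /* for pread() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_QUEUE_DEPTH 8       /* arbitrary size for this sketch */

/* One queued read request, plus its result once it has "completed". */
typedef struct ToyRead
{
    int         fd;
    void       *buf;
    size_t      len;
    off_t       offset;
    ssize_t     result;         /* bytes read, or -1 on error */
} ToyRead;

typedef struct ToyQueue
{
    ToyRead     reads[TOY_QUEUE_DEPTH];
    int         nqueued;        /* requests prepared so far */
    int         nsubmitted;     /* requests handed over by toy_submit_reads */
    int         nreaped;        /* completions already returned to caller */
} ToyQueue;

/* Step 1: remember a read without doing any IO; -1 if the queue is full. */
static int
toy_queue_read(ToyQueue *q, int fd, void *buf, size_t len, off_t offset)
{
    ToyRead    *r;

    if (q->nqueued == TOY_QUEUE_DEPTH)
        return -1;
    r = &q->reads[q->nqueued++];
    r->fd = fd;
    r->buf = buf;
    r->len = len;
    r->offset = offset;
    r->result = 0;
    return 0;
}

/* Step 2: hand over the whole batch; the stand-in just runs pread() now. */
static int
toy_submit_reads(ToyQueue *q)
{
    int         submitted = 0;

    while (q->nsubmitted < q->nqueued)
    {
        ToyRead    *r = &q->reads[q->nsubmitted++];

        r->result = pread(r->fd, r->buf, r->len, r->offset);
        submitted++;
    }
    return submitted;
}

/* Step 3: return the next finished request, or NULL if nothing is pending. */
static ToyRead *
toy_wait_read(ToyQueue *q)
{
    assert(q->nreaped <= q->nsubmitted);
    if (q->nreaped == q->nsubmitted)
        return NULL;
    return &q->reads[q->nreaped++];
}

/* Demo: queue two reads of a temp file, submit once, reap both results. */
static long
toy_demo(void)
{
    ToyQueue    q = {0};
    char        a[4], b[4];
    long        total = 0;
    ToyRead    *r;
    FILE       *f = tmpfile();

    if (f == NULL)
        return -1;
    if (fputs("abcdefgh", f) == EOF || fflush(f) != 0)
    {
        fclose(f);
        return -1;
    }
    toy_queue_read(&q, fileno(f), a, sizeof(a), 0);
    toy_queue_read(&q, fileno(f), b, sizeof(b), 4);
    toy_submit_reads(&q);
    while ((r = toy_wait_read(&q)) != NULL)
        total += (r->result > 0) ? r->result : 0;
    fclose(f);
    if (memcmp(a, "abcd", 4) != 0 || memcmp(b, "efgh", 4) != 0)
        return -1;
    return total;               /* total bytes read across both requests */
}
```

With a real ring, step 2 would be a single submission syscall for the whole
batch and step 3 would pull entries from the completion queue, possibly in a
different order than they were queued.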
As a result of this experiment I have a few questions, open points and requests
for the community's experience:
* I guess a proper implementation of async IO is a big undertaking, but it
could also bring significant performance advantages. Is there a (near) future
for this kind of async IO in PostgreSQL? Buffer prefetching is the simplest
example, but taking into account that io_uring supports ordering, barriers
and linked events, there are probably more use cases where it could be useful.
* Assuming that the answer to the previous question is positive, there could be
different strategies for using io_uring. So far I see several options for
waiting. Let's say we have prepared a batch of async IO operations and
submitted it. Then we can e.g.
-> just wait for the batch to finish
-> wait (in the same syscall as submitting) for previously submitted batches,
then start submitting again, and at the end wait for the leftovers
-> peek whether any events have completed, and get only those without waiting
for the whole batch (in this case it's necessary to make sure the submission
queue does not overflow)
So it's an open question which to use and when.
* Does it make sense to use io_uring for smgrprefetch? Originally I also added
io_uring parts to FilePrefetch (in the form of preparing and submitting
just one buffer), but I'm not sure whether this API is suitable.
* What could a data structure that describes IO from the PostgreSQL
perspective look like? With io_uring we need to somehow identify the IO
operations that have completed. For now I'm just using a buffer number. Btw,
this experimental patch has many limitations, e.g. only one ring is used for
everything, which is of course far from ideal and makes identification even
more important.
* There are a few more degrees of freedom that io_uring introduces: how many
rings to use, how many events per ring (which is going to be n for sqe and
2*n for cqe), how many IO operations per event to do (similar to
preadv/pwritev we can provide a vector), and what the balance between
submit and complete queues should be. I guess it will require a lot of
benchmarking to find good values for these.
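Two of the points above (identifying completed IO, and the n / 2*n sizing) can
be sketched as plain arithmetic. This is only an illustration under stated
assumptions, not what the patch does. Each SQE carries a 64-bit user_data field
that the kernel copies into the matching CQE, so one option is to pack a small
tag into it. The ToyIoTag layout, the toy_* helpers and the TOY_MAX_ENTRIES cap
are hypothetical. As far as I know the kernel rounds the requested number of
SQ entries up to a power of two and by default makes the CQ twice as large,
which is what toy_ring_sizes mimics.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 4096    /* arbitrary cap for this sketch */

typedef enum ToyIoKind
{
    TOY_IO_PREFETCH = 1,
    TOY_IO_READ = 2
} ToyIoKind;

/* What we want to learn about an IO when its completion shows up. */
typedef struct ToyIoTag
{
    ToyIoKind   kind;
    int32_t     buffer;         /* buffer number, may be negative */
} ToyIoTag;

/* Pack the tag into a 64-bit value: kind in the high half, buffer below. */
static uint64_t
toy_tag_pack(ToyIoTag tag)
{
    return ((uint64_t) (uint32_t) tag.kind << 32) | (uint32_t) tag.buffer;
}

/* Reverse of toy_tag_pack, applied to the value found in a completion. */
static ToyIoTag
toy_tag_unpack(uint64_t user_data)
{
    ToyIoTag    tag;

    tag.kind = (ToyIoKind) (uint32_t) (user_data >> 32);
    tag.buffer = (int32_t) (uint32_t) (user_data & 0xffffffffu);
    return tag;
}

typedef struct ToyRingSizes
{
    uint32_t    sq_entries;     /* n */
    uint32_t    cq_entries;     /* 2*n */
} ToyRingSizes;

/* Round the request up to a power of two and derive the CQ size from it. */
static ToyRingSizes
toy_ring_sizes(uint32_t requested)
{
    ToyRingSizes s;
    uint32_t    n = 1;

    if (requested > TOY_MAX_ENTRIES)
        requested = TOY_MAX_ENTRIES;
    while (n < requested)
        n <<= 1;
    s.sq_entries = n;
    s.cq_entries = 2 * n;
    return s;
}

/*
 * How many of `pending` requests can be submitted now so that every request
 * in flight (submitted, not yet reaped) still has a CQ slot waiting for it.
 */
static uint32_t
toy_submit_budget(ToyRingSizes s, uint32_t inflight, uint32_t pending)
{
    uint32_t    room;
    uint32_t    n = pending;

    assert(s.cq_entries >= s.sq_entries);
    room = (inflight < s.cq_entries) ? s.cq_entries - inflight : 0;
    if (n > s.sq_entries)
        n = s.sq_entries;
    if (n > room)
        n = room;
    return n;
}
```

For example, with 8 requested entries and 12 reads already in flight,
toy_submit_budget allows only 4 more submissions until some completions are
reaped. With several rings, the tag could also carry a ring identifier.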
[1]: https://github.com/torvalds/linux/commit/38e7571c07be01f9f19b355a9306a4e3d5cb0f5b
[2]: http://kernel.dk/io_uring.pdf
[3]: http://git.kernel.dk/cgit/liburing/
Attachment | Content-Type | Size |
---|---|---|
v1-0001-io-uring.patch | application/octet-stream | 16.7 KB |